Bioinformatics approaches are playing an increasingly important role in stem cell and regenerative biology research. While many online resources are available to help researchers learn large-scale data analysis techniques, the sheer volume of options can be overwhelming.
To help cut down on the noise, HSCRB researchers have recommended the following specific resources that they have found useful. Additionally, the Harvard groups linked in the sidebar offer training sessions and materials.
Beginner resources
General
- Harvard course: Introduction to computer science and programming
- FAS Research Computing: Cluster Quick Start Guide
- GitHub (tool for versioning and sharing code online): Hello World tutorial
R
- Lessons with interactive worksheets: Programming with R, R for Reproducible Scientific Analysis
- edX courses: Data Science: R Basics, Introduction to R for Data Science
- Textbooks: The Art of R Programming, R for Data Science
Python
- Amber Biology course: Python For Life Scientists
Intermediate/advanced resources
General
- edX course: High-Dimensional Data Analysis
Command line tools
- Samtools: SAM/BAM file manipulation and formatting (e.g. merging BAM files, removing PCR duplicates, sorting files by coordinate/read name)
- Bedtools: BED file manipulation and formatting (e.g. intersecting two sets of coordinates, counting reads overlapping a set of annotations)
- Picard tools: suite of tools useful for summarizing sequence files (e.g. read library duplication rate estimation, insert size metrics for paired-end reads)
- UMI-tools: tools for dealing with unique molecular identifiers
- MACS: peak calling with ChIP/DNase/ATAC-seq data
- IGV: viewing genomic data tracks (e.g. read pile-ups by experiment/sample)
- FastQC: quality control for raw sequencing data (e.g. RNA-seq, ATAC-seq)
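For readers new to coordinate-based operations, the two Bedtools examples above (intersecting two sets of coordinates, counting reads overlapping a set of annotations) can be sketched in a few lines of plain Python. This is only a conceptual illustration, not how Bedtools is implemented: the function names `intersect` and `count_overlaps` and the toy coordinates are made up for this example, and intervals are assumed to be 0-based and half-open, as in the BED format.

```python
from typing import List, Tuple

# (chromosome, start, end); assumed 0-based, half-open, as in BED files
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # a non-empty overlap exists
                overlaps.append((chrom_a, start, end))
    return overlaps


def count_overlaps(reads: List[Interval], annotations: List[Interval]) -> List[int]:
    """For each annotation, count the reads that overlap it by at least one base."""
    counts = []
    for chrom, start, end in annotations:
        n = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and start < r_end
        )
        counts.append(n)
    return counts


# Hypothetical toy data, for illustration only
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
genes = [("chr1", 150, 550), ("chr2", 200, 300)]

shared = intersect(peaks, genes)
per_gene = count_overlaps(peaks, genes)
print(shared)
print(per_gene)
```

The nested loops compare every pair of intervals, which is fine for a toy example but far too slow for real sequencing data. Dedicated tools such as Bedtools are built to handle genome-scale files, which is one reason to use them rather than hand-written code.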
R
- General
  - dplyr: package for quick and simple table manipulation
  - R packages: book with good tips and rules to follow when building your own R package
- Computational genomics
  - Irizarry lab resources: Genomics Data Analysis
  - Hansen lab resources: Bioconductor for Genomic Data Science
- Plotting and visualization
  - Book: ggplot2: Elegant Graphics for Data Analysis
  - Video workshop: Plotting anything with ggplot2
- Single-cell data analysis
  - Book: Orchestrating Single-Cell Analysis with Bioconductor
  - Course: Analysis of single cell RNA-seq data
  - Collection of algorithms: Single-cell RNA-seq pseudotime estimation algorithms
Python
- Single-cell data analysis
  - Video workshop: Bioinformatics for Benched Biologists
- Traditional machine learning
  - scikit-learn: tools for beginner-level traditional machine learning
  - scikit-image: package for image analysis
- Deep learning