Bioinformatics Resources

Bioinformatics approaches are playing an increasingly important role in stem cell and regenerative biology research. While many online resources are available to help researchers learn large-scale data analysis techniques, the sheer volume of options can be overwhelming.

To help cut down on the noise, HSCRB researchers have recommended the following specific resources that they have found useful. Additionally, the Harvard groups linked in the sidebar offer training sessions and materials.

Beginner resources

General

R

Python

Intermediate/advanced resources

General

Command line tools

  • Samtools: SAM/BAM file manipulation and formatting (e.g. merging BAM files, removing PCR duplicates, sorting files by coordinate or read name)
  • Bedtools: BED file manipulation and formatting (e.g. intersecting two sets of coordinates, counting reads overlapping a set of annotations)
  • Picard: suite of tools useful for summarizing sequence files (e.g. estimating library duplication rate, insert size metrics for paired-end reads)
  • UMI-tools: tools for dealing with unique molecular identifiers
  • MACS: peak calling for ChIP-seq, DNase-seq, and ATAC-seq data
  • IGV: viewing genomic data tracks (e.g. read pile-ups by experiment/sample)
  • FastQC: quality control for raw sequencing data (e.g. RNA-seq, ATAC-seq)
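
To make the two Bedtools operations mentioned above more concrete (intersecting two sets of coordinates, and counting reads overlapping a set of annotations), here is a minimal Python sketch of the underlying idea. It is an illustration only, not how Bedtools is implemented: it assumes intervals are held in memory as (chromosome, start, end) tuples using BED-style 0-based, half-open coordinates, and it compares every pair of intervals, which would be too slow for real datasets. The function names and example intervals are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end); 0-based, half-open, as in BED files
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every overlapping pair from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # at least one shared base
                overlaps.append((chrom_a, start, end))
    return overlaps


def count_overlaps(
    annotations: List[Interval], reads: List[Interval]
) -> List[Tuple[Interval, int]]:
    """For each annotation, count the reads that overlap it by at least one base."""
    counts = []
    for chrom, start, end in annotations:
        n = sum(1 for c, s, e in reads if c == chrom and s < end and e > start)
        counts.append(((chrom, start, end), n))
    return counts


# Hypothetical example intervals
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
genes = [("chr1", 150, 550), ("chr2", 0, 40)]

print(intersect(peaks, genes))
print(count_overlaps(genes, peaks))
```

For real data, the command line tools listed above are the better choice, since they work with sorted files and indexed formats rather than nested loops.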

R

Python

  • Single-cell data analysis
  • Traditional machine learning
  • Deep learning
    • Keras: suitable for quickly constructing common types of neural networks
    • PyTorch: more flexible, with many low-level operations available