CS 681: Advanced Topics in Computational Biology

Fall 2020

NOTE: Some background in biology, molecular biology, or genetics would help but is not required. The basics of each topic will be covered in class.

  • Course material - Fall 2020

  • Note: Some slides are adapted from https://www.bioinformaticsalgorithms.org/

  • Slides Recommended reading
    01 - Introduction EBI's Introduction to Biology
    JGI's Introduction to Genomics
    02 - Sequencing technologies Lightbody et al.
    Senol Cali et al.
    03 - Compression Computational solutions to large-scale data management and analysis.
    Numanagic et al.
    04 - Read mapping Short Read Mapping: An Algorithmic Tour
    Tools for mapping high-throughput sequencing data.
    Minimap2: pairwise alignment for nucleotide sequences
    05 - SNP and Indel discovery
    06 - Phasing
    07 - Segmental duplications and SV (Part 1)
    08 - SV (Part 2)
    09 - SV (Part 3)

  • The following are deprecated as of 2019. Kept for archival purposes only.

  • Lecture slides
    Recommended reading
    A brief introduction to genomes and concepts in genomics. Genome structure. Genomic variation.
    Week 1, Lecture 1
    Week 1, Lectures 2-3
    EBI's Introduction to Biology
    JGI's Introduction to Genomics
    Rare and common variants: twenty arguments, Greg Gibson, Nature Rev Genet
    Repetitive DNA and next-generation sequencing: computational challenges and solutions, Treangen and Salzberg, Nature Rev Genet.
    Genomic variation discovery, microarrays and other genotyping platforms.
    Week 2, Lecture 1
    Week 2, Lectures 2-3
    Microarray overview
    NCBI's microarray primer
    Rabiner's tutorial on HMMs
    HMMseg  SCIMM  SCIMMkit  SCOUT  BirdSuite ÇOKGEN
    Introduction to high-throughput sequencing (HTS). Different sequencing platforms: pyrosequencing, sequencing by synthesis, sequencing by ligation, single-molecule sequencing. Upcoming platforms based on nanotechnology. Advantages and disadvantages. Computational challenges in analyzing HTS data.
    Week 3, Lecture 1
    Week 3, Lectures 2-3
    Sequencing technologies - the next generation.
    Computational solutions to large-scale data management and analysis.
    If you are interested in the SCALCE paper, send me an email for a preprint since it is not published yet.
    Read mapping. Burrows-Wheeler Transform and Ferragina-Manzini index. Hash-based and BWT-FM-based aligners.
    Week 4, Lectures 1-2-3
    Comparison of read mappers
    MAQ & mapping qualities
    Bowtie and BWT-FM
    mrsFAST and cache oblivious mapping
    ZOOM! and spaced seeds
    SNP and small indel discovery using HTS. Haplotype resolution.
    Week 5, Lecture 1
    Week 5, Lectures 1-2
    GATK, Samtools, PyroBayes
    SNP calling review by Nielsen et al.
    Fosmid based haplotype phasing
    Phasing using sequencing and fragment conflict graphs
    Structural variation, copy number variation, copy number polymorphism and segmental duplications.
    Week 6, Lecture 1
    Week 6, Lectures 2-3
    Review on SV discovery & genotyping methods
    1000 Genomes SV Companion
    First algorithms:
    Read depth   Read Pair   Split Read
    NGS versions
    NGS WSSD CNVnator  EWT
    VariationHunter VariationHunter2
    BreakDancer Pindel NovelSeq
    CNVer GenomeSTRiP BreakSeq
    Sequence assembly algorithms. De Bruijn graphs for genome assembly. String graphs. Techniques and shortcomings of assembly.
    Week 7, Lecture 1
    Week 7, Lectures 2-3
    Assembly of large genomes
    Assembly algorithms for NGS
    Lander-Waterman statistics
    Celera Assembler    Arachne
    de Bruijn Graphs Primer
    Assembler Comparison
    Bloom filter
    Error correction:  Quake  SHREC  ECHO
    De Bruijn graphs for genome assembly.
    Week 8, Lecture 1
    Transcriptome analysis. Transcriptome assembly, alternative splicing and fusion gene discovery.
    Week 8, Lectures 2-3
    Cortex  Ray

    RNAseq review 1
    RNAseq review 2
    Transcriptome assembly review
    TopHat  Cufflinks  RPKM  Trans-ABySS
    Scaffolding with RNAseq
    deFuse Comrad
    Epigenetics. Methylation, histone modification. Finding "active" genes using DNase I hypersensitivity assays.
    Week 9, Lecture 1
    Histone modification. RNA secondary structure prediction
    Week 9, Lectures 2-3
    DNA methylation review
    BATMAN  BSMAP Bismark
    ChIP-seq review
    Segway   PeakSeq

    Mfold  ViennaRNA  RNAalifold
    Zuker's RNA page
    Hofacker's RNA page
    RNA folding
    Week 10, Lecture 1
    RNA-RNA interactions, protein sequencing
    Week 10, Lectures 2-3
    Densityfold  CONTRAfold  SCFG for folding
    PairFold  inteRNA