CS 481: Bioinformatics Algorithms

...  more like:

(bio)informatics ALGORITHMS

Fall 2021

NOTE:  Some biology, molecular biology, genetics background would help, but not required. Basics regarding the topic will be covered in class.

Prerequisites: Elementary discrete mathematics, basic algorithms and data structures, and programming proficiency with, e.g., C/C++/Java will be expected. Knowledge of elementary combinatorics and probability, fundamental algorithms for sorting, searching, hashing and dynamic programming, elementary graph algorithms would be very helpful.

Textbook: none required.

Grading: There will be one midterm exam (25%) and a final exam (35%). A further 30% of the final grade will be based on homework/programming assignments, and 10% will be based on quizzes.

  • Planned Lecture Cancellations

  • The following lectures are canceled.

  • <none yet>

  • Course material (updated as of Fall 2021 - subject to further updates)

  • Note: Some slides are adapted from http://bix.ucsd.edu/bioalgorithms/slides.php
  • All pseudocodes in the slides are provided as guidelines, not meant for direct "copy-paste" implementation.

  • 0- Introduction
  • 1- Motif Finding & Algorithm Design Techniques
  • 1b- Short note on homework implementations
  • 2- Exact String Matching - Knuth-Morris-Pratt and Boyer-Moore
  • 3- Exact String Matching - Rabin-Karp, Finite Automata. Shift/And
  • 4- Pattern Matching - Hashing, keyword trees, and suffix trees, suffix arrays, Aho-Corasick
  • 5- Burrows-Wheeler Transformation and Ferragina-Manzini index
  • 6- Dynamic programming, Manhattan grid, edit distance, longest common subsequence
  • 7- Global alignment, affine gap penalties, local alignment, Needleman-Wunsch and Smith-Waterman
  • Needleman-Wunsch with affine gaps sample
  • Smith-Waterman samples
  • Static
  • Random sequence generator
  • 8- Banded alignment, linear space alignment, block edit distance, Four Russians method
  • 9- Multiple sequence alignment
  • 10- Guide trees, evolutionary trees - Neighbor-Joining and UPGMA, parsimony, Sankoff and Fitch algorithms
  • 11- Clustering - hierarchical, k-means, CAST
  • 12- Genome rearrangements - sorting by reversals, pancake flipping
  • 13- Sequence similarity search - BLAST
  • 14- PatternHunter
  • 15- RNA folding
  • 16- K-mer data structures: Bloom filter, BSTs, SBTs, CQFs, de Bruijn graphs

  • 17- Introduction to DNA sequencing
  • 18- High throughput sequencing, compression, read mapping
  • 19- Introduction to genome assembly

  • Archived slides -- do not use. Kept for archival purposes only.

  • Lecture slides
    Recommended reading
    A brief introduction to molecular biology concepts and
    algorithmic problems in computational biology.
    Week 1, Lectures 1-2
        (topic removed from the course as of Fall 2016 - recommended reading is still recommended)

    A brief introduction to computational complexity and
    algorithm design techniques
    Week 1, Lecture 3
    EBI's Introduction to Biology
    JGI's Introduction to Genomics
    Chapter 3 of the textbook

    Chapter 2 of the text book

    DNA mapping & motif search
    Week 2, Lectures 1-2

    Exact sequence search algorithms
    Week 2, Lecture 3
    Chapter 4 of the text book
    Rabin-Karp algorithm, finite automata, pattern matching, suffix trees
    Week 3, Lectures 1-2

    Shift-and algorithm, Aho-Corasick algorithm, suffix arrays
    Week 3, Lecture 3
    Chapter 9 of the text book
    Elements of dynamic programming, Manhattan tourist problem, introduction
    to sequence alignment
    Week 4, Lectures 1-2

    Global alignment, local alignment, affine gap penalties
    Week 4, Lecture 3
    Chapter 6 of the text book
    Approximate string matching, divide and conquer algorithms, Four-Russians trick
    Week 5, Lectures 1-2
    Week 5, Lecture 3: Quiz 2
    Chapters 7 and 9 of the text book
    Multiple sequence alignment
    Week 6, Lectures 1-2
    Week 6, Lecture 3 canceled due to holidays
    Chapter 6 of the text book
    Gene finding  (topic removed from the course as of Fall 2013)
    Week 7, Lectures 1-2

    Heuristic sequence search
    Week 7, Lecture 3
    Chapter 6 of the text book
    Spaced seeds for heuristic spaced seeds
    Week 8, Lectures 1-2

    Chapters 9 and 11 of the text book
     Hidden Markov Models, Viterbi training, Baum-Welch algorithm
    Week 9, Lecture 3
    Chapter 11 of the text book
    Phylogenetic tree construction
    Week 10, Lectures 1-2
    Week 10, Lecture 3
    Chapter 10 of the text book
    Clustering algorithms
    Week 11, Lectures 1-2

    CAST algorithm, introduction to genome rearrangements
    Week 11, Lecture 3
    Chapter 10 of the text book
    Genome rearrangements, sorting by reversal
    Week 12, Lectures 1-2
    Chapter 5 of the text book
    RNA folding
    Week 13, Lectures 1-2
    Week 13, Lecture 3 is canceled.