... more like:

_{(bio)}informatics **ALGORITHMS**

- Instructor: Can Alkan
- TA: Ricardo Román Brenes (ricardo@bilkent)

- Class hours:
- Tue 13:30-15:20 - Fri 09:30-19:20

- Class room: EB104
- Office hour: by appointment. Zoom or Google Meet only.
- Public calendar (changes frequently): http://cs.bilkent.edu.tr/~calkan/calendar.html

NOTE: Some biology, molecular biology, genetics background would help, but not required. Basics regarding the topic will be covered in class.

- Recommended Material
- Genome-Scale Algorithm Design, Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu, 2015, Cambridge University Press
- An Introduction to Bioinformatics Algorithms (Computational Molecular Biology), Neil Jones and Pavel Pevzner, MIT Press, 2004
- Bioinformatics algorithms : an active learning approach, Phillip Compeau and Pavel Pevzner, Active Learning Publishers; 2018
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison, Cambridge University Press
- Bioinformatics: The Machine Learning Approach, Second Edition, Pierre Baldi, Soren Brunak, MIT Press
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, Cambridge University Press

**Grading:** There will be one midterm exam (25%) and a
final exam (35%). A further 30% of the final grade will be based on
homework/programming assignments, and 10% will be based on quizzes.

The following lectures are canceled due to conference travels.

- September 23, 2022
- September 27, 2022

- 0- Introduction
- 0b- Short note on homework implementations
- 1- Motif Finding & Algorithm Design Techniques
- 2- Exact String Matching - Knuth-Morris-Pratt and Boyer-Moore
- 3-
Exact String Matching - Rabin-Karp, Finite Automata. Shift/And

- 4-
Pattern Matching - Hashing, keyword trees, and suffix trees, suffix
arrays, Aho-Corasick

- 5- Burrows-Wheeler Transformation and Ferragina-Manzini index
- 6- Dynamic programming, Manhattan grid, edit distance, longest common subsequence
- 7- Global alignment, affine gap penalties, local alignment, Needleman-Wunsch and Smith-Waterman
- Needleman-Wunsch with affine gaps sample
- Smith-Waterman samples
- Static
- Random
sequence generator

- 8- Banded alignment, linear space alignment, block edit distance, Four Russians method
- 9- Multiple sequence alignment
- 10- Phylogenetic tree construction: Guide trees, evolutionary trees - Neighbor-Joining and UPGMA, parsimony, Sankoff and Fitch algorithms
- 11 - Heuristic sequence search. Short introduction to BLAST. Hash table indexes, minimizers. Chaining
- 12- MEMs, MUMs, BWA-MEM, minimap2, K-mer data structures
- 13- Alignment-free k-mer composition analysis. Minimum perfect hashing, MinHash, Jaccard Index
- 14- Graphs in genome analysis. OLC, de Bruijn, string graphs. Aligning reads to graphs
- 15- Applications: short introduction to genome sequencing. Current platforms and data types. Standard file formats
- 16- Applications: programming libraries, application-specific programming languages

Lecture
slides |
Recommended
reading |

A
brief introduction to molecular biology concepts and algorithmic problems in computational biology. Week 1, Lectures 1-2 (topic removed from the course as of Fall 2016 - recommended
reading is still recommended)A brief introduction to computational complexity and algorithm design techniques Week 1, Lecture 3 |
EBI's
Introduction to Biology JGI's Introduction to Genomics Chapter 3 of the textbook Chapter 2 of the text book |

DNA mapping & motif search Week 2, Lectures 1-2 Exact sequence search algorithms Week 2, Lecture 3 |
Chapter 4 of the text book |

Rabin-Karp algorithm, finite
automata, pattern matching, suffix trees Week 3, Lectures 1-2 Shift-and algorithm, Aho-Corasick algorithm, suffix arrays Week 3, Lecture 3 |
Chapter 9 of the text book |

Elements of dynamic programming,
Manhattan tourist problem, introduction to sequence alignment Week 4, Lectures 1-2 Global alignment, local alignment, affine gap penalties Week 4, Lecture 3 |
Chapter 6 of the text book |

Approximate string matching, divide
and conquer algorithms, Four-Russians trick Week 5, Lectures 1-2 Week 5, Lecture 3: Quiz 2 |
Chapters 7 and 9 of the text book |

Multiple sequence alignment Week 6, Lectures 1-2 Week 6, Lecture 3 canceled due to holidays |
Chapter 6 of the text book |

Gene
finding (topic removed from the course as of Fall
2013)Week 7, Lectures 1-2 Heuristic sequence search Week 7, Lecture 3 |
Chapter 6 of the text book |

Spaced seeds for heuristic spaced
seeds Week 8, Lectures 1-2 |
Chapters 9 and 11 of the text book |

Hidden
Markov Models, Viterbi training, Baum-Welch algorithm Week 9, Lecture 3 |
Chapter 11 of the text book |

Phylogenetic tree construction Week 10, Lectures 1-2 Week 10, Lecture 3 |
Chapter 10 of the text book |

Clustering algorithms Week 11, Lectures 1-2 CAST algorithm, introduction to genome rearrangements Week 11, Lecture 3 |
Chapter 10 of the text book |

Genome rearrangements, sorting by
reversal Week 12, Lectures 1-2 |
Chapter 5 of the text book |

RNA folding Week 13, Lectures 1-2 Week 13, Lecture 3 is canceled. |