Bilkent University
Department of Computer Engineering


Algorithms for structural variation discovery using multiple sequence signatures


Arda Söylev
PhD Student
Computer Engineering Department
Bilkent University

In this project, we develop the TARDIS (Toolkit for Automated and Rapid DIscovery of Structural variations) algorithm to identify genomic structural variation (SV) in the human genome using high throughput sequencing (HTS) technologies. Although HTS technologies have revolutionized the field of genomics, several problems in HTS data analysis remain unsolved. The main challenges with HTS data are: 1) the technologies generate unprecedented amounts of data, which are very difficult to handle; 2) the reads are very short, which introduces ambiguity in read mapping, since a read may align to several locations with similar edit distances; and 3) the DNA is fragmented into very short pieces, which limits the ability to span repeats and duplications. Structural variants are broadly defined as genomic variants that alter DNA sequences longer than 50 base pairs. They can take the form of deletions, insertions, inversions, transposon insertions, duplications, and translocations. Four sequence signatures can be used to detect SVs: read pair, read depth, split read, and assembly. Each signature has a different detection power depending on the type, size, and underlying sequence properties of the variant. With the TARDIS algorithm, we plan to incorporate all of these signatures into a single framework to increase the accuracy and sensitivity of SV discovery. We will later add support for single nucleotide variation (SNV) and short insertion/deletion (indel) discovery to our algorithm. The last step in our research will be to integrate newer sequencing technologies, such as Pacific Biosciences and Oxford Nanopore, and alternative sequencing library preparation techniques, such as CPT-Seq, 10X Genomics, and Dovetail.
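
The read-mapping ambiguity described in the abstract can be illustrated with a small sketch. This is not part of TARDIS; the function names (edit_distance, candidate_positions), the toy reference, and the read are all hypothetical and chosen only to show how a short read can match two repeat copies with the same edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, or match/substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidate_positions(read: str, reference: str, max_dist: int) -> list[tuple[int, int]]:
    """Return (position, distance) for every read-length window within max_dist edits.

    Simplifying assumption: only windows of exactly the read's length are compared,
    which real aligners do not require.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        d = edit_distance(read, window)
        if d <= max_dist:
            hits.append((pos, d))
    return hits


# Toy data (hypothetical): two near-identical repeat copies in the reference.
REPEAT_A = "ACGTTGCA"
REPEAT_B = "ACGTAGCA"  # one substitution relative to REPEAT_A
reference = "GGGG" + REPEAT_A + "CCCCCC" + REPEAT_B + "TTTT"
read = "ACGTCGCA"      # differs from each copy by exactly one base

hits = candidate_positions(read, reference, max_dist=1)
print(hits)
```

Both repeat copies are returned with the same distance, so the read alone cannot decide between them; longer reads or additional signatures are needed to resolve such cases.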


DATE: 22 February, 2016, Monday @ 16:10