Bilkent University
Department of Computer Engineering


Large structural variation discovery using hybrid sequence data


Ezgi Ebren
MS Student
Computer Engineering Department
Bilkent University

Genomic structural variations (SVs) are briefly defined as large-scale alterations of DNA content, copy, and organization. Although significant progress has been made in characterizing SVs since the introduction of high-throughput sequencing (HTS), accurate detection of complex SVs and balanced rearrangements remains elusive because of the sequence complexity at the breakpoints, where short reads are difficult to map. Although the mapping problem can be ameliorated using long-read platforms, their higher sequencing error rates keep the problem challenging. Long-read platforms also have higher sequencing costs, which prohibits their routine use in large-scale projects. To leverage the complementary nature of short- and long-read sequencing, we developed a novel algorithm, LaVa, that uses hybrid sequence data to discover SVs. LaVa focuses mainly on large SVs (10 Kbp to 10 Mbp) and uses low-coverage (5-10X) PacBio long reads and high-coverage (>30X) Illumina short reads to detect deletions and inversions. We tested LaVa on both real and simulated data sets, and we show that it achieves high sensitivity and a low false discovery rate.


DATE: Monday, 05 November 2018. CS590 presentations begin at 15:40.