Bilkent University
Department of Computer Engineering


Discovery of Copy Number Variation Using Exome Sequencing Data


Fatma Balcı
MSc Student
Computer Engineering Department
Bilkent University

Over the last few years, next-generation sequencing (NGS) has evolved into a popular technology for comprehensive characterization of copy number variations (CNVs) by generating hundreds of millions to billions of short reads in a single run. NGS-based analysis has been widely applied to identify CNVs in both healthy individuals and individuals with genetic diseases. Correspondingly, the high demand for NGS-based CNV analyses has fueled the development of numerous computational methods and tools for CNV detection.

Currently, whole genome sequencing (WGS) and whole exome sequencing (WES) are the two major strategies that use NGS for DNA analysis. Several CNV calling tools have been developed that use either WGS or WES data. WES is more cost-effective (~6-fold cheaper than WGS), since it aims to sequence only the protein-coding regions (exons) of the genome.

Other sequence signatures (read pair, split read, assembly) rely on capturing the CNV breakpoints at base-pair resolution. Read depth (RD) based methods do not, so they have a better chance of detecting CNVs that encompass exons even if the breakpoints are not within the targeted regions. The underlying hypothesis of RD-based methods is that the depth of coverage in a genomic region correlates with the copy number of the region. However, due to differences in the capture efficiencies of different exons, RD-based methods that assume a Poisson distribution of read depth are not applicable to WES data.
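The read-depth hypothesis can be illustrated with a minimal Python sketch. It is not the method presented in this talk; it is a generic ratio-based estimate, written under the assumptions that per-exon mean depths are already available, that a small panel of reference samples is diploid at every exon, and that rounding the scaled ratio is an acceptable copy number estimate. The function name `estimate_copy_numbers` and its parameters are hypothetical. Each exon is compared against its own baseline across the reference panel, so differences in capture efficiency between exons cancel out instead of being mistaken for CNVs.

```python
from statistics import median


def estimate_copy_numbers(sample_depths, reference_depths, ploidy=2):
    """Estimate a per-exon copy number from read depth (illustrative only).

    sample_depths:    mean read depth per exon for the test sample.
    reference_depths: one list of per-exon depths per reference sample,
                      assumed diploid at every exon (an assumption).
    """
    def normalize(depths):
        # Divide by the sample's own median depth to remove
        # differences in total sequencing yield between samples.
        m = median(depths)
        return [d / m for d in depths]

    sample_norm = normalize(sample_depths)
    refs_norm = [normalize(r) for r in reference_depths]

    copy_numbers = []
    for i, s in enumerate(sample_norm):
        # Per-exon baseline: the median over reference samples absorbs
        # the exon-specific capture efficiency.
        baseline = median(r[i] for r in refs_norm)
        if baseline <= 0:
            copy_numbers.append(None)  # exon not captured; no estimate
            continue
        # Depth ratio scaled by the assumed ploidy, rounded to an integer.
        copy_numbers.append(round(ploidy * s / baseline))
    return copy_numbers


if __name__ == "__main__":
    # Made-up depths: five exons with very different capture efficiencies.
    references = [
        [100, 40, 200, 20, 60],
        [110, 44, 220, 22, 66],
        [90, 36, 180, 18, 54],
    ]
    sample = [100, 20, 200, 30, 60]
    print(estimate_copy_numbers(sample, references))
```

Normalizing against a per-exon baseline is one common way to avoid assuming a single Poisson rate across all targets; other normalization schemes are equally plausible.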

We develop a new method based on RD analysis to discover CNVs using WES. There are other popular methods in the literature for finding CNVs in exome sequencing data, but none of them can comprehensively detect all types of CNVs and assign absolute copy numbers. To improve CNV detection, we classify CNV types using a semi-supervised learning approach.


DATE: 28 April, 2014, Monday @ 16:40