Bilkent University
Department of Computer Engineering


Comparing Short Tandem Repeat Detection Algorithms


Gülfem Demir
MS Student
Computer Engineering Department
Bilkent University

Adjacent repeated sequences in eukaryotic genomes are defined as tandem repeats. Depending on the length and composition of each repeat unit, they are further classified as microsatellites (2-5 bp), minisatellites (10-60 bp), and alpha- (171 bp), beta- (68 bp), and gamma-satellites (220 bp). Microsatellites are also referred to as short tandem repeats (STRs). The human genome contains nearly 260,000 repeats, which cover approximately 7% of the entire genome. Expansions and contractions of STRs are associated with genetic diseases such as Huntington's disease and fragile X syndrome, and they play a role in certain forms of cancer, which makes their characterization an active research interest. Given recent advances in high-throughput sequencing (HTS) technologies, it is not surprising that a rich variety of bioinformatics tools has been developed for STR characterization. However, whole human genome sequencing and targeted sequencing have only started to realize their potential to facilitate clinical decisions. Hence, assessing the accuracy of variant calls and understanding the biases and sources of error in sequencing and in STR detection and characterization tools have become more crucial than ever. In this study, we describe a high-confidence set of genome-wide genotype calls that can be used as a benchmark for comparing short tandem repeat detection algorithms. The availability of a ground truth for tandem repeats allows us to compare the performance of several popular STR detection algorithms at whole-genome scale on real data.


DATE: 07 March, 2016, Monday @ 16:10