Bilkent University
Department of Computer Engineering


Massively Parallel Mapping Of Next Generation Sequence Reads Using GPUs


Azita Nouri
MS Student
Computer Engineering Department
Bilkent University

DNA alignment is a DNA comparison between the some or similar species. The DNA sample of an organism is presented in small fragments that are extracted by Whole Genome Shotgun Sequencing (WGS) . Short reads from WGS process are mapped to a reference DNA. In hash-based approach, reference DNA is divided into segments and stored in a hash table. The two main problems for this approaches is, dealing with large data and having imperfect data. Mapping the data generated from the DNA of a single sample takes tens of CPU days, so algorithm should tolerate errors and be fast. Hash-based seed and extend method consists of two parts, the first part is comprised of creating and managing hash table and the second part is aligning, which is done by string matching algorithm that utilizes dynamic programming. One idea is using Needleman-Wunsch algorithm for string comparison. Although a single alignment is not time-consuming, there are billions of comparisons executed for mapping data from single sample. Since the most time consuming part of the algorithm is alignment part, it is very crucial to gain speedup in. we implemented a GPGPU algorithm, that can align millions of dynamic matrices concurrently by considering the memory and thread properties of GPUs which ensures maximum occupancy on GPU.


DATE: 16 November, 2015, Monday @ 16:15