Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

 

Pangenome Graph Construction Using Locally Consistent Parsing

 

Akmuhammet Ashyralyyev

Master's Student
(Supervisor: Assoc. Prof. Can Alkan)
Computer Engineering Department
Bilkent University

Abstract: Locally consistent parsing (LCP) is a string processing technique that systematically partitions strings into small, consistent substrings called cores. Each core is an exact substring of the processed input string. LCP identifies cores using three fundamental rules, which ensure a nearly uniform distribution of core lengths and of distances between consecutive cores. The generalized implementation of LCP applies the technique iteratively, producing larger substrings while maintaining the uniform distribution of cores. This project presents the generalized implementation of LCP with well-defined terminology and provides experimental results supporting the claim of an almost uniform distribution of core distances and lengths. We compare LCP with minimizers, a widely used technique, in terms of distribution patterns and execution times on the human genome. The experimental results also show that as the LCP level (the number of times LCP has been applied recursively) increases, the number of cores and the execution time decrease, whereas the lengths of the cores and the distances between them increase by a constant factor. Additionally, experimental results based on PacBio sequencing simulations demonstrate LCP's error tolerance for reads with different accuracy rates. We provide a software package that implements the LCP technique in both C++ and Rust.
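
For readers unfamiliar with the minimizer baseline mentioned in the abstract, the following is a minimal Python sketch of one common variant. It is not the LCP method and is not taken from the seminar's software package. The function names, the lexicographic k-mer ordering, and the parameters k (k-mer length) and w (number of k-mers per window) are assumptions made for illustration only.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs for a simple minimizer scheme.

    Assumed variant: in every window of w consecutive k-mers, pick the
    lexicographically smallest k-mer (leftmost on ties). A position chosen
    by several consecutive windows is reported once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index among equal keys, i.e. the leftmost.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


def gaps(selected: list[tuple[int, str]]) -> list[int]:
    """Distances between the start positions of consecutive selections."""
    return [b[0] - a[0] for a, b in zip(selected, selected[1:])]
```

Calling `gaps(minimizers(seq, k, w))` gives the kind of distance distribution that the abstract compares against the distances between consecutive LCP cores. In this scheme, the distance between consecutive selections never exceeds w.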

 

DATE: March 04, Monday @ 13:30
PLACE: EA 502