Bilkent University
Department of Computer Engineering


Diverse Sequence Alignment on BLAST Results


Elif Eser
MSc Student
Computer Engineering Department
Bilkent University

Most sequence analysis tasks in bioinformatics require an exploratory approach. However, sequence similarity tools such as BLAST search a sequence database for the sequences most similar to a given query, and the top results they return are also highly similar to one another. Although diversity is essential in genomic studies, current tools do not provide diverse results. Some redundancy is removed at preprocessing time, when the databases are generated, but this step is time-consuming. In this paper, we investigate diverse search and browsing for sequence databases. We introduce a definition of diversity for sequences and propose methods for obtaining diverse results from the output of current bio-sequence tools. We experiment on BLAST results because BLAST draws on a comprehensive sequence database and returns effective results together with their local alignments to the query sequence. We present our tool, which post-processes BLAST results to provide diverse results while preserving similarity to the query.


DATE: 19 November, 2012, Monday @ 16:50