Bilkent University
Department of Computer Engineering
MS Thesis Presentation


Diverse Sequence Search and Alignment


Elif Eser
MS Student
Computer Engineering Department
Bilkent University

Sequence similarity tools, such as BLAST, seek sequences from a database most similar to a query. They return results significantly similar to the query sequence that are typically also highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach where the initial results guide the user to new searches. However, diversity has not been considered as an integral component of sequence search tools yet. Repetitions in the result can be avoided by introducing non-redundancy during database construction; however, it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produces non-redundant results optimized for any given query. We define diversity measures for sequences, and propose methods to obtain diverse results extracted from current sequence similarity search tools. We propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a similarity query. We evaluate the effectiveness of the proposed methods in post-processing PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Our experiments show that the proposed methods are able to achieve more diverse yet similar result sets compared to static non-redundancy approaches. In both sequence based and functional diversity evaluation, the proposed diversification methods outperform original BLAST results significantly. We built an online diverse sequence search tool Div-BLAST that supports queries using BLAST web services. It re-ranks the results diversely according to given parameters.


DATE: 23 August, 2013, Friday @ 10:00