Bilkent University
Department of Computer Engineering


Annotating Proteins by Mining Protein Interaction Networks


Gultekin Ozsoyoglu
Department of Electrical Engineering and Computer Science
Case Western Reserve University
Cleveland, Ohio, USA

Annotating genes and gene products with Gene Ontology (GO) terms to characterize the traits of genomic entities is an important activity for biologists. However, manual annotation, the most reliable form of gene annotation with GO terms, requires significant human effort and is very costly. In this talk, we consider the problem of assigning GO annotations to newly discovered proteins. We present a data mining technique that computes the probabilistic relationships between the GO annotations of proteins in protein-protein interaction data, and assigns highly correlated GO terms of annotated proteins to non-annotated proteins in the target set. More specifically, our approach computes the probabilistic significance of GO annotation sequences obtained from the annotations of a sequence of proteins in a protein-protein interaction network. We develop and evaluate two significance analysis techniques: (a) correlation mining for annotation pairs (i.e., GO annotation sequences of length 2), and (b) a variable-length Markov model for annotation sequences of arbitrary length, implemented via probabilistic suffix trees. Our cross-validation prediction experiments with pre-annotated proteins recovered correct annotations with 81% precision and 45% recall. In comparison with several previous protein function prediction techniques, the probabilistic suffix tree and correlation mining techniques produced the highest prediction accuracy.
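To make the idea of correlation mining for annotation pairs more concrete, here is a minimal sketch, not the speaker's actual method. It assumes that the correlation between two GO terms is measured by lift (observed co-occurrence across interaction edges divided by the co-occurrence expected under independence); the abstract does not say which measure is used. The functions mine_pair_lift and predict_terms, the toy network, and the labels GO:A and GO:B are all hypothetical. The probabilistic suffix tree technique for longer annotation sequences is not sketched here.

```python
from collections import Counter
from itertools import product


def mine_pair_lift(edges, annotations):
    """Lift of GO-term pairs (a, b) seen at the two ends of interaction edges.

    Hypothetical sketch: lift is an assumed stand-in for the correlation
    measure. Each undirected edge (u, v) whose endpoints are both annotated
    is counted in both directions, so lift(a, b) == lift(b, a).
    """
    pair_counts = Counter()  # directed edges with term a at source, b at target
    end_counts = Counter()   # directed edges whose source carries the term
    n_directed = 0
    for u, v in edges:
        if u not in annotations or v not in annotations:
            continue  # only edges between annotated proteins are informative
        for src, dst in ((u, v), (v, u)):
            n_directed += 1
            end_counts.update(annotations[src])
            pair_counts.update(product(annotations[src], annotations[dst]))
    # lift = P(a at one end, b at the other) / (P(a at an end) * P(b at an end))
    return {
        (a, b): c * n_directed / (end_counts[a] * end_counts[b])
        for (a, b), c in pair_counts.items()
    }


def predict_terms(protein, edges, annotations, lift, min_lift=1.0, top_k=3):
    """Rank GO terms for an unannotated protein from its neighbors' terms."""
    neighbors = {v for u, v in edges if u == protein}
    neighbors |= {u for u, v in edges if v == protein}
    scores = Counter()
    for n in neighbors:
        for a in annotations.get(n, ()):
            for (x, b), value in lift.items():
                if x == a and value > min_lift:  # keep positively correlated pairs
                    scores[b] += value
    return [term for term, _ in scores.most_common(top_k)]


# Toy interaction network (hypothetical); "X" is the non-annotated target.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "X")]
annotations = {
    "P1": {"GO:A"}, "P2": {"GO:B"}, "P3": {"GO:A"},
    "P4": {"GO:B"}, "P5": {"GO:A"},
}
lift = mine_pair_lift(edges, annotations)
print(lift[("GO:A", "GO:B")])                      # 2.0
print(predict_terms("X", edges, annotations, lift))  # ['GO:B']
```

In this toy network, proteins annotated with GO:A interact only with proteins annotated with GO:B, so the pair has a lift of 2.0, and the unannotated protein X, whose only neighbor carries GO:A, is assigned GO:B. Predictions made this way on held-out, pre-annotated proteins could then be scored by precision and recall, as in the experiments described above.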

Bio: Gultekin Ozsoyoglu is a professor in the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, USA. He received the BS degree in electrical engineering and the MS degree in computer science from the Middle East Technical University, Ankara, Turkey, in 1972 and 1974, respectively, and the PhD degree in computing science from the University of Alberta, Edmonton, Alberta, Canada, in 1980. Prof. Ozsoyoglu's current research interests include data management and database-related issues in bioinformatics, web data mining, and literature digital libraries. He has published in major database and computer science conferences and journals such as ACM Transactions on Database Systems, IEEE Transactions on Software Engineering, IEEE Transactions on Knowledge and Data Engineering, and Journal of Computer and System Sciences. He has served on the program committees and panels of a large number of conferences, including ACM SIGMOD, VLDB, and IEEE Data Engineering. He has been general chair and program chair of a number of conferences, and has served on NSF, NIH, NRC, and Ford Foundation panels.


DATE: Friday, November 24, 2006, at 13:40