Bilkent University
Department of Computer Engineering


Alternative Methods for Predicting Cancer Biomarkers from Microarray Gene Expression Data


Mohammed Alshalalfa
Department of Computer Science
University of Calgary
Calgary, Alberta, Canada

Bioinformatics is a field of science that integrates computer science, mathematics, statistics and biology, with the aim of discovering knowledge hidden within biological data. One widely investigated type of biological data is gene expression microarray data. Because a microarray can accommodate the whole genome, global gene expression patterns in different tissues or samples can be profiled in a few days, whereas traditional methods may take months. However, analyzing microarray data is challenging because the number of features (genes) is very large relative to the number of samples. Microarrays have been shown to be very efficient and reliable in cancer diagnosis, but the large number of genes makes the data noisy and difficult to handle. Consequently, identifying relevant genes has received considerable attention.

In this talk, we will present three methods capable of extracting cancer biomarkers from gene expression data. The first method is based on controlled multi-level clustering (in most cases, two levels of clustering): we first filter the data with a statistical test and then cluster the data iteratively to find the best number of clusters. The genes closest to the centroids of the resulting clusters have shown high potential as significant features for sample classification. These genes (one per centroid) are used as input for building a classification model. The second method is based on an iterative t-test that eliminates noise from the data. The third method is a hybrid approach that combines statistical tests with entropy-based tests, using the t-test and Singular Value Decomposition (SVD) based entropy. It has proved effective because it considers both the feature itself and its effect on the data entropy. This method is the first to combine entropy and statistical significance for gene ranking. The test results reported on benchmark data sets demonstrate the applicability and effectiveness of the three proposed methods for identifying cancer biomarkers.
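As a rough illustration of the first method's general shape (filter, cluster, then keep the gene nearest each centroid), the following minimal Python sketch may help. It is not the speaker's implementation. Several choices are assumptions made only for the example: Welch's t statistic stands in for the unspecified statistical test, a single level of plain k-means with a fixed number of clusters k replaces the controlled multi-level clustering and its search for the best number of clusters, and all function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed filter test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def filter_genes(expr, labels, top_n):
    """Keep the top_n genes ranked by |t| between class 0 and class 1."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def sq_dist(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def init_centroids(points, k):
    """Deterministic farthest-first initialisation (an assumption, for repeatability)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = [[mean(col) for col in zip(*cl)] if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def representative_genes(expr, labels, top_n, k):
    """Filter genes, cluster their profiles, return the gene nearest each centroid."""
    kept = filter_genes(expr, labels, top_n)
    centroids = kmeans([expr[g] for g in kept], k)
    return [min(kept, key=lambda g: sq_dist(expr[g], c)) for c in centroids]


# Hypothetical toy data: 6 genes measured on 3 normal (0) and 3 tumor (1) samples.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g_up1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g_up2": [1.1, 1.0, 1.0, 4.9, 5.2, 5.0],
    "g_down1": [5.0, 4.9, 5.2, 1.0, 1.1, 0.9],
    "g_down2": [5.1, 5.0, 4.8, 1.2, 0.9, 1.0],
    "g_noise1": [3.0, 2.9, 3.1, 3.0, 3.1, 2.9],
    "g_noise2": [2.0, 4.0, 3.0, 3.0, 2.0, 4.0],
}

if __name__ == "__main__":
    print(representative_genes(expr, labels, top_n=4, k=2))
```

In this sketch the selected genes (one per centroid) would then be passed to any classifier as the reduced feature set. Choosing k by repeated clustering, as the talk describes, is deliberately left out.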

Bio: Mohammed Alshalalfa received his B.Sc. in Molecular Biology and Genetics in July 2006 from Middle East Technical University (METU), Ankara, Turkey. During his undergraduate studies, he worked in the Plant Biotechnology lab at METU. In the last year of his undergraduate studies, Mohammed visited the Department for Molecular Biomedical Research (DMBR), Ghent University, Belgium, for two months to study the effect of influenza viral infection on NF-kB transcription factor activity. In September 2006, Mohammed joined the graduate program in the Department of Computer Science at the University of Calgary, Calgary, Canada, where he completed his M.Sc. degree in July 2008. Currently, Mohammed is a Ph.D. candidate in the Department of Computer Science at the University of Calgary under the supervision of Professor Reda Alhajj. So far, Mohammed has published more than 20 papers in refereed international conferences and journals. His work focuses on microarray data analysis and the application of data mining techniques to microarray data. He has received several prestigious scholarships, including the iCore graduate scholarship, and he serves on the program committees of several conferences. His research interests are in the areas of genomics, proteomics, computational biology, social networks and bioinformatics, where he is successfully adapting different machine learning and data mining techniques for modeling and analysis.


DATE: Thursday, 28 May 2009 @ 13:40