Department of Computer Engineering
MS THESIS PRESENTATION
Survival Prediction via Partial Ordering in Feature Space and Sample Space
(Supervisor: Asst. Prof. Dr. Öznur Taştan)
Computer Engineering Department
Predicting the survival of a cancer patient is critical for choosing patient-specific treatment strategies and is traditionally based on clinical or pathological factors such as patient age and tumor stage. In this thesis, we present two methodologies for building effective and interpretable survival models that utilize the high-dimensional molecular profiles made available through next-generation sequencing technologies.
First, we present a method that focuses on partial ordering in the feature space. Existing models rely on the individual molecular quantities recorded in tumors. However, cancer is a complex disease in which molecular mechanisms are dysregulated in various ways. Taking a system-level perspective, this study incorporates the partial ordering of molecules (POF) in lieu of individual quantities. This strategy not only unveils predictive features with direct relevance to the biological mechanism but also yields better survival prediction performance than multivariate L1-penalized Cox proportional hazards and Random Survival Forest models. When we test the partial order representation of features in the subgroup identification task, we find that these features yield patient groups whose survival distributions are quantifiably more distinct.
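As a rough illustration of the idea of replacing individual quantities with their relative order, the sketch below turns one patient's profile into binary pairwise-order features. The function name `partial_order_features`, the molecule names, and the values are hypothetical, and the exact feature construction used in the thesis is not stated here, so this is only an assumed minimal form.

```python
from itertools import combinations


def partial_order_features(profile):
    """Map a molecular profile to binary pairwise-order features.

    `profile` is a dict of molecule name -> measured quantity (illustrative).
    For every pair (a, b), taken in sorted name order, the feature "a>b"
    is 1 if a's quantity exceeds b's, and 0 otherwise.
    """
    names = sorted(profile)
    return {
        f"{a}>{b}": int(profile[a] > profile[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical profile with three molecules.
example = {"geneA": 5.2, "geneB": 1.3, "geneC": 7.8}
features = partial_order_features(example)
```

Features of this kind depend only on which molecule is more abundant within a sample, not on the absolute measurements, which is one plausible reason an order-based representation might be attractive.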
Second, we develop Ranking Survival Vector Machines (RsurVM), a survival prediction method based on ranking and support vector machines. RsurVM obtains a pairwise ranking of patient survival times by learning to rank with a support vector machine approach. It focuses on optimizing the most commonly used metric, the concordance index, and can handle censored data without making any assumptions. Our extensive tests on ovarian adenocarcinoma patient molecular data demonstrate that RsurVM achieves better survival predictions than the two most commonly used methods, the Cox proportional hazards model and Random Survival Forest, regardless of the input molecular data type (mRNA, protein, miRNA, copy number variation, and DNA methylation).
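The concordance index mentioned above can be sketched as follows for right-censored data. This is a generic, textbook-style computation and not the RsurVM implementation; the function name `concordance_index` and the toy data are assumptions made for illustration. A pair of patients is treated as comparable only when the patient with the shorter time has an observed event, which is how censoring enters the metric.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that are ranked correctly.

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if the patient was censored
    risk_scores : predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the second patient is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
c_good = concordance_index(times, events, [0.9, 0.3, 0.5, 0.1])
c_bad = concordance_index(times, events, [0.1, 0.3, 0.5, 0.9])
```

A ranking-based learner would typically be trained on these same comparable pairs, so that the training objective lines up with the evaluation metric.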
DATE: 24 March, 2016, Thursday @ 13:30