Department of Computer Engineering
CS 590 SEMINAR
Identifying Cancer Patient Subgroups via Partially Supervised Subspace Clustering
Each cancer type is a heterogeneous disease consisting of subtypes at the molecular, histopathological, and clinical levels. Identifying patient subgroups is critically important, because the unique molecular characteristics of a particular subgroup reveal distinct disease states and open up possibilities for targeted therapeutic regimens. Traditionally, cluster analysis is applied to the genomic data of the tumor samples, and the resulting patient clusters are considered of interest if they can be associated with a clinical outcome variable such as patient survival. In place of this unsupervised framework, we propose a semi-supervised clustering framework in which the partitions are guided by the clinical outcome of interest. In this approach, a random forest is trained to classify the patients based on the clinical variable. The partitions of the patients across the ensemble of trees are used to construct a patient similarity matrix, which is then given as input to a clustering algorithm. The method is related to subspace clustering approaches, as it searches for clusters in subsets of the original features. We demonstrate the method on a benchmark handwritten digit dataset. Applying the method to breast cancer mRNA expression data to find patient subgroups that are associated with patient survival gives promising results. Further work will investigate the utility of other high-dimensional genomic information, including microRNA expression and somatic mutations.
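The pipeline described in the abstract (train a random forest on the outcome, derive a similarity matrix from how samples are partitioned by the trees, then cluster that matrix) can be illustrated with a minimal sketch. This is not the speaker's implementation; it rests on several assumptions the abstract does not state: scikit-learn is used, similarity is taken as the fraction of trees in which two samples fall in the same leaf, spectral clustering stands in for the unspecified clustering algorithm, and the first 300 samples of scikit-learn's bundled digits data stand in for the benchmark. The function names and parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier


def forest_similarity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n_samples, n_trees = leaves.shape
    sim = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        # Boolean co-membership matrix for tree t, accumulated as 0/1.
        sim += col[:, None] == col[None, :]
    return sim / n_trees


def outcome_guided_clusters(X, y, n_clusters, n_trees=100, seed=0):
    """Cluster samples using a similarity guided by the outcome labels y."""
    # Assumption: each split considers a random feature subset ("sqrt"),
    # and leaves keep at least 5 samples so that they are not all pure.
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features="sqrt",
        min_samples_leaf=5,
        random_state=seed,
    )
    forest.fit(X, y)
    sim = forest_similarity(forest, X)
    # Assumption: spectral clustering on the precomputed similarity matrix.
    clusterer = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    labels = clusterer.fit_predict(sim)
    return sim, labels


# Stand-in data: a subset of the handwritten digits bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]
sim, labels = outcome_guided_clusters(X, y, n_clusters=10)
```

In a genomic setting, X would hold the expression profiles and y a discretized clinical outcome; how survival would be turned into class labels is not specified in the abstract and is left open here.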
DATE: 13 April, 2015, Monday @ 15:40