Bilkent University
Department of Computer Engineering


Active Learning Approach for Multi-Class Data Classification on Pap-Smear Cell Images


Ahsen İkbal Yergök
MSc. Student
Computer Engineering Department
Bilkent University

Classification is the well-known problem of assigning newly observed data to a class by using a previously learned model. To obtain such a model, a training set is prepared from data labeled with their known classes. In general, the accuracy of a classifier depends on the size of the training set, since the model improves as more data become available. However, large datasets may not be available in some domains, and labeling the data can be costly. For this reason, the active learning approach has been proposed, which aims to achieve greater accuracy with fewer training labels.

In this study, pap-smear cell images collected by the Herlev University Hospital are classified by using this approach, with SVMs as the classifier. The dataset consists of 7 classes of cervical cancer degree, assigned by experts. Taking these expert labels as the ground truth, the proposed method starts with a small number of labeled training samples and learns a model from them. It then requests a label for the sample about which the model is most uncertain, and the process iterates. The accuracy and the number of labeled samples are compared with those of plain SVM classification. As a result, the active learning approach reaches the same accuracy by using 56% of the labeled data.
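The iterative procedure described above (train on a few labels, query the most uncertain sample, repeat) can be illustrated with a minimal sketch. It is not the implementation used in this work: a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library, the data are synthetic 2-D points rather than Herlev images, and all names (centroids, margin, active_learn, oracle, budget) are hypothetical.

```python
import math
import random


def centroids(X, y):
    """Mean feature vector per class (an assumed stand-in for training an SVM)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: tuple(v / counts[c] for v in s) for c, s in sums.items()}


def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda c: math.dist(x, model[c]))


def margin(model, x):
    """Gap between the two nearest centroids; a small gap means high uncertainty."""
    d = sorted(math.dist(x, c) for c in model.values())
    return d[1] - d[0] if len(d) > 1 else 0.0


def active_learn(X, oracle, seed_idx, budget):
    """Pool-based uncertainty sampling: query one label per iteration."""
    labeled = {i: oracle(i) for i in seed_idx}           # initial labeled set
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(budget):
        if not pool:
            break
        model = centroids([X[i] for i in labeled], list(labeled.values()))
        query = min(pool, key=lambda i: margin(model, X[i]))  # most uncertain
        labeled[query] = oracle(query)                   # ask the expert
        pool.remove(query)
    model = centroids([X[i] for i in labeled], list(labeled.values()))
    return model, labeled


# Synthetic toy data: three well-separated clusters, 20 points each.
random.seed(0)
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(20):
        X.append((random.gauss(cx, 0.5), random.gauss(cy, 0.5)))
        y.append(label)

# Start with one labeled sample per class, then query five more labels.
model, labeled = active_learn(X, oracle=lambda i: y[i],
                              seed_idx=[0, 20, 40], budget=5)
accuracy = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
print(len(labeled), "labels used, accuracy", accuracy)
```

Here the oracle simply looks up the known label, playing the role of the expert. With an SVM, the margin function would typically be replaced by the distance of a sample to the decision boundary.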


DATE: 25 April, 2011, Monday @ 16:50