Bilkent University
Department of Computer Engineering


Mining Noisy Web Data for Concept Learning


Eren Gölge
MSc Student
Computer Engineering Department
Bilkent University

We address the problem of learning concepts automatically from noisy Web image search results. The idea is to discover common characteristics shared among subsets of images, using a method that organises the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept Map (CMAP). Given an image collection returned for a concept query, CMAP provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The proposed method outperforms state-of-the-art methods on the task of learning from noisy Web data for low-level attributes as well as high-level object categories. It is also competitive with supervised methods in learning scene concepts. Moreover, results on naming faces support the generalisation capability of the CMAP framework to different domains. CMAP can operate at large scale with no supervision by exploiting the available sources.


DATE: 07 April, 2014, Monday @ 16:10