Bilkent University
Department of Computer Engineering

Probabilistic Approaches to Image Classification and Content-based Retrieval

Selim Aksoy

Insightful Corporation
Seattle, WA


Content-based image retrieval (CBIR) has become a very popular research area in recent years as it provides new application areas and new challenges to image processing, machine learning, computer vision and pattern recognition. In this talk, I will give an overview of two probabilistic approaches to CBIR. In the system developed at the University of Washington, we pose the retrieval problem in a classification framework where the goal is to minimize the classification error in a setting of two classes: the relevance and irrelevance classes of the query. We propose effective solutions to different levels of the retrieval process within this framework. Feature extraction and normalization are performed by maximizing class separability. Similarity is measured using the likelihood of two images being similar or dissimilar. A key aspect of our framework is a two-level modeling of probability. The first level maps high-dimensional feature spaces to two-dimensional probability spaces using parametric density models for features. The second level uses combinations of classifiers trained in multiple probability spaces and corresponds to a modeling of "probability of probability" to compensate for errors in modeling probabilities in feature spaces. The Bayesian formulation also provides a unified framework for fusion of information from different features as well as support for relevance feedback.
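The two-level idea above can be sketched in a few lines. This is an illustrative toy, not the system described in the talk: it assumes Gaussian class-conditional densities on feature-space distances for the relevance and irrelevance classes (parameters below are made up, not learned), and combines per-feature-space posteriors with hand-picked weights.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratio(distance, rel_params, irrel_params):
    """First level: map a feature-space distance to
    P(d | relevant) / P(d | irrelevant) using parametric density models."""
    return (gaussian_pdf(distance, *rel_params) /
            gaussian_pdf(distance, *irrel_params))

def combined_score(distances, rel_models, irrel_models, weights):
    """Second level: a weighted combination of classifiers trained in
    separate probability spaces; here, a weighted sum of per-space
    posteriors under equal priors."""
    score = 0.0
    for d, rp, ip, w in zip(distances, rel_models, irrel_models, weights):
        lr = likelihood_ratio(d, rp, ip)
        score += w * (lr / (1.0 + lr))  # posterior P(relevant | d), equal priors
    return score

# Illustrative use with two feature spaces (say, color and texture);
# small distances favor relevance. All numbers are hypothetical.
rel = [(0.1, 0.2), (0.2, 0.3)]    # (mean, std) of distances for relevant pairs
irrel = [(1.0, 0.4), (1.2, 0.5)]  # (mean, std) for irrelevant pairs
print(combined_score([0.15, 0.25], rel, irrel, [0.5, 0.5]))
```

A query image scoring near 1.0 would be ranked as relevant; combining several feature spaces at the probability level, rather than concatenating raw features, is what lets the second level correct for modeling errors in any single space.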

In the second part of the talk, I will describe our work on interactive classification and retrieval of remote sensing images at Insightful Corporation. We developed a hierarchical representation that models image content at the pixel, region and scene levels to reduce the gap between low-level features and high-level user semantics. Pixel-level representations are learned through automatic fusion of spectral, textural and ancillary features using decision tree, rule-based and naive Bayes classifiers. Region-level representations consist of shape characteristics and statistics of groups of pixels. Scene-level representations include a visual grammar that models spatial interactions of region groups using attributed relational graph structures. The visual grammar can be used to summarize image content and to classify images into high-level categories using Bayesian classifiers that automatically learn groups of regions that can distinguish particular classes of scenes from others. Quantitative results and example queries will be presented for both systems.
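As a rough illustration of the pixel-level fusion step, the sketch below trains a naive Bayes classifier over discretized features. It is a minimal toy, not Insightful's implementation: the feature bins, class names and training data are all hypothetical.

```python
import math
from collections import defaultdict

class NaiveBayesPixelClassifier:
    """Toy naive Bayes over discrete pixel features
    (e.g., quantized spectral and texture bins)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # (class, feature_index, value) -> count
        self.feature_counts = defaultdict(int)

    def fit(self, pixels, labels):
        """pixels: list of discrete feature tuples; labels: class names."""
        for x, y in zip(pixels, labels):
            self.class_counts[y] += 1
            for i, v in enumerate(x):
                self.feature_counts[(y, i, v)] += 1

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, n in self.class_counts.items():
            # log P(c) + sum_i log P(x_i | c), with add-one smoothing
            # (assuming two possible values per feature in this toy)
            score = math.log(n / total)
            for i, v in enumerate(x):
                score += math.log((self.feature_counts[(c, i, v)] + 1) / (n + 2))
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical training set: (spectral_bin, texture_bin) -> land-cover class
train_x = [(0, 0), (0, 1), (1, 1), (1, 1)]
train_y = ["water", "water", "forest", "forest"]
clf = NaiveBayesPixelClassifier()
clf.fit(train_x, train_y)
print(clf.predict((1, 1)))  # → forest
```

In the hierarchy described above, labels produced at this level would then be grouped into regions, whose shapes, statistics and spatial relations feed the scene-level visual grammar.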

DATE: September 2, 2003, Tuesday @ 13:40