Bilkent University
Department of Computer Engineering


An Object Recognition Framework Using Contextual Interactions Among Objects


Fırat Kalaycılar
MSc Student
Computer Engineering Department
Bilkent University

Object recognition is one of the fundamental tasks in computer vision. The main endeavor in object recognition research is to devise techniques that let computers understand what they see as precisely as human beings do. State-of-the-art recognition methods use low-level image features (color, texture, etc.), interest points/regions, and filter responses to find and identify objects in a scene. Although these methods work well for specific object classes, their results are not satisfactory enough for them to be accepted as universal solutions. Thus, the current trend is to make use of the context embedded in the scene. Context defines the rules for object-object and object-scene interactions. A scene configuration generated by object recognizers can sometimes be inconsistent with the scene context. For example, observing a car in a kitchen is unlikely given the kitchen context. In this case, knowledge of the kitchen context can be used to correct the inconsistent recognition. Motivated by the benefits of contextual information, we introduce an object recognition framework that utilizes contextual interactions between individually detected objects to improve the overall recognition performance. Our main contributions are twofold. The first contribution is a probabilistic contextual interaction model for objects based on their spatial relationships. To represent the spatial relationships between objects, we propose three features that encode the relative position/location, scale, and orientation of a given object pair. Using these features and our object interaction likelihood model, we encode the semantic, spatial, and pose context of a scene concurrently. Our second main contribution is a contextual agreement maximization framework that assigns final labels to the detected objects by maximizing a scene probability function that is defined jointly using both the individual object labels and their pairwise contextual interactions. The most consistent scene configuration is obtained by solving the maximization problem using linear optimization. Experiments on the LabelMe and Bilkent data sets show that incorporating the contextual interactions improves the overall recognition performance.
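
To make the idea of contextual agreement maximization concrete, the toy sketch below scores every candidate labeling of two detected objects by combining per-object detector confidences with pairwise compatibilities, then keeps the highest-scoring labeling. It is only an illustration under stated assumptions: the function names, the confidence values, and the compatibility table are hypothetical, the pairwise terms are fixed constants rather than likelihoods computed from spatial features, and exhaustive search stands in for the linear optimization used in the thesis.

```python
import itertools
import math


def scene_log_score(labels, unary, pairwise):
    """Log of a toy scene score: the product of per-object confidences
    and pairwise label compatibilities (all values are hypothetical)."""
    score = sum(math.log(unary[i][lab]) for i, lab in enumerate(labels))
    for i, j in itertools.combinations(range(len(labels)), 2):
        score += math.log(pairwise[(labels[i], labels[j])])
    return score


def best_labeling(unary, pairwise):
    """Exhaustively search all label assignments (a stand-in for the
    linear optimization step; only feasible for very small scenes)."""
    candidates = [sorted(u) for u in unary]
    return max(
        itertools.product(*candidates),
        key=lambda labs: scene_log_score(labs, unary, pairwise),
    )


# Hypothetical detector confidences for two detected objects.
unary = [
    {"car": 0.55, "oven": 0.45},  # detector slightly prefers "car"
    {"sink": 0.90, "road": 0.10},
]

# Hypothetical compatibilities for (label of object 0, label of object 1).
pairwise = {
    ("car", "sink"): 0.05,
    ("car", "road"): 0.90,
    ("oven", "sink"): 0.90,
    ("oven", "road"): 0.05,
}

best = best_labeling(unary, pairwise)
print(best)
```

Taken alone, the first detector prefers "car", but the joint score favors "oven" next to "sink", which mirrors the car-in-a-kitchen correction described in the abstract.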


DATE: 29 July, 2009, Wednesday @ 10:00