Bilkent University
Department of Computer Engineering


Fisher-Kernel-Based Models for Image Classification and Weakly Supervised Object Localization


Dr. Ramazan Gökberk Cinbis

One of the main topics in computer vision research is image understanding, which refers to a set of inter-related tasks. These tasks include, but are not limited to, object detection, scene recognition, and inferring the relationships among objects in an image. While we are still far from solving image understanding, significant progress has been made in the past decade. This progress has been driven primarily by two factors: (i) advances in image representations and (ii) the use of larger datasets to build data-driven models. In this talk, I will present two Fisher-kernel-based models that address limitations of contemporary image representations and use weak supervision to enable training on larger datasets.

In the first part of the talk, I will present our work on non-iid representations for image categorization. In traditional image representations such as bag-of-words (BoW) and the Fisher vector (FV), local descriptors are assumed to be independent and identically distributed (iid), which is a poor assumption from a modeling perspective. In our work, we introduce non-iid models by treating the model parameters as latent variables that are integrated out, which renders all local regions dependent. Using the Fisher kernel principle, we encode an image by the gradient of the data log-likelihood with respect to the model hyper-parameters. To keep computation tractable, we rely on variational free-energy bounds both to learn the hyper-parameters and to compute approximate Fisher kernels. Our models naturally give rise to discounting transformations, providing an explanation of why such transformations are successful in practice.
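For readers less familiar with the baseline, here is a minimal pure-Python sketch of the standard iid Fisher vector, restricted to the gradient with respect to the means of a diagonal-covariance Gaussian mixture, followed by the signed square-root ("power") normalization commonly applied to Fisher vectors, one example of a discounting transformation. This is not the non-iid model of the talk: the latent-parameter integration, variational bounds, and hyper-parameter gradients are not reproduced, and all function names are hypothetical.

```python
import math


def gaussian_log_pdf_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """Soft assignment gamma_k(x) of one descriptor to each mixture component."""
    logs = [math.log(w) + gaussian_log_pdf_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_means(descriptors, weights, means, variances):
    """Hypothetical helper: iid FV w.r.t. GMM means, with the usual
    closed-form approximation of the Fisher information as normalizer."""
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    grads = [[0.0] * dim for _ in range(k_count)]
    # iid assumption: every descriptor contributes an independent additive term.
    for x in descriptors:
        gamma = posteriors(x, weights, means, variances)
        for k in range(k_count):
            for d in range(dim):
                grads[k][d] += gamma[k] * (x[d] - means[k][d]) / math.sqrt(variances[k][d])
    fv = []
    for k in range(k_count):
        scale = 1.0 / (n * math.sqrt(weights[k]))
        fv.extend(scale * g for g in grads[k])
    return fv


def power_normalize(fv, alpha=0.5):
    """Signed power (discounting) transform, then L2 normalization."""
    z = [math.copysign(abs(v) ** alpha, v) for v in fv]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


# Toy two-component mixture in 2-D; all numbers are illustrative only.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
descriptors = [[0.5, -0.2], [2.8, 3.4], [3.1, 2.9]]
encoding = power_normalize(fisher_vector_means(descriptors, weights, means, variances))
print(len(encoding))  # 4 entries: K = 2 components times D = 2 dimensions
```

Because the iid sum lets a frequently repeated descriptor contribute proportionally more, the square-root step dampens such repeated contributions before the final L2 normalization.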

In the second part, I will briefly present our recent work on weakly supervised object localization. Standard detector training requires bounding-box annotations of object instances. Weakly supervised learning sidesteps this time-consuming annotation process by using only image-level labels that indicate the presence or absence of the object. We propose a novel multi-fold multiple instance learning procedure that prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations such as Fisher vectors and convolutional neural network features. In this context, we also introduce a window refinement method that improves localization accuracy by incorporating an objectness prior.
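The multi-fold procedure is only named in the abstract. The pure-Python sketch below assumes one common reading of it: positive images are split into K disjoint folds, and the selected window in each fold is re-chosen by a detector trained only on the windows currently selected in the other folds, so the detector never re-scores the very windows it was trained on. The scorer `train_linear_scorer` (difference of class means) is a hypothetical stand-in for the SVM one would normally train, candidate windows are toy feature vectors, and the objectness-based window refinement is not shown.

```python
def train_linear_scorer(pos_feats, neg_feats):
    """Hypothetical stand-in for an SVM: w = mean(positive) - mean(negative)."""
    dim = len(pos_feats[0])
    mean_pos = [sum(f[d] for f in pos_feats) / len(pos_feats) for d in range(dim)]
    mean_neg = [sum(f[d] for f in neg_feats) / len(neg_feats) for d in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def score(w, feat):
    """Linear detection score of one candidate window."""
    return sum(a * b for a, b in zip(w, feat))


def multifold_mil(pos_images, neg_windows, init_choice, num_folds=3, max_iters=10):
    """Re-localize positives fold by fold; needs 2 <= num_folds <= len(pos_images)."""
    choice = list(init_choice)
    folds = [list(range(f, len(pos_images), num_folds)) for f in range(num_folds)]
    for _ in range(max_iters):
        new_choice = list(choice)
        for held_out in folds:
            held = set(held_out)
            # Train only on the windows selected in the *other* folds.
            train_pos = [pos_images[i][choice[i]]
                         for i in range(len(pos_images)) if i not in held]
            w = train_linear_scorer(train_pos, neg_windows)
            # Re-select the top-scoring candidate window in each held-out image.
            for i in held_out:
                cands = pos_images[i]
                new_choice[i] = max(range(len(cands)), key=lambda j: score(w, cands[j]))
        if new_choice == choice:  # selections are stable: stop early
            break
        choice = new_choice
    return choice


# Toy data: each positive image has two candidate windows (illustrative only).
pos_images = [
    [[1.0, 0.1], [0.0, 1.0]],
    [[0.0, 1.0], [0.9, 0.0]],
    [[0.95, 0.05], [0.1, 0.9]],
    [[0.1, 1.0], [1.0, 0.2]],
    [[0.9, 0.1], [0.0, 0.95]],
    [[0.05, 0.9], [1.1, 0.0]],
]
neg_windows = [[0.0, 1.0], [0.1, 0.9]]
# Image 1 starts on its background window; the held-out detector corrects it.
print(multifold_mil(pos_images, neg_windows, init_choice=[0, 0, 0, 1, 0, 1]))
# [0, 1, 0, 1, 0, 1]
```

The key design choice is the held-out training set: a wrongly selected window in image 1 never contributes to the detector that re-scores image 1, which is how the procedure avoids simply re-confirming its own initial mistakes.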

Bio: Ramazan Gökberk Cinbis graduated from Bilkent University, Turkey, in 2008, and received an M.A. degree in computer science from Boston University, USA, in 2010. He was a doctoral student in the LEAR team at INRIA Grenoble, France, from 2010 until 2014, and received a PhD degree in computer science from Université de Grenoble, France, in 2014. He received the best thesis prize of the French Association for Pattern Recognition (AFRIF). His work has led to publications in top-tier international computer vision conferences and journals, including CVPR, ICCV, ECCV, and TPAMI. His research interests include computer vision and machine learning.


DATE: 30 November 2015, Monday @ 14:40