Bilkent University
Department of Computer Engineering


Naming Faces On the Web


Hilal Zitouni
MSc. Student
Computer Engineering Department
Bilkent University

In this study, we introduce a method for naming less frequently appearing people on the web by first naming the frequently appearing ones. Current image search engines are widely used to query for a person; however, retrieval is based on textual context, so the results are often unsatisfactory. Face recognition, on the other hand, is a long-standing problem, but it has been tested only on datasets of limited size, and successful results have been obtained only for face images captured in controlled environments. Faces on the web, in contrast, are vast in number and vary in pose, illumination, occlusion, and facial attributes. Recent research in the area suggests combining visual and textual content rather than relying on either alone. With this approach, the face recognition problem is simplified to a face-name association problem.

Following these approaches, our method combines textual and visual information to name faces. We divide the problem into two subproblems: first the frequently appearing faces in web images are named, then the less frequently appearing ones. A supervised algorithm is used to name a specified number of categories belonging to frequently appearing faces. Faces that are not matched with any category are considered less frequently appearing faces and are labeled using the textual context. We extract all names from the textual contexts and eliminate those already used to label the frequently appearing faces. The remaining names are the candidate categories for the less frequently appearing faces. Each detected less frequently appearing face is then matched to the names extracted from its corresponding textual context. Finally, to prune irrelevant face images, the most similar faces within each collection are found and matched with their corresponding category.
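
The two-stage procedure described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the cosine-similarity prototype matcher stands in for the unspecified supervised algorithm, and the names name_faces, frequent_models, accept and keep, as well as the threshold values, are assumptions made purely for illustration.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def name_faces(faces, frequent_models, accept=0.8, keep=0.5):
    """Two-stage naming sketch (hypothetical interface).

    faces: list of (face_id, feature_vector, names_found_in_caption)
    frequent_models: dict mapping a frequent person's name to a prototype
        vector; an assumed stand-in for the supervised classifier.
    accept: assumed similarity threshold for accepting a frequent name.
    keep: assumed mean-similarity threshold used when pruning.
    """
    frequent_labels = {}
    candidates = defaultdict(list)

    for face_id, vec, caption_names in faces:
        # Stage 1: try to match the face to a frequent category.
        best_name, best_sim = None, -1.0
        for name, proto in frequent_models.items():
            sim = cosine(vec, proto)
            if sim > best_sim:
                best_name, best_sim = name, sim
        if best_name is not None and best_sim >= accept:
            frequent_labels[face_id] = best_name
            continue

        # Stage 2: unmatched face; candidate names are the caption names
        # that were not already used for frequent people.
        for name in caption_names:
            if name not in frequent_models:
                candidates[name].append((face_id, vec))

    # Pruning: within each candidate name, keep the faces that are, on
    # average, similar to the other faces collected under that name.
    rare_labels = {}
    for name, group in candidates.items():
        if len(group) == 1:
            rare_labels[name] = [group[0][0]]
            continue
        kept = []
        for face_id, vec in group:
            sims = [cosine(vec, other) for oid, other in group if oid != face_id]
            if sum(sims) / len(sims) >= keep:
                kept.append(face_id)
        rare_labels[name] = kept

    return frequent_labels, rare_labels
```

In this simplification a face whose caption contains several unused names is added to the candidate set of each of them, and only the pruning step decides where it remains. The study itself may resolve such ambiguity differently.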

In our experiments, the method is applied to two different datasets. Both datasets are constructed from images captured in realistic environments, varying in pose, illumination, facial expression, occlusion, and so on. The experimental results show that combining textual and visual content on realistic face images outperforms methods that use either one alone. Moreover, treating face recognition as a face-name association problem improves the results for face images collected from uncontrolled environments.


DATE: 30 July, 2010, Friday @ 10:30