Seminar in Computer Engineering

Bilkent University
Department of Computer Engineering
Ph.D. Dissertation

Multiple View Human Activity Recognition

Selen Pehlivan
PhD Student
Computer Engineering Department
Bilkent University

This thesis explores the human activity recognition problem when multiple views are available. We follow two main directions: we first present a system based on volume matching that uses calibrated camera systems, and then a flexible system based on frame matching. We compare multiple-view systems with single-view systems and measure, through various experiments, how performance improves as the number of views increases.
The first part of the thesis introduces compact representations for volumetric data obtained through volume reconstruction. Here, we assume that the system has multiple cameras whose camera matrices are available. The video frames recorded by many cameras with significant overlap are fused by reconstruction, and the resulting volumes are treated as substitutes for action poses. We propose new pose descriptors over three-dimensional data that are fast and discriminative in the sense of a key pose. We first present a descriptor built as a histogram of oriented cylinders of various sizes and orientations. We then propose a second, view-independent descriptor that does not require pose alignment. We show the importance of discriminative pose representations within simpler action classification schemes. The activity recognition framework based on volume matching gives promising results compared to the state of the art.
Volume reconstruction is a natural approach to multi-camera data fusion, but there may be only a few cameras with overlapping views. In the second part of the thesis, we introduce an architecture that adapts to varying numbers of cameras and features and has desirable engineering properties. The system collects and fuses activity judgments from the cameras using a voting scheme. The architecture requires no camera calibration. Performance generally improves when there are more cameras and more features; training and test cameras do not need to overlap; and a camera dropping in or out is handled easily with little penalty. Experiments confirm these small performance penalties and the advantages of using multiple views rather than a single view.
Keywords: Video analysis; Human activity recognition; Multiple views; Multiple cameras; Pose representation.
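
As an illustration of the voting-based fusion mentioned in the second part of the abstract, the following is a minimal sketch only. The abstract does not specify the voting rule, so plain majority voting is assumed here; the function name fuse_votes, the camera identifiers, and the alphabetical tie-break are hypothetical choices made for this example and are not taken from the thesis.

```python
from collections import Counter
from typing import Dict, Optional


def fuse_votes(camera_labels: Dict[str, Optional[str]]) -> Optional[str]:
    """Fuse per-camera activity judgments by simple majority vote.

    ASSUMPTION: plain majority voting; the thesis may use a different rule.
    A camera that has dropped out reports None and is simply ignored,
    so the number of active cameras can change from frame to frame.
    """
    votes = Counter(
        label for label in camera_labels.values() if label is not None
    )
    if not votes:
        # No camera produced a judgment for this frame.
        return None
    # Highest count wins; ties are broken alphabetically (hypothetical choice)
    # so that the result is deterministic.
    best_label, _ = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return best_label


if __name__ == "__main__":
    # Three cameras, one of which has dropped out.
    print(fuse_votes({"cam0": "walk", "cam1": None, "cam2": "walk"}))
```

Because each camera contributes only a label, a fusion step of this kind needs no camera calibration and is indifferent to which cameras are present, which is consistent with the properties the abstract claims for the architecture.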

DATE: 19 July, 2012, Thursday @ 13:00
PLACE: EA409