BilVideo-7: Sample Queries

In this section, we present some example queries performed on a video data set consisting of 14 video sequences with 25,000 frames from the TRECVID 2004 and 2008 data sets, including news, documentary, educational and archival program videos. We obtained the MPEG-7 representations of the videos with our MPEG-7 feature extraction and annotation tool. Each query result is a ranked list of shots, and each shot is shown with a representative keyframe in the following figures.

1. Spatial Query Examples
Query 1 : Golfer above golf cart (Figure 1.1, top; specified by keywords)
Query 2 : Clinton left Blair (Figure 1.1, bottom; specified by sketch)
Two spatial query examples are shown in Figure 1.1. The first query, at the top, searches for the video segments in which a golfer is above a golf cart. The query is formulated as "golfer above golf cart" in the Spatial Query Interface. The system returns three relevant video segments that exactly match the query condition. The fourth result contains a golfer but no golf cart, so the spatial condition is not satisfied and its rank is lower than that of the first three. The second query is formulated by drawing two rectangles on the sketch-based Spatial Query Interface and labeling them as Clinton and Blair. The first two video segments satisfy the query condition exactly, while in the last two the spatial condition is not satisfied but Clinton and Blair appear together. With our bottom-up fusion algorithm, the similarity of a video segment decreases as the number of query conditions it satisfies decreases, so such segments rank lower in the query result. The ranking approach is therefore effective and produces query results that are close to our perception.
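The ranking behavior described above can be illustrated with a small Python sketch. It is not the BilVideo-7 fusion algorithm: the `Rect` type, the definitions of `left` and `above` on bounding rectangles, and the scoring rule (the fraction of query conditions a segment satisfies) are all assumptions chosen only to show why a segment that meets fewer conditions ranks lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box; y grows downward, as in image coordinates."""
    x: float
    y: float
    w: float
    h: float


def left(a, b):
    """True if a lies entirely to the left of b (assumed definition)."""
    return a.x + a.w <= b.x


def above(a, b):
    """True if a lies entirely above b (assumed definition)."""
    return a.y + a.h <= b.y


def present(label):
    """Condition: an object with this label appears in the segment."""
    return lambda objects: label in objects


def relation(pred, a, b):
    """Condition: both objects appear and the spatial predicate holds."""
    return lambda objects: (
        a in objects and b in objects and pred(objects[a], objects[b])
    )


def similarity(objects, conditions):
    """Fraction of query conditions the segment satisfies (0.0 to 1.0)."""
    return sum(1 for cond in conditions if cond(objects)) / len(conditions)


def rank(segments, conditions):
    """Return segment ids sorted by decreasing similarity."""
    return sorted(
        segments,
        key=lambda sid: similarity(segments[sid], conditions),
        reverse=True,
    )


# Query 2 from the text: "Clinton left Blair", split into three conditions.
query = [present("Clinton"), present("Blair"), relation(left, "Clinton", "Blair")]

# Hypothetical segments with made-up rectangles.
segments = {
    "only_clinton": {"Clinton": Rect(10, 10, 40, 80)},
    "together": {"Clinton": Rect(100, 10, 40, 80), "Blair": Rect(10, 10, 40, 80)},
    "exact": {"Clinton": Rect(10, 10, 40, 80), "Blair": Rect(100, 10, 40, 80)},
}

print(rank(segments, query))  # ['exact', 'together', 'only_clinton']
```

In this toy setting, the segment in which both people appear but the spatial condition fails still scores 2/3, which mirrors how the last two results of Query 2 are returned with a lower rank instead of being discarded.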

2. Low Level Query Examples
Query 1 : Low Level Query by image (Figure 2.1)
Query 2 : Low Level Query by image region (Figure 2.2)
Query 3 : Low Level Query by video (Figure 2.3)
Figures 2.1, 2.2 and 2.3 show three low-level query examples. The first one is an image-based query, in which the query image is represented with Color Structure and Dominant Color descriptors. The region in the region-based query is represented with Color Structure and Region Shape descriptors. The video in the video-based query is represented with the GoF/GoP descriptor. All three query results are satisfactory considering the types of descriptors used.
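As a rough illustration of how a query described by more than one descriptor might be matched, the following Python sketch combines per-descriptor distances into a single value and ranks candidates by it. Everything here is an assumption made for illustration: descriptors are treated as fixed-length numeric vectors, the L1 distance stands in for the descriptor-specific matching functions that MPEG-7 descriptors normally use, and the weighted average is a hypothetical combination rule, not the one used by our system.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def combined_distance(query, candidate, weights):
    """Weighted average of per-descriptor L1 distances (assumed rule)."""
    total_weight = sum(weights.values())
    weighted = sum(
        w * l1_distance(query[name], candidate[name])
        for name, w in weights.items()
    )
    return weighted / total_weight


def rank_by_distance(query, candidates, weights):
    """Return candidate ids sorted by increasing combined distance."""
    return sorted(
        candidates,
        key=lambda cid: combined_distance(query, candidates[cid], weights),
    )


# Toy vectors standing in for the descriptors of the image-based query.
weights = {"ColorStructure": 1.0, "DominantColor": 1.0}
query_image = {
    "ColorStructure": [0.5, 0.5, 0.0],
    "DominantColor": [1.0, 0.0, 0.0],
}
candidates = {
    "shot_b": {"ColorStructure": [0.0, 0.5, 0.5], "DominantColor": [0.0, 1.0, 0.0]},
    "shot_a": {"ColorStructure": [0.5, 0.5, 0.0], "DominantColor": [1.0, 0.0, 0.0]},
}

print(rank_by_distance(query_image, candidates, weights))  # ['shot_a', 'shot_b']
```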

3. Composite Query Examples
The query shown in Figure 3.1 is a composite query, in which both high-level semantics and low-level descriptors are used to describe the query inputs. Moreover, there are two different types of video segments in the query: a Keyframe and a Moving Region. In Figure 3.2, the query is composed of Still Regions and Moving Regions, which are represented with low-level descriptors or high-level semantic concepts. As the query results show, our system can handle such queries effectively. Neither the number and type of video segments in the query nor the descriptors used to describe them are limited. This makes composite queries very flexible and powerful, enabling the user to formulate very complex queries easily. To our knowledge, our system is unique in supporting such complex queries.
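One way to picture such a composite query is as a list of segments, each with a type and an optional mix of semantic concepts and low-level descriptors. The Python sketch below is only a hypothetical data model: the class names, field names and descriptor values are invented for illustration and do not reflect the internal query representation of BilVideo-7.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class SegmentType(Enum):
    """Video segment types named in the text."""
    KEYFRAME = "Keyframe"
    STILL_REGION = "Still Region"
    MOVING_REGION = "Moving Region"


@dataclass
class QuerySegment:
    """One query input: a segment type plus any concepts and descriptors."""
    segment_type: SegmentType
    concepts: List[str] = field(default_factory=list)
    descriptors: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class CompositeQuery:
    """A query made of any number of segments of any type."""
    segments: List[QuerySegment] = field(default_factory=list)

    def add(self, segment):
        self.segments.append(segment)
        return self  # allow chained calls


# A query with two segment types: one described by a low-level descriptor,
# the other by a high-level concept (values are made up).
composite = (
    CompositeQuery()
    .add(QuerySegment(SegmentType.KEYFRAME,
                      descriptors={"ColorStructure": [0.2, 0.8]}))
    .add(QuerySegment(SegmentType.MOVING_REGION, concepts=["golfer"]))
)

print([s.segment_type.value for s in composite.segments])
```

Because the segment list and the per-segment dictionaries are open-ended, this model places no limit on how many segments or descriptors a query contains, which is the flexibility the paragraph above describes.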