BilVideo-7: Visual Query Interface

Users formulate their queries in the graphical user interface of the BilVideo-7 client. These queries are converted into the XML-based BilVideo-7Query format and sent to the BilVideo-7 Query Processing Server over TCP/IP. The query results are displayed to the user as a ranked list of video segment intervals, from which segments can be selected and viewed.
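As a rough illustration of the first step, the sketch below serializes a simple keyword query to XML with Python's standard library. The actual BilVideo-7Query schema is not given on this page, so every element and attribute name here (`BilVideo7Query`, `VideoSegment`, `TextQuery`, `returnType`, `maxResults`, `annotation`) is a hypothetical placeholder, as is the `build_query_xml` helper itself.

```python
import xml.etree.ElementTree as ET


def build_query_xml(keyword, segment_type="Shot", return_type="Shot", max_results=20):
    """Serialize a toy keyword query to XML.

    All element and attribute names are hypothetical; the real
    BilVideo-7Query schema may look quite different.
    """
    root = ET.Element(
        "BilVideo7Query",
        {"returnType": return_type, "maxResults": str(max_results)},
    )
    # One video segment, described by a single keyword annotation.
    segment = ET.SubElement(root, "VideoSegment", {"type": segment_type})
    text_query = ET.SubElement(segment, "TextQuery", {"annotation": "Keyword"})
    text_query.text = keyword
    return ET.tostring(root, encoding="unicode")


query_xml = build_query_xml("golf green", segment_type="Keyframe")
print(query_xml)
```

A client along these lines would then write the resulting string to a TCP connection to the query processing server; the wire protocol is not described here, so that part is left out of the sketch.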

The Visual Query Interface of BilVideo-7 clients provides an easy-to-use query formulation interface consisting of several tabs, one for each type of query, each with a comprehensive set of descriptors and query options. As shown in the screenshots below, the query formulation tabs are on the left, the query result list is displayed at the top right, the query results can be viewed in the media player at the bottom right, and messages are displayed at the bottom left.

From the toolbar at the top, the user sets the general query options: the media type, the return type, and the maximum number of results to be returned. The return type of a query can be one of the following: Video, Supershot, Shot, Subshot. If Video is selected, whole videos matching the query are returned; if Shot is selected, the query result list consists of Shots. Subshots are video segments contained in Shots, such as Keysegments and Moving Regions, and Supershots are sequences of consecutive Shots satisfying the query.
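The Supershot return type can be pictured as grouping matching Shots into runs of consecutive Shots. The following is a minimal sketch under the assumption that Shots are identified by consecutive integer indices within a video; the function name `group_supershots` is hypothetical, and whether BilVideo-7 treats an isolated matching Shot as a Supershot is not stated on this page.

```python
def group_supershots(matching_shots):
    """Group matching shot indices into runs of consecutive shots.

    Assumes shots are numbered with consecutive integers within a video.
    An isolated shot becomes a run of length one (an assumption).
    """
    runs = []
    for idx in sorted(set(matching_shots)):
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)  # extends the current run
        else:
            runs.append([idx])    # starts a new run
    return runs


print(group_supershots([3, 4, 5, 9, 12, 13]))  # [[3, 4, 5], [9], [12, 13]]
```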

Queries in BilVideo-7 are video segment based: a query is specified by putting together a number of video segments and describing them with a set of descriptors or semantic concepts. The segments are of the types that exist in the MPEG-7 representation of a video: Shots, Keysegments/Keyframes, Still Regions, and Moving Regions. The retrieved segments can be of one of the following types: subshot segment, shot, video segment, or video.

In the following, we present screenshots from the Visual Query Interface of BilVideo-7 for different types of queries. We are continuously improving the user interface to add more functionality and user control, and this page is updated as the user interface changes. Please let us know (bastan[at]cs*bilkent*edu*tr) if you have any comments that would help improve the user interface.

1. Video Table of Contents (VideoToC) lets the user browse the video collection in the database. The contents of each video are shown in a hierarchical tree view that reflects the structure of the video's MPEG-7 representation in XML, and the high-level semantic concepts are listed both for the whole collection and for each video separately, as shown below (Figures 1.1 and 1.2). The user can browse through each video in the collection and see all the Shots, Keyframes, Still Regions and Moving Regions, as well as the semantic concepts they are annotated with and their temporal location (Media Time) in the video.

2. Textual Query Interface enables the user to formulate high-level semantic queries quickly by entering keywords and specifying what type of video segment (Shot, Keyframe, Still Region, Moving Region) and what kind of annotation (Free Text, Keyword, Structured) to search in.

3. Color, Texture, Shape Query Interface is used for querying video segments by MPEG-7 color, texture and shape descriptors. The input media can be a video segment, a whole image or an image region, and the descriptors need to be extracted from the selected input media. Instead of uploading the input media to the server and extracting the descriptors there, we have chosen to extract the descriptors on the client, form the XML-based query expression containing the descriptors and send the query to the server. For this reason, the MPEG-7 feature extraction module is integrated with BilVideo-7 clients. The user also specifies which type of video segments to search in, along with other query options such as weights and thresholds for each type of descriptor.
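To make the role of per-descriptor weights and thresholds concrete, here is one plausible way they could be combined into a single similarity score. This page does not describe the fusion method BilVideo-7 actually uses, so the scheme below (reject a candidate if any descriptor distance exceeds its threshold, otherwise take a weighted average of similarities) is purely an illustrative assumption, and `fuse_descriptor_scores` is a hypothetical name.

```python
def fuse_descriptor_scores(distances, weights, thresholds):
    """Combine per-descriptor distances into one similarity in [0, 1].

    distances:  dict mapping descriptor name -> normalized distance in [0, 1]
    weights:    dict mapping descriptor name -> non-negative weight (default 1.0)
    thresholds: dict mapping descriptor name -> maximum allowed distance (default 1.0)

    Returns None when any descriptor exceeds its threshold.
    This fusion rule is an assumption, not the documented BilVideo-7 method.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, dist in distances.items():
        if dist > thresholds.get(name, 1.0):
            return None  # candidate rejected by this descriptor
        w = weights.get(name, 1.0)
        weighted_sum += w * (1.0 - dist)  # turn distance into similarity
        total_weight += w
    return weighted_sum / total_weight if total_weight > 0 else None


score = fuse_descriptor_scores(
    distances={"color": 0.2, "texture": 0.4},
    weights={"color": 3.0, "texture": 1.0},
    thresholds={"color": 0.5, "texture": 0.5},
)
print(score)
```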

4. Motion Query Interface is for the formulation of Motion Activity (MAc) and Motion Trajectory (MTr) queries of Moving Regions. Trajectory points are entered with the mouse. The user can optionally specify keywords for the Moving Region for which the trajectory query will be performed.

5. Spatial Query Interface enables the user to formulate spatial queries for Still and Moving Regions either by using keywords and a set of predefined spatial relations (left, right, above, below, east, west, etc.) or by sketching the minimum bounding rectangles (MBRs) of objects with the mouse and, if desired, labeling them. Since spatial queries are valid for Still and Moving Regions, the region type (Still/Moving Region) should also be selected along with the other query options. It is possible to query objects based on location, spatial relations or both.
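A sketched pair of MBRs implies directional relations between the two objects. The snippet below shows one simple way to derive such relations, assuming image coordinates with the origin at the top left and relations that hold only when the rectangles do not overlap along the relevant axis. The `MBR` type, the `spatial_relations` function and these exact definitions are assumptions; BilVideo-7 may define its relations differently.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Minimum bounding rectangle in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def spatial_relations(a, b):
    """Return the set of directional relations of `a` with respect to `b`.

    A relation holds only if the rectangles are fully separated along
    the corresponding axis (an assumed, deliberately strict definition).
    """
    relations = set()
    if a.right <= b.left:
        relations.add("left")
    if a.left >= b.right:
        relations.add("right")
    if a.bottom <= b.top:
        relations.add("above")
    if a.top >= b.bottom:
        relations.add("below")
    return relations


ball = MBR(left=0, top=0, right=10, bottom=10)
goal = MBR(left=20, top=30, right=40, bottom=50)
print(sorted(spatial_relations(ball, goal)))  # ['above', 'left']
```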

6. Temporal Query Interface is very similar to the Spatial Query Interface; here, the user specifies temporal relations between video segments (Shots, Keyframes, Still Regions, Moving Regions) either by selecting from a predefined list (before, after, during, etc.) or by sketching the temporal positions of the segments with the mouse.
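Sketched temporal positions can be reduced to named relations by comparing interval endpoints. The following minimal sketch covers only the three relations named above plus equality, using strict interval-algebra style definitions; the function name `temporal_relation`, the `(start, end)` representation and the catch-all "other" label are assumptions made for illustration.

```python
def temporal_relation(a, b):
    """Classify the temporal relation of interval `a` with respect to `b`.

    Each interval is a (start, end) pair with start < end.
    Only a subset of possible relations is distinguished; everything
    else (touching or partially overlapping intervals) is reported as "other".
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_start > b_end:
        return "after"
    if b_start < a_start and a_end < b_end:
        return "during"
    if (a_start, a_end) == (b_start, b_end):
        return "equal"
    return "other"


print(temporal_relation((0, 5), (10, 20)))    # before
print(temporal_relation((12, 15), (10, 20)))  # during
```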

7. Composite Query Interface is for composing a query using any combination of textual, color, texture, shape, motion, spatial and temporal queries with any number and type of video segments. This is the most powerful query interface and it enables the user to formulate very complex queries easily. The query is composed by putting together Shots, Keyframes, Still Regions and Moving Regions and specifying their properties as text-based semantic annotations, visual descriptors, location, spatial and temporal relations. Using this interface, the user can describe a video segment or a scene and ask the system to retrieve similar video segments.

8. XQuery Interface is better suited to experienced users who can write XQueries to search the database. This is the most flexible interface, and the user can specify a wide range of queries with it.
