MPEG-7 Compatible Video Feature Extraction and Annotation Tool

The MPEG-7 representations of videos are obtained using the MPEG-7 compatible video feature extraction and annotation tool shown in Figure 1. Currently, the tool is operated manually to obtain the MPEG-7 representations according to this MPEG-7 profile. Videos, along with shot boundary information, are loaded and then processed on a shot-by-shot basis. Users manually select Keyframes, Still Regions and Moving Regions, and then annotate the Video, Shots, Keyframes, Still Regions and Moving Regions with free text, keyword and structured annotations.

The tool computes the MPEG-7 visual descriptors (color, texture, shape, motion, localization) for the selected video segments, using an MPEG-7 feature extraction library adapted from the MPEG-7 XM Reference Software. The user can select the set of visual descriptors used to describe each type of video segment (e.g., any subset of the Color Structure (CSD), Scalable Color (SCD), Dominant Color (DCD), Color Layout (CLD), Edge Histogram (EHD) and Homogeneous Texture (HTD) descriptors to describe the keyframes). The semantic content is described by text annotations (free text, keyword and structured annotation), which strike a good balance between simplicity (in terms of manual annotation effort and processing during querying) and expressiveness. The output is saved as an MPEG-7 compatible XML file to be stored in the XML database.

The tool is still being improved to handle audio, video and image data. The goal is a full-fledged MPEG-7 compatible multimedia feature extraction and annotation tool with as much automatic processing as possible, in order to reduce manual processing time, human subjectivity and errors.
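
The workflow described above (a hierarchy of video segments, text annotations on each segment, a user-chosen subset of descriptors per segment type, and XML output) can be illustrated with a minimal Python sketch. It is not the tool's implementation. The names Segment, attach_descriptors, to_xml and fake_extract are hypothetical, the XML element names are simplified and do not follow the MPEG-7 schema or the tool's profile, and the descriptor values are placeholders rather than output of the feature extraction library. Only the keyframe descriptor set comes from the text; the sketch leaves other segment types unconstrained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import xml.etree.ElementTree as ET

# Descriptor abbreviations the text lists for keyframes.
# Allowed sets for other segment types are not given, so none are assumed.
KEYFRAME_DESCRIPTORS = {"CSD", "SCD", "DCD", "CLD", "EHD", "HTD"}


@dataclass
class Segment:
    """One node of the video structure (hypothetical model)."""
    kind: str      # "Video", "Shot", "Keyframe", "StillRegion" or "MovingRegion"
    seg_id: str
    free_text: str = ""
    keywords: List[str] = field(default_factory=list)
    descriptors: Dict[str, List[float]] = field(default_factory=dict)
    children: List["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        """Attach a child segment and return it for chaining."""
        self.children.append(child)
        return child


def attach_descriptors(seg: Segment, names: List[str],
                       extract: Callable[[str], List[float]]) -> None:
    """Store the user-selected descriptors on a segment.

    `extract` stands in for the real feature extraction library.
    """
    for name in names:
        if seg.kind == "Keyframe" and name not in KEYFRAME_DESCRIPTORS:
            raise ValueError(f"{name} is not a keyframe descriptor")
        seg.descriptors[name] = extract(name)


def to_xml(seg: Segment) -> ET.Element:
    """Serialize a segment and its children to simplified (non-MPEG-7) XML."""
    elem = ET.Element(seg.kind, id=seg.seg_id)
    if seg.free_text:
        ET.SubElement(elem, "FreeText").text = seg.free_text
    for kw in seg.keywords:
        ET.SubElement(elem, "Keyword").text = kw
    for name, values in seg.descriptors.items():
        d = ET.SubElement(elem, "Descriptor", name=name)
        d.text = " ".join(str(v) for v in values)
    for child in seg.children:
        elem.append(to_xml(child))
    return elem


def fake_extract(name: str) -> List[float]:
    """Placeholder values; a real extractor would analyze pixel data."""
    return [0.0, 0.5, 1.0]


# Build Video -> Shot -> Keyframe, annotate, and serialize.
video = Segment("Video", "v1", free_text="Evening news broadcast")
shot = video.add(Segment("Shot", "s1", keywords=["anchor", "studio"]))
keyframe = shot.add(Segment("Keyframe", "k1"))
attach_descriptors(keyframe, ["CLD", "EHD"], fake_extract)

xml_text = ET.tostring(to_xml(video), encoding="unicode")
print(xml_text)
```

Keeping annotations and descriptors on the same node mirrors the segment-by-segment processing described above: each Shot, Keyframe or region carries its own text and its own selected descriptors, and the nesting of the XML output follows the nesting of the segments.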

Figure 1: MPEG-7 compatible video feature extraction and annotation tool according to this MPEG-7 profile.
In the graphical user interface, the current video frame is shown at the top left, the latest processed frame at the bottom left,
the latest selected region at the top right, and the selected Moving Regions along with their trajectories at the bottom right.
The selected video segments (Shots, Keyframes, Still Regions, Moving Regions) are shown on the right in a hierarchical tree view
that reflects the structure of the video.

