Enhanced Shot-Based Video Adaptation using MPEG-21 generic Bitstream Syntax Schema

Sarah De Bruyne, Davy De Schrijver, Wesley De Neve, Davy Van Deursen, Rik Van de Walle
Department of Electronics and Information Systems - Multimedia Lab - Ghent University - IBBT
Gaston Crommenlaan 8 bus 201, B-9050 Ledeberg-Ghent, Belgium
Email: {sarah.debruyne, davy.deschrijver, wesley.deneve, davy.vandeursen, rik.vandewalle}@ugent.be

Abstract— Semantic video adaptation takes into account the relevance of the different fragments of the video content in order to create a tailored video stream based on the user’s preferences. As a shot can be considered the smallest semantic unit in a video sequence, metadata can be added to each shot using MPEG-7 descriptions. Based on these metadata and the user’s preferences, the original bitstream can be adapted in order to obtain the desired fragments. MPEG-21 DIA offers a tool, gBS Schema, for exposing the high-level structure of a binary resource as an XML description. In this paper, shot information is inserted in these descriptions to create a link between metadata and semantic video adaptation. Furthermore, this paper proposes to keep the structure of these descriptions format-agnostic. As a result, only one generic transformation style sheet has to be implemented to support shot-based video adaptation of sequences compliant with different video specifications. Special attention is paid to sequences coded with the H.264/AVC standard, as this specification contains several features that are important for shot-based video adaptation.

I. INTRODUCTION

As multimedia has proliferated over the past years, many new technologies have been developed to enable the delivery and consumption of multimedia content. Users have come to expect that this content can easily be accessed according to their own preferences.
Therefore, the delivered content must be tailored to the user’s characteristics and preferences, as well as to the capacities of the terminals and networks.

Video adaptation [1] is an emerging field of interest that includes techniques responding to the above challenges. Several adaptation strategies can be identified, operating either at a semantic level (e.g., removal of violent scenes or extraction of semantic highlights), at a structural level (e.g., key frame extraction), or at a signal-processing level (e.g., transcoding).

To adapt a video sequence, MPEG-21 Digital Item Adaptation (DIA) [2] offers a tool, generic Bitstream Syntax Schema (gBS Schema), to describe the high-level structure of the bitstream using the Extensible Markup Language (XML). The resulting XML document is called a generic Bitstream Syntax Description (gBSD), which makes it possible to describe the bitstream in a coding format-agnostic manner.

This paper concentrates on the link between metadata and format-agnostic semantic video adaptation by making use of gBS Schema. This way, metadata and semantic video adaptation can be coupled in an elegant manner. To this end, shot information is inserted in the gBSDs, indicating to which shot each frame belongs. The desired shots can be selected by using MPEG-7 descriptions containing metadata about the different shots. Once the desired shots are indicated, a generic transformation style sheet is used to obtain the adapted sequence by linking the desired shots to the shot information available in the gBSD. Special attention needs to be paid to the extraction of the desired fragments, as the adapted bitstream needs to remain compliant with the corresponding specification.

Related work includes a semantic adaptation framework for the generation of semantic metadata and the semantic adaptation of video on a frame basis using gBS Schema [3]. Furthermore, [4] and [5] focus on video adaptation using gBS Schema.
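The shot selection outlined in this introduction can be pictured with a minimal sketch. It is written in Python rather than as the transformation style sheet the paper uses, and it operates on a simplified, namespace-free stand-in for a gBSD. The marker values (shot-1, shot-2, ...), the byte offsets, and the helper names select_shots and byte_ranges are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a gBSD (assumption: a real gBSD uses the
# MPEG-21 DIA namespaces and carries more attributes than shown here).
GBSD = """\
<gBSD>
  <gBSDUnit syntacticalLabel="header" start="0" length="30"/>
  <gBSDUnit syntacticalLabel="frame" start="30" length="500" marker="shot-1"/>
  <gBSDUnit syntacticalLabel="frame" start="530" length="120" marker="shot-1"/>
  <gBSDUnit syntacticalLabel="frame" start="650" length="480" marker="shot-2"/>
  <gBSDUnit syntacticalLabel="frame" start="1130" length="510" marker="shot-3"/>
</gBSD>
"""


def select_shots(gbsd_xml, wanted_shots):
    """Drop every unit whose shot marker is not in wanted_shots.

    Units without a marker (the hypothetical header unit here) are kept.
    """
    root = ET.fromstring(gbsd_xml)
    for unit in list(root):  # iterate over a copy while removing children
        marker = unit.get("marker")
        if marker is not None and marker not in wanted_shots:
            root.remove(unit)
    return root


def byte_ranges(root):
    """List the (start, length) pairs that would be copied from the bitstream."""
    return [(int(u.get("start")), int(u.get("length")))
            for u in root.iter("gBSDUnit")]


adapted = select_shots(GBSD, {"shot-1", "shot-3"})
print(byte_ranges(adapted))
```

The sketch deliberately ignores the compliance issue raised above: a real adaptation must also make sure that the retained fragments still form a bitstream that conforms to the coding specification.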
In particular, an example of a gBSD is given that is used to classify fragments of a video using semantic information.

This paper is organized as follows. The following section introduces the main enabling technologies and concepts, while Sect. III discusses the shot-based adaptation process. Experimental results are given in Sect. IV.

II. ENABLING TECHNOLOGIES AND CONCEPTS

A. gBSD-driven Content Adaptation

MPEG-21 gBS Schema is a tool of Part 7 (Digital Item Adaptation, DIA) of the MPEG-21 specification used to facilitate content adaptation [4], [5]. To realize this, gBS Schema defines a framework that enables the description of the high-level structure of a bitstream in XML, resulting in a Bitstream Syntax Description (BSD). This description is not meant to describe the bitstream on a bit-per-bit basis, but rather addresses its high-level structure.

In Fig. 1, a global architecture for a BSD-based content adaptation framework is given. First, a BSD of the high-level structure of the bitstream is generated. This BSD is then adapted according to the user’s preferences by means of a transformation language. Finally, the adapted BSD becomes input to an adaptation module responsible for the generation of the corresponding bitstream.

gBS Schema uses only one generic schema to describe the structure of a generic BSD (gBSD), making the syntax of the gBSD generic and codec-independent. Therefore, the regeneration of the adapted bitstream can be achieved without the need for codec-specific schemas. Furthermore, this schema makes it possible to describe the bitstream in a hierarchical fashion and provides semantically meaningful marking of syntactical

380 Proceedings of the 2007 IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP 2007) 1-4244-0707-9/07/$25.00 ©2007 IEEE