Hybrid layered video encoding and caching for resource constrained environments

Siddhartha Chattopadhyay a,*, Suchendra M. Bhandarkar b
a Google Inc., Mountain View, CA 94043, USA
b Department of Computer Science, The University of Georgia, Athens, GA 30602-7404, USA

Article info
Article history: Received 5 December 2007; Accepted 10 September 2008; Available online 23 September 2008
Keywords: Layered video encoding; Layered video caching; Generative video; Generative sketch-based video; Layered video; Power adaptive video; Content based video encoding

Abstract
Video playback on a mobile device is a resource-intensive task. Since the remaining battery life of a mobile device decreases over time, it is desirable to have a video representation that adapts dynamically to the available battery life during playback. A novel Hybrid Layered Video (HLV) encoding scheme is proposed, which comprises a content-aware, multi-layer encoding of texture and a generative sketch-based representation of the object outlines. Different combinations of the texture- and sketch-based representations are shown to result in distinct video states, each with a characteristic power consumption profile. Further, a smart content-aware caching scheme is proposed that is suitable for low-latency dissemination of HLV over the Internet. The proposed HLV representation, combined with the proposed caching scheme, is shown to be effective for video playback and dissemination on power-constrained mobile devices.
© 2008 Elsevier Inc. All rights reserved.

1. Introduction
Video playback on a mobile device such as a PDA, pocket PC, multimedia-enabled mobile phone (such as an iPhone), or a laptop PC operating in battery mode is a resource-intensive task in terms of CPU cycles and battery power [1]. Video playback typically results in rapid depletion of the device's battery, regardless of whether the video is streamed from a hard drive on the device or from a remote server.
Several techniques have been proposed to reduce power consumption during video playback on mobile devices [2,3,24–26]. These techniques use various hardware and software optimizations, and the power savings are typically realized by compromising the quality of the rendered video. This tradeoff is not always desirable, since the user may prefer to watch the video at its highest quality if sufficient battery power is available on the device. Thus, it is desirable to formulate and implement a multi-layer encoding of the video such that distinct layers of the video exhibit different power consumption characteristics. The lowest layer should consume the least power during video decoding and rendering, and the power consumption should increase as more layers are added to the video. Typically, the less battery power is available to decode and render the video, the lower the quality of the rendered video [4]. Thus, it is necessary to enhance the quality of the lower video layers in order to ensure that the rendered video is of acceptable quality when only those layers are used.

Traditional layered video encoding, as used by the MPEG-4 Fine Grained Scalability profile (MPEG-FGS), is customized for varying bitrates rather than for power-adaptive usage [37]. The various video layers are obtained by performing certain operations on low-level (i.e., pixel-level) data, for example, progressive truncation of the DCT coefficients or progressive smoothing of the pixel values [37]. Although lowering the video bitrate also lowers the power consumption [36,38], the semantic content of the video may not be adequately preserved. In this paper, we present the design and implementation of a novel Hybrid Layered Video (HLV) encoding scheme.
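The layering requirement stated above (the lowest layer draws the least power, and each added layer raises consumption) suggests a simple playback policy: choose the richest layer whose power draw the remaining battery can sustain for the rest of the video. The following is a minimal sketch under stated assumptions, not the authors' method: the `Layer` type, the `select_layer` function, the layer names, and the wattages are all hypothetical, and the battery is modeled linearly as remaining energy divided by remaining playback time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    power_watts: float  # hypothetical average draw when decoding/rendering up to this layer


# Hypothetical layers ordered from lowest to highest quality; the wattages are
# illustrative placeholders, not measurements from the paper.
LAYERS = (
    Layer("base", 0.8),
    Layer("intermediate", 1.2),
    Layer("full", 1.6),
)


def select_layer(layers: Sequence[Layer], battery_joules: float,
                 remaining_seconds: float) -> Layer:
    """Pick the highest layer whose power draw the battery can sustain.

    Assumes a simple linear battery model: the affordable average power is the
    remaining energy divided by the remaining playback time. If even the lowest
    layer is unaffordable, the lowest layer is returned anyway.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    if remaining_seconds <= 0:
        raise ValueError("remaining_seconds must be positive")
    # Enforce the ordering the greedy scan relies on: more layers, more power.
    if any(a.power_watts > b.power_watts for a, b in zip(layers, layers[1:])):
        raise ValueError("layers must have non-decreasing power draw")

    affordable_watts = battery_joules / remaining_seconds
    chosen = layers[0]
    for layer in layers:
        if layer.power_watts > affordable_watts:
            break  # every later layer costs at least as much
        chosen = layer
    return chosen


if __name__ == "__main__":
    # 4000 J over 3000 s affords about 1.33 W, so "intermediate" is chosen.
    print(select_layer(LAYERS, 4000.0, 3000.0).name)
```

Because power draw is assumed to be monotone in the number of layers, a single linear scan that stops at the first unaffordable layer is sufficient. A real player would presumably re-run such a selection periodically as the battery drains, using measured power figures for each layer or video state rather than the placeholders above.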
The proposed representation is termed “hybrid” because its constituent layers combine standard MPEG-based video encoding with a generative sketch-based video representation. The input video stream is divided into two components: a sketch component and a texture component. The sketch component is a Generative Sketch-based Video (GSV) representation, in which the outlines of the objects in the video are represented as curves [18]. The evolution of these curves (termed pixel-threads) across the video frames is explicitly modeled in order to reduce temporal redundancy. The texture component in the proposed HLV encoding scheme is represented by three layers: a base layer video, an intermediate mid-layer video, and the original video. The base layer is a very low bitrate video with very low visual quality, whereas the highest layer in the HLV representation is the original video. The base layer video can be augmented by the object outlines (emphasized with dark contours) using the Generative Sketch-based Video (GSV) rep-

1047-3203/$ - see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2008.09.003
* Corresponding author. E-mail addresses: siddhartha2k5@gmail.com (S. Chattopadhyay), suchi@cs.uga.edu (S.M. Bhandarkar).
J. Vis. Commun. Image R. 19 (2008) 573–588