978-1-4673-1239-4/12/$31.00 © 2012 IEEE

Network Aware Caching for Video on Demand Systems

Bogdan Carbunar, Computing and Information Sciences, Florida International University, Miami, FL
Rahul Potharaju, Computer Sciences Department, Purdue University, West Lafayette, IN
Michael Pearce, Motorola Solutions, Schaumburg, IL
Venu Vasudevan, Applied Research Center, Motorola Inc., Libertyville, IL

Abstract—Video on Demand (VoD) services allow users to select and locally consume remotely stored content. We investigate the use of caching to solve the scalability issues of several existing VoD providers. We propose metrics and goals that define the requirements of a caching framework for the CDNs of VoD systems. Using data logs collected from Motorola equipment in Comcast VoD deployments, we show that several classic caching solutions do not satisfy the proposed goals. We address this issue by developing a novel technique for predicting future values of several metrics of interest. We use these predictions to evaluate the penalty imposed on the system (network and caches) when individual items are not cached. We then use item penalties to propose novel caching and static placement strategies. Finally, we use the same data logs to validate our solutions and show that they satisfy all the defined goals.

I. INTRODUCTION

Most cable providers today support Video on Demand (VoD) solutions, enabling subscribers to access items from a central database, transfer them over a Content Distribution Network (CDN) and view them on their Set Top Boxes (STBs). In our work we focus on the CDNs of cable providers (e.g., Comcast, Charter and Time Warner) that are built on a CATV transport network. A typical CDN has a hierarchical architecture: it consists of a central Video Server Office (VSO) and multiple Video Home Office (VHO) sites, all connected through a high-bandwidth, low-latency fiber ring (see Figure 1).
The VSO hosts the content library and handles the supported content life cycle, while the VHO sites serve disjoint subscriber regions. Current VoD solutions require each VHO to store all the content supported by the system. This approach provides high content availability and simplifies the content management process¹. However, it presents significant hardware scalability issues: the size of the content library constantly grows as more, and larger, items need to be supported². Caching at the VHO level seems to provide a natural solution, enabling each VHO site to be managed independently and making hardware scaling dependent on local demand. However, due to VHO-level misses (which occur when requested content is not cached locally), this approach introduces a trade-off between the additional miss traffic imposed on the network links and the hardware scaling cost. Thus, the first contribution of this paper consists of identifying several metrics that are fundamental to a VoD CDN architecture, along with goals that need to be satisfied by efficient solutions. We then show that several classic caching strategies perform poorly on the identified metrics.

¹ Newly supported content is propagated from the VSO to all the VHOs using an efficient multicast protocol over the ring topology.
² Content encoding is moving from standard definition to high definition, and eventually to Blu-ray quality and even 3D content.

Fig. 1. System architecture. Thick lines denote the ring topology links connecting the VSO and the VHOs; links are bi-directional. The VSO has a B-1 streaming server and each VHO has a lower-capacity streaming server, B-3. User requests that cannot be satisfied from local VHO caches are forwarded to the VSO, which then sends the content.
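To make the miss-traffic/storage trade-off concrete, the following minimal Python sketch models a single VHO handling requests: hits are served from the local cache, misses are streamed from the VSO and their sizes accumulate as ring traffic. This is an illustration only, not the paper's algorithm: a plain LRU policy stands in for the penalty-based replacement developed later, and all names, capacities and item sizes are hypothetical.

```python
from collections import OrderedDict

class VHOCache:
    """Sketch of a VHO-level cache (hypothetical sizes in GB).

    A hit is served locally; a miss is streamed from the VSO,
    adding the item's size to the miss traffic on the ring links.
    """

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.used = 0.0
        self.items = OrderedDict()    # item_id -> size_gb, LRU order
        self.miss_traffic_gb = 0.0    # extra load imposed on the ring

    def request(self, item_id, size_gb):
        if item_id in self.items:            # local hit
            self.items.move_to_end(item_id)  # refresh LRU position
            return "hit"
        # miss: the VSO streams the item over the ring
        self.miss_traffic_gb += size_gb
        # cache the missed item, evicting LRU entries as needed
        while self.used + size_gb > self.capacity and self.items:
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
        if size_gb <= self.capacity:
            self.items[item_id] = size_gb
            self.used += size_gb
        return "miss"

# usage: a 2 GB cache; the second request for movieA is a local hit
vho = VHOCache(capacity_gb=2.0)
vho.request("movieA", 1.0)   # miss: streamed from the VSO, then cached
vho.request("movieA", 1.0)   # hit: served locally
vho.request("movieB", 1.5)   # miss: evicts movieA to make room
```

Growing `capacity_gb` reduces `miss_traffic_gb` and vice versa, which is exactly the hardware-scaling versus network-load trade-off the text describes.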
A second contribution of this work is the proposal of efficient caching and static placement algorithms that predict a penalty value for each item: the network and storage cost of not storing that item during a future interval. Our solutions rely on a novel technique for predicting future values of several metrics of interest, using patterns extracted from data logs collected from Motorola equipment at several VoD deployment sites. Our solutions take advantage of the existence of streaming servers at the VSO and at all the VHO sites (see Section II). This allows VHO sites to stream missed requests from peer sites without being forced to cache all missed items. We then use item penalties to drive not only the replacement algorithm – which items to evict from a cache – but also the decision of which items to reliably transfer and cache, and which to stream without caching.

The caching problem has been studied in a variety of contexts, e.g., Web caching, memory and distributed storage. The seminal work of Dahlin et al. [1] introduced the concept of collaborative caching along with several caching algorithms. Our work differs in that clients can access other caches but cannot decide their membership. Caching for streaming