Enabling on-the-fly Video Shot Detection on YouTube

Thomas Steiner
Google Germany GmbH
ABC-Str. 19
20354 Hamburg, Germany
tomac@google.com

Ruben Verborgh
Ghent University – IBBT, ELIS Multimedia Lab
9050 Ghent, Belgium
ruben.verborgh@ugent.be

Joaquim Gabarró Vallés
Universitat Politècnica de Catalunya
08034 Barcelona, Spain
gabarro@lsi.upc.edu

Michael Hausenblas
DERI, NUI Galway
IDA Business Park, Lower Dangan
Galway, Ireland
michael.hausenblas@deri.org

Raphaël Troncy
EURECOM
2229 route des crêtes, BP 193
Sophia Antipolis, France
raphael.troncy@eurecom.fr

Rik Van de Walle
Ghent University – IBBT, ELIS Multimedia Lab
9050 Ghent, Belgium
rik.vandewalle@ugent.be

ABSTRACT
Video shot detection is the processor-intensive task of splitting a video into continuous shots, with hard or soft cuts as the boundaries. In this paper, we present a client-side, on-the-fly approach to this challenge based on modern HTML5-enabled Web APIs. We show how video shot detection can be seamlessly embedded into video platforms like YouTube using browser extensions. Once a video has been split into shots, shot-based video navigation becomes possible and more fine-grained playback statistics can be collected.

Categories and Subject Descriptors
I.2.10 [Vision and Scene Understanding]: Video analysis; H.5.1 [Multimedia Information Systems]: Video (e.g., tape, disk, DVI)

General Terms
Algorithms

Keywords
Shot detection, shot boundary detection, video processing

1. INTRODUCTION
Official press statistics [12] from YouTube, one of the biggest online video platforms, state that more than 13 million hours of video were uploaded during 2010, and that 48 hours of video are uploaded every single minute. Given this volume of content, advanced search techniques are clearly needed to retrieve the few needles from the giant haystack. Closed captions enable keyword-based in-video search, a feature announced in 2008 [4].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright is held by the author/owner(s). WWW2012 Developer Track, April 18–20, 2012, Lyon, France.

Searching YouTube for a phrase like “that’s a tremendous gift”, a caption from Randy Pausch’s famous last lecture Achieving Your Childhood Dreams¹, returns the video of his lecture. If no closed captions are available and none can be generated automatically, keyword-based search still works over tags, video descriptions, and titles. When presented with a potentially long list of results, users can rely on preview thumbnails based on video still frames to decide on the most promising result. YouTube uses an unpublished computer vision-based algorithm to generate smart thumbnails and lets video owners choose one of three automatically suggested thumbnails. In this paper, we introduce on-the-fly shot detection for YouTube videos as a third means, besides keyword-based search and thumbnail previews, of choosing a video from the haystack. As a user starts watching a video, we detect its shots by visually analyzing its content. We do this with a browser extension, i.e., the whole process runs dynamically on the client side, using modern HTML5 JavaScript APIs of the <video> and <canvas> elements [8]. As soon as the shots have been detected, we let the user jump directly into a specific shot by clicking on its representative still frame. Figure 1 shows how the browser extension seamlessly integrates the detected shots into the YouTube website.
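This section does not spell out the detection algorithm itself. As a minimal sketch, assuming a classic color-histogram-difference test for hard cuts (a common technique from the shot boundary literature, not necessarily the one the extension uses), the following TypeScript shows how frames sampled from a <video> element via a <canvas> could be split into shots, each with a representative still frame to click on. The names SampledFrame, Shot, histogram, histogramDistance and detectShots, the 64-bin histogram, and the 0.5 threshold are all hypothetical.

```typescript
// Hypothetical sketch: hard-cut detection by comparing color histograms of
// frames sampled from a <video> element and drawn onto a <canvas>.
// Each frame's pixels are the RGBA bytes from getImageData(...).data.

/** Bins per color channel (assumption: 4 bins, i.e. a 64-bin RGB histogram). */
const BINS_PER_CHANNEL = 4;

/** A sampled frame: its playback time in seconds and its RGBA pixel data. */
interface SampledFrame {
  time: number;
  pixels: Uint8ClampedArray;
}

/** A detected shot: first/last sampled time and a representative still's time. */
interface Shot {
  start: number;
  end: number;
  thumbnailTime: number;
}

/** Builds a normalized, quantized RGB histogram; the alpha channel is ignored. */
function histogram(pixels: Uint8ClampedArray): Float64Array {
  const bins = new Float64Array(BINS_PER_CHANNEL ** 3);
  const step = 256 / BINS_PER_CHANNEL;
  const pixelCount = pixels.length / 4;
  for (let i = 0; i < pixels.length; i += 4) {
    const r = Math.floor(pixels[i] / step);
    const g = Math.floor(pixels[i + 1] / step);
    const b = Math.floor(pixels[i + 2] / step);
    bins[(r * BINS_PER_CHANNEL + g) * BINS_PER_CHANNEL + b] += 1 / pixelCount;
  }
  return bins;
}

/** Half the L1 distance between two normalized histograms, a value in [0, 1]. */
function histogramDistance(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.abs(a[i] - b[i]);
  }
  return sum / 2;
}

/**
 * Splits sampled frames into shots, declaring a hard cut wherever two
 * consecutive histograms differ by more than `threshold` (hypothetical 0.5).
 * The middle sampled frame of each shot is used as its representative still.
 */
function detectShots(frames: SampledFrame[], threshold = 0.5): Shot[] {
  const shots: Shot[] = [];
  if (frames.length === 0) return shots;
  let startIndex = 0;
  // Records the shot covering frames[startIndex] .. frames[endIndex - 1].
  const closeShot = (endIndex: number): void => {
    const middle = Math.floor((startIndex + endIndex - 1) / 2);
    shots.push({
      start: frames[startIndex].time,
      end: frames[endIndex - 1].time,
      thumbnailTime: frames[middle].time,
    });
  };
  let previous = histogram(frames[0].pixels);
  for (let i = 1; i < frames.length; i++) {
    const current = histogram(frames[i].pixels);
    if (histogramDistance(previous, current) > threshold) {
      closeShot(i);
      startIndex = i;
    }
    previous = current;
  }
  closeShot(frames.length);
  return shots;
}

// In a browser extension (not executed here), frames might be collected with
//   ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
//   frames.push({ time: video.currentTime,
//                 pixels: ctx.getImageData(0, 0, canvas.width, canvas.height).data });
// and a click on a shot's still frame could seek with video.currentTime = shot.start.
```

Comparing histograms rather than individual pixels makes the test tolerant of small camera or object motion. Gradual transitions such as fades or dissolves, which remain an open issue in the literature, would typically slip past a single frame-to-frame threshold like this one.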
The main contributions of this paper are the browser extension itself and improved video navigability through shot-based navigation. A screencast² and a demo³ of our approach are available.

2. RELATED WORK
Video fragments consist of shots, which are sequences of consecutive frames from a single viewpoint, representing a continuous action in time and space. Shot boundary detection has already been described extensively in the literature. While some specific issues remain (notably gradual transitions and false positives due to large movements or illumination changes), the problem is considered solved for many cases [6, 13]. What sets our approach apart is that it is entirely Web-based and on-the-fly, which introduces interesting new challenges that traditional approaches do not have to cope with.

¹ Last Lecture: http://bit.ly/pausch-last-lecture
² Screencast: http://bit.ly/filmstrip
³ Demo: http://bit.ly/filmstrip-debug

Highest shot detection