THEORETICAL ADVANCES

Aneesh Chauhan · Sameer Singh · Dave Grosvenor

Episode detection in videos captured using a head-mounted camera

Received: 1 October 2003 / Accepted: 15 April 2004 / Published online: 19 June 2004
© Springer-Verlag London Limited 2004

Abstract  With the advent of wearable computing, personal imaging, photojournalism and personal video diaries, the need for automated archiving of the videos these devices capture has become pressing. The principal device used to capture the human-environment interaction is a wearable camera (usually a head-mounted camera). The videos obtained from such a camera are raw, unedited records of the visual interaction of the wearer (the user of the camera) with the surroundings. The focus of our research is to develop post-processing techniques that can automatically abstract videos based on episode detection. An episode is defined as a part of the video that was captured while the user was interested in an external event and paid attention to record it. Our research is based on the assumption that head movements have distinguishable patterns during an episode, and that these patterns can be exploited to differentiate between an episode and a non-episode. Here we present a novel algorithm that exploits head and body behaviour to detect episodes. The algorithm's performance is measured by comparing the ground truth (user-declared episodes) with the detected episodes. The experiments show the high degree of success we achieved with our proposed method on several hours of head-mounted video captured in varying locations.

Keywords  Dominant motion · Episode detection · Head-mounted video · Video abstraction

1 Introduction

Video abstracts are helpful in a number of contexts [1], including the development of multimedia archives, movie marketing, and home entertainment. Traditionally, video archives are indexed and searched by text, which leads to a loss of information.
An audio-visual abstract is semantically much richer than text. It is defined as a sequence of moving images, extracted from a longer video, much shorter than the original, and preserving the essential message of the original. Manual abstraction techniques are very time consuming, and the search is on for efficient semi-automatic abstraction methods (a computer produces a draft summary that is then edited by a human expert) and fully automatic abstraction methods (the summary is produced solely by the computer). In all cases, abstracts can be a collection of only key frames (still images) [2–8], a variable number of frames extracted depending on the content, as in video skimming [9], or true video content [1, 10–12]. In this paper we are interested in the latter and in fully automatic abstraction methods.

It is important that video abstraction provides good-quality abstracts. Lienhart [13] recommends the following basis for judging the quality of abstracts: (a) balanced coverage of the material, (b) optimally shortened shots, (c) a more detailed coverage of selected clusters rather than a totally balanced but too-short coverage, and (d) a proper choice of editing pattern. It is quite obvious that the quality of an abstract can only be judged with a specific target audience in view [10]. For example, the aim of documentary viewers is to receive information, whereas the aim of feature-film viewers is entertainment. For this reason it is important that the abstracts differ. A documentary abstract should give an overview of the contents of the entire video, whereas a feature-film abstract should be entertaining and not reveal the end of the story. Also, actors are important in movies but less so in documentaries. However, audio content is extremely important for

A. Chauhan · S. Singh (✉)
Autonomous Technologies Research, Department of Computer Science, University of Exeter, Exeter, EX4 4QF, UK
E-mail: s.singh@exeter.ac.uk

D. Grosvenor
Digital Media Department, Hewlett Packard Research Labs, Frenchay, Bristol, UK

Pattern Anal Applic (2004) 7: 176–189
DOI 10.1007/s10044-004-0215-4
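As a rough illustration of the fully automatic, key-frame style of abstraction surveyed in the introduction, the following minimal sketch selects key frames whenever the mean absolute difference between consecutive frames exceeds a threshold. The frame representation (flat lists of grey-level values), the function names and the threshold value are illustrative assumptions for this sketch only, not part of the episode-detection method proposed in this paper.

```python
# Minimal key-frame selection by frame differencing.
# Frames are flat lists of grey-level pixel values; a real system
# would decode video and might use histogram or motion features.

def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_key_frames(frames, threshold=20.0):
    """Return indices of frames that differ strongly from the last key frame."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys

# Toy example: three "shots" of constant brightness 10, 100, 10.
video = [[10] * 16] * 3 + [[100] * 16] * 3 + [[10] * 16] * 3
print(select_key_frames(video))  # -> [0, 3, 6]
```

Comparing each frame against the most recent key frame (rather than its immediate predecessor) makes the selection robust to slow drifts that never cross the threshold between adjacent frames.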