REGULAR PAPER

Video summarization using a network of radial basis functions

Naveed Ejaz • Sung Wook Baik

Received: 27 July 2011 / Accepted: 6 March 2012 / Published online: 3 May 2012
© Springer-Verlag 2012

Abstract The exponential increase of video data on the internet demands efficient management schemes for storage, retrieval, and indexing. One method for managing this huge volume of video data is video summarization, which generates succinct versions of videos for efficient retrieval and browsing on the internet. Key frame extraction is a type of video summarization in which the video contents are represented by salient frames of the video. Most applications of key frames are user-centered: the key frames are used to assist human users in video browsing. However, most existing techniques for extracting key frames do not incorporate users' feedback. In this paper, we propose a user-centered scheme for extracting key frames. In our scheme, the system parameters are learned at training time in the light of users' feedback. To model human perception of similarity in k-means clustering, a non-linear model based on a network of radial basis functions is employed to reduce the semantic gap between index features and human perception. Experimental results show that the proposed scheme performs very well compared with some of the other techniques.

Keywords Key frame extraction · Video summarization · Static video summary · Radial basis functions · Video summary evaluation · Video retrieval

1 Introduction

In the last few years, there has been an exponential increase in the amount of video data on the internet. This drastic increase has been driven by many factors including increased processing power, faster networks, and cheaper storage devices [1].
This situation demands efficient techniques for video data management, which involves acquisition, archiving, indexing, and retrieval of video data. One way to meet these needs is to generate video summaries (or video abstracts). Video summarization provides succinct versions of videos by preserving only their significant content [2]. A video usually contains a lot of redundancy, which can be removed to generate a summary. The primary application of video summaries is to provide browsing capabilities for a video database, so that users can glance at the summarized content before watching the entire video. Video summaries can also provide navigation facilities by enabling users to quickly access a relevant spot in a video, and they can act as a pre-processing step for many video content analysis techniques such as object detection and segmentation.

Video summaries can be produced in many different forms. However, the two basic forms are key frame extraction and video skimming [1]. Key frames (also called representative frames or static storyboards) are a collection of salient frames extracted from a video. In this paper, the terms key frames and storyboards are used interchangeably.

Communicated by T. Haenselmann.

N. Ejaz · S. W. Baik (✉)
College of Electronics and Information Engineering, Sejong University, Seoul, Korea
e-mail: sbaik@sejong.ac.kr

N. Ejaz
e-mail: naveed@sju.ac.kr

Multimedia Systems (2012) 18:483–497
DOI 10.1007/s00530-012-0263-3

Video skims