International Journal of Computer Vision and Image Processing, 4(2), 68-79, April-June 2014

Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Key Frame Extraction from Video: Framework and Advances

Sergii Mashtalir, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
Olena Mikhnova, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine

ABSTRACT

This paper provides a complete overview of key frame extraction techniques. Such techniques usually have three phases: shot boundary detection as a pre-processing phase; the main phase of key frame detection, where visual, structural, audio and textual features are extracted from each frame and then processed and analyzed with artificial intelligence methods; and a post-processing phase that removes duplicates if they occur in the resulting sequence of key frames. Estimation techniques and available test video collections are also reviewed. Finally, conclusions are drawn concerning the drawbacks of the examined procedure and the basic tendencies of its development.

Keywords: Duplicate Removal, Feature Extraction, Key Frame, Shot Boundary, Video Summarization

INTRODUCTION

The importance of key frame extraction has increased greatly since intelligent access to multimedia began to gain popularity. Key frame extraction can serve as a basis for enhancing indexing and searching capabilities. It is also a main tool for video summarization, which allows users to quickly become acquainted with multi-hour video content. For summarization purposes, any video can be decomposed into a sequence of images, an audio track, and a textual part. Each unit is essential for processing, but we shall focus solely on the sequence of images.
In contrast to video skimming, where the initial material is shortened into a dynamic representative clip, summarization involves the extraction of static meaningful frames, which are selected by chosen features and analyzed by intelligent methods. Two types of key frames can be selected: least common content (Yang & Wei, 2011) and best representatives (Fayka et al., 2010). The type of key frames usually depends on the type of content to be analyzed. If a video has a variety of scenes and a great variance in feature data, then it is better to extract best representatives; otherwise, if the video content is very similar, the results are much better when least common frames are chosen. It is also thought that longer shots are more important than shorter ones, as they hold users’ attention longer. Frames that appear earlier in the timeline are also considered more important than similar frames that appear later (Yang & Wei, 2011).

DOI: 10.4018/ijcvip.2014040105
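The choice between the two key frame types described in the introduction can be illustrated with a minimal Python sketch. It is not the method of any cited work. It assumes that each frame is already described by a numeric feature vector, that a "best representative" is the frame closest to the feature centroid, and that a "least common" frame is the one farthest from it. The function names and the `variance_threshold` value are hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import fmean, pvariance


def centroid(features):
    """Component-wise mean of equally sized feature vectors."""
    return [fmean(column) for column in zip(*features)]


def best_representative(features):
    """Index of the frame closest to the centroid (assumed definition)."""
    center = centroid(features)
    # min() returns the first index on ties, so earlier frames win.
    return min(range(len(features)), key=lambda i: dist(features[i], center))


def least_common(features):
    """Index of the frame farthest from the centroid (assumed definition)."""
    center = centroid(features)
    return max(range(len(features)), key=lambda i: dist(features[i], center))


def mean_variance(features):
    """Average per-dimension population variance of the feature data."""
    return fmean(pvariance(column) for column in zip(*features))


def select_key_frame(features, variance_threshold=0.05):
    """Pick the key frame type from feature variance (hypothetical threshold)."""
    if mean_variance(features) > variance_threshold:
        return "best_representative", best_representative(features)
    return "least_common", least_common(features)


# Toy 2-D features: a varied shot and a nearly uniform shot.
diverse = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.4], [0.9, 0.1]]
uniform = [[0.50, 0.50], [0.51, 0.49], [0.50, 0.51], [0.60, 0.40]]

print(select_key_frame(diverse))
print(select_key_frame(uniform))
```

For the varied shot the sketch returns the frame nearest the centroid, while for the nearly uniform shot it returns the outlying frame, which mirrors the rule of thumb stated above. Any real system would need a feature extractor and a threshold tuned to its data.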