International Journal of Computer Vision and Image Processing, 4(2), 68-79, April-June 2014
Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
ABSTRACT
This paper provides a comprehensive overview of key frame extraction techniques. Such techniques
usually comprise three phases: shot boundary detection as a pre-processing phase; the main phase
of key frame detection, in which visual, structural, audio and textual features are extracted from
each frame and then processed and analyzed with artificial intelligence methods; and a
post-processing phase that removes duplicates if they occur in the resulting sequence of key
frames. Estimation techniques and available test video collections are also reviewed. Finally,
conclusions are drawn concerning the drawbacks of the examined procedure and the main trends in
its development.
Key Frame Extraction from Video: Framework and Advances
Sergii Mashtalir, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
Olena Mikhnova, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
Keywords: Duplicate Removal, Feature Extraction, Key Frame, Shot Boundary, Video Summarization
INTRODUCTION
The importance of key frame extraction has increased immensely
as intelligent access to multimedia has gained popularity.
Key frame extraction can serve as a basis for enhancing
indexing and searching capabilities. It is also a main tool
for video summarization, which allows users to get acquainted
quickly with multi-hour video content. For summarization
purposes, any video can be decomposed into a sequence of
images, an audio track, and a textual part. Each component is
essential for processing, but we shall focus solely on the
sequence of images. In contrast to video skimming, where the
initial material is shortened into a dynamic representative
clip, summarization assumes the extraction of static
meaningful frames, which are selected by chosen features and
analyzed by intelligent methods.
Two types of key frames can be selected: least common content
(Yang & Wei, 2011) and best representatives (Fayka et al.,
2010). The type of key frames usually depends on the type of
content to be analyzed. If a video has a variety of scenes and
a great variance of feature data, it is better to extract best
representatives; otherwise, if the video content is very
similar, the results are much better when least common frames
are chosen. Longer shots are also thought to be more important
than shorter ones, as they hold users’ attention for longer.
Frames that appear earlier in the timeline are likewise
considered more important than similar frames that appear
later (Yang & Wei, 2011).
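
The choice between the two key frame types can be illustrated with a minimal sketch. It is not the procedure of any of the cited works. It assumes that every frame is already described by a numeric feature vector (for instance, a normalized color histogram), that a "best representative" may be approximated by the frame closest to the mean feature vector, and that a "least common" frame may be approximated by the frame farthest from it. The function names and the value of `spread_threshold` are hypothetical and serve only for illustration.

```python
from statistics import fmean


def centroid(features):
    """Component-wise mean of equally sized feature vectors."""
    return [fmean(column) for column in zip(*features)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_key_frame(features, spread_threshold=0.05):
    """Return (strategy, frame_index) for a non-empty group of frames.

    Assumption: the mean squared distance to the centroid stands in for
    the "variance of feature data"; the threshold is a hypothetical value.
    """
    center = centroid(features)
    distances = [squared_distance(f, center) for f in features]
    indices = range(len(features))
    if fmean(distances) > spread_threshold:
        # Varied content: keep the frame closest to the centroid.
        return "best_representative", min(indices, key=distances.__getitem__)
    # Very similar content: keep the frame that deviates the most.
    return "least_common", max(indices, key=distances.__getitem__)


# Toy two-dimensional feature vectors, invented for illustration only.
varied = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
similar = [[0.50, 0.50], [0.51, 0.49], [0.50, 0.50], [0.60, 0.40]]
print(select_key_frame(varied))
print(select_key_frame(similar))
```

In practice the threshold would have to be tuned to the chosen features, and the distance to the mean is only one of many possible measures of how typical a frame is.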
DOI: 10.4018/ijcvip.2014040105