VIDEO ENCODING AND SPLICING FOR TUNE-IN TIME REDUCTION IN IP
DATACASTING (IPDC) OVER DVB-H
1 Mehdi Rezaei, 2 Miska M. Hannuksela, 3 Moncef Gabbouj
1,3 Tampere University of Technology, 2 Nokia Research Center
ABSTRACT
A novel video encoding and splicing method is proposed which
minimizes the tune-in time of “channel zapping”, i.e. changing
from one audiovisual service to another, in IPDC over Digital
Video Broadcasting for Handheld terminals (DVB-H). DVB-H
uses a time-sliced transmission scheme to reduce the power
consumed by radio reception. Tune-in time in DVB-H
refers to the time between the start of the reception of a broadcast
signal and the start of the media rendering. One of the significant
factors in tune-in time is the time from the start of media decoding
to the start of correct output from decoding, which is minimized
when a time-slice is started with a random access point picture
such as an instantaneous decoding refresh (IDR) picture in
H.264/AVC. In IPDC over DVB-H, encapsulation to time-slices is
performed independently from encoding in a network element
called the IP encapsulator. At the time of encoding, time-slice
boundaries are not known exactly, and it is therefore impossible to
govern the location of IDR pictures relative to time-slices. It is
proposed that an additional stream consisting of IDR pictures only
is transmitted to the IP encapsulator, which replaces pictures in a
normal bitstream with IDR pictures according to time-slice
boundaries in order to achieve the minimum tune-in time. It has to
be ensured that the “spliced” stream resulting from the operation of
the IP encapsulator complies with the Hypothetical Reference
Decoder (HRD) specification of H.264/AVC. A video encoding
and rate control system is proposed to satisfy the HRD
requirements for the spliced stream. Simulation results show that,
in addition to HRD compliance, good average quality of
decoded video is achieved with minimum tune-in time.
1. INTRODUCTION
DVB-H (Digital Video Broadcasting for Handheld
terminals) is an ETSI standard specification for bringing
broadcast services to battery-powered handheld receivers
[1]. DVB-H is largely based on the successful DVB-T
specification for digital terrestrial television, adding to it a
number of features designed to take into account the limited
battery life of small handheld devices, and the particular
environments in which such receivers must operate.
In a conventional IPDC system over DVB-H, a content
encoder receives a source signal and encodes it
into a coded media bit stream. The coded media bit
stream is transferred to a server. The server is typically a
normal IP multicast server using real-time media transport
over RTP. The server encapsulates the coded media bit
stream into RTP packets. The server is connected to an IP
Multi-Protocol Encapsulator. The IP encapsulator
packetizes IP packets into Multi-Protocol Encapsulation
(MPE) Sections which are further encapsulated into MPEG-
2 Transport Stream (TS) packets. The IP encapsulator
optionally uses MPE Forward Error Correction (MPE-FEC)
based on Reed-Solomon (RS) codes. An IPDC system over
DVB-H further includes a radio transmitter, which is not
essential to the operation of the proposed encoding and
splicing system and is not discussed further.
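
As a rough illustration of the encapsulation chain described above, the following sketch estimates how many TS packets a single IP datagram occupies once it is wrapped in an MPE section. The byte counts used (a 12-byte MPE section header, a 4-byte CRC-32, a 4-byte TS header and a 1-byte pointer field) and the assumption that every section starts in a fresh TS packet are illustrative assumptions, not values taken from this paper; MPE-FEC is ignored, and all function names are hypothetical.

```python
import math

# Illustrative constants (assumptions, not values from this paper).
TS_PACKET_SIZE = 188    # bytes per MPEG-2 TS packet
TS_HEADER_SIZE = 4      # bytes, TS header without adaptation field
MPE_HEADER_SIZE = 12    # bytes, assumed MPE section header size
MPE_CRC_SIZE = 4        # bytes, assumed CRC-32 trailer
POINTER_FIELD_SIZE = 1  # byte in the TS packet where a section starts


def mpe_section_size(ip_datagram_bytes: int) -> int:
    """Size of one MPE section carrying a single IP datagram."""
    return MPE_HEADER_SIZE + ip_datagram_bytes + MPE_CRC_SIZE


def ts_packets_for_datagram(ip_datagram_bytes: int) -> int:
    """TS packets needed, assuming each section starts a new TS packet."""
    payload_per_packet = TS_PACKET_SIZE - TS_HEADER_SIZE
    total = mpe_section_size(ip_datagram_bytes) + POINTER_FIELD_SIZE
    return math.ceil(total / payload_per_packet)


def encapsulation_overhead(ip_datagram_bytes: int) -> float:
    """Ratio of transmitted TS bytes to IP datagram bytes."""
    sent = ts_packets_for_datagram(ip_datagram_bytes) * TS_PACKET_SIZE
    return sent / ip_datagram_bytes


if __name__ == "__main__":
    for size in (167, 168, 1000):
        print(size, ts_packets_for_datagram(size),
              round(encapsulation_overhead(size), 3))
```

Under these assumptions, small datagrams waste a large share of their last TS packet, which is one reason a real encapsulator may pack sections back to back instead.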
To reduce the power consumption in handheld terminals,
the service data is time-sliced and sent into the
channel as bursts at a significantly higher bit rate than
the bit rate of the audio-visual service. Time-slicing
enables a receiver to stay active only a fraction of the time,
while receiving bursts of a requested service. Finally, the
system includes one or more recipients, typically capable of
receiving, demodulating, decapsulating, decoding, and
rendering the transmitted signal, resulting in an
uncompressed media stream.
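
The power-saving argument above can be made concrete with a simple duty-cycle estimate. The sketch below assumes an idealized model in which a burst of fixed size is sent at the burst bit rate and must last until the next burst at the service bit rate; the optional wake-up time and all numeric values are hypothetical and serve only as an example.

```python
def burst_duration_s(burst_size_bits: float, burst_bitrate_bps: float) -> float:
    """Time the receiver needs to receive one burst."""
    return burst_size_bits / burst_bitrate_bps


def cycle_time_s(burst_size_bits: float, service_bitrate_bps: float) -> float:
    """Time one burst lasts when consumed at the service bit rate."""
    return burst_size_bits / service_bitrate_bps


def receiver_on_fraction(burst_size_bits: float,
                         burst_bitrate_bps: float,
                         service_bitrate_bps: float,
                         wakeup_s: float = 0.0) -> float:
    """Fraction of time the radio is on (idealized, capped at 1.0)."""
    on_time = burst_duration_s(burst_size_bits, burst_bitrate_bps) + wakeup_s
    return min(1.0, on_time / cycle_time_s(burst_size_bits, service_bitrate_bps))


if __name__ == "__main__":
    # Hypothetical numbers: 2 Mbit burst, 10 Mbit/s burst rate,
    # 500 kbit/s service rate, 0.2 s receiver wake-up time.
    fraction = receiver_on_fraction(2e6, 10e6, 500e3, wakeup_s=0.2)
    print(f"receiver on {fraction:.0%} of the time")
```

In this model the on-fraction approaches the ratio of service bit rate to burst bit rate as the wake-up time becomes negligible, which matches the qualitative statement that the receiver stays active only a fraction of the time.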
Tune-in time or delay in DVB-H refers to the time
between the start of the reception of a broadcast signal and
the start of the media rendering. The tune-in delay for
newly-joined recipients consists of several parts including:
delay until the start of the desired time-slice, reception
duration of a complete time-slice or MPE-FEC frame, delay
to compensate for the size variation of MPE-FEC frames, delay
to compensate for the synchronization between the associated
streams (e.g. audio and video) of the streaming session and
delay until a media decoder is refreshed by a random access
point to produce correct output samples. One of the critical
factors in tune-in delay is the time until a media decoder is
refreshed to produce correct output frames, which can be
minimized if an MPE-FEC frame starts with a random
access point such as an IDR picture in H.264/AVC. It
should be remarked that if the decoder started decoding
from an IDR picture that is not at the beginning of a time-
slice immediately when the time-slice is received, the input
buffer for decoding would drain before the arrival of the
next time-slice and there would be a gap in video playback.
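
To illustrate the decoder refresh component of the tune-in delay, the following minimal sketch models a received time-slice as a sequence of picture types in decoding order and reports the playback time covered by the pictures that precede the first IDR picture. This is a simplified model under stated assumptions (constant frame rate, one picture type label per picture, all other delay components ignored); the function name and the picture labels are hypothetical.

```python
from typing import Optional, Sequence


def decoder_refresh_delay_s(picture_types: Sequence[str],
                            frame_rate_hz: float) -> Optional[float]:
    """Duration of the pictures preceding the first IDR picture.

    Returns None when the time-slice contains no IDR picture, in which
    case correct output cannot start within this time-slice.
    """
    for index, picture_type in enumerate(picture_types):
        if picture_type == "IDR":
            return index / frame_rate_hz
    return None


if __name__ == "__main__":
    aligned = ["IDR", "P", "P", "P", "P"]
    unaligned = ["P", "P", "P", "IDR", "P"]
    print(decoder_refresh_delay_s(aligned, 15.0))    # slice starts with IDR
    print(decoder_refresh_delay_s(unaligned, 15.0))  # IDR is the 4th picture
```

In this model the delay is zero exactly when the time-slice starts with an IDR picture, which is the case the text identifies as minimizing this component.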
In IPDC over DVB-H, the content encoding and the
encapsulation to MPE-FEC frames are implemented
independently and it is hard to govern the exact location of
601 1424403677/06/$20.00 ©2006 IEEE ICME 2006