Video Splicing for Tune-in Time Reduction in IP Datacasting over DVB-H

Miska M. Hannuksela, Nokia Research Center, Finland, Miska.Hannuksela@nokia.com
Mehdi Rezaei, Tampere University of Technology, Finland, Mehdi.Rezaei@tut.fi
Moncef Gabbouj, Tampere University of Technology, Finland, Moncef.Gabbouj@tut.fi

ABSTRACT

A novel video splicing method is proposed which minimizes the tune-in time of “channel zapping”, i.e. changing from one audiovisual service to another, in IP datacasting (IPDC) over Digital Video Broadcasting for Handheld terminals (DVB-H). DVB-H uses a time-sliced transmission scheme that enables a receiver to turn radio reception off for those time-slices that are not of interest to the user, thus reducing the power consumed by radio reception. One of the significant factors in tune-in time is the time from the start of media decoding to the start of correct output from decoding, which is minimized when a time-slice starts with a random access point picture such as an instantaneous decoding refresh (IDR) picture in H.264/AVC. In IPDC over DVB-H, encapsulation to time-slices is performed independently of encoding, in a network element called the IP encapsulator. At the time of encoding, time-slice boundaries are typically not known exactly, and it is therefore impossible to govern the location of IDR pictures relative to time-slices. It is proposed that an additional stream consisting of IDR pictures only is transmitted to the IP encapsulator, which replaces pictures in the normal bit stream with IDR pictures according to time-slice boundaries in order to achieve the minimum tune-in delay. Replacing pictures causes a mismatch in the pixel values of the reference pictures between the encoder and the decoder, and the mismatch error propagates in the reconstructed video. It has to be ensured that the propagated error is subjectively negligible.
Furthermore, the “spliced” stream resulting from the operation of the IP encapsulator should comply with the Hypothetical Reference Decoder (HRD) specification of H.264/AVC. Error propagation caused by the proposed splicing method is analyzed, and a video rate control system is proposed to satisfy the HRD requirements for the spliced stream. Simulation results show that, in addition to H.264/AVC compliance, good average quality of decoded video is achieved with minimum tune-in delay and complexity.

1. INTRODUCTION

DVB-H (Digital Video Broadcasting for Handheld terminals) is an ETSI standard specification for bringing broadcast services to battery-powered handheld receivers [1]. DVB-H is largely based on the successful DVB-T specification for digital terrestrial television, adding to it a number of features designed to take into account the limited battery life of small handheld devices and the particular environments in which such receivers must operate. The use of time-slicing leads to significant power savings. DVB-H also employs additional forward error correction to further improve on the mobile and indoor reception performance of DVB-T.

A simplified block diagram of a conventional IPDC system over DVB-H is depicted in Figure 1. As shown, a content encoder receives a source signal in analog format, uncompressed digital format, compressed digital format, or any combination of these formats. The content encoder encodes the source signal into a coded media bit stream. The content encoder may be capable of encoding more than one media type, such as audio and video. Alternatively, more than one content encoder may be required to code different media types of the source signal. Figure 1 illustrates the processing of one coded media bit stream of one media type. The coded media bit stream is transferred to a server.
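As a rough illustration of the splicing operation that motivates this work, the following sketch replaces the first picture of each time-slice in a normal stream with the co-located picture from an IDR-only stream. It is a minimal model under stated assumptions, not the implementation evaluated in the paper: the `Picture` type, its fields, and the `splice` function are hypothetical names, pictures are assumed to be in decoding order with no reordering, the IDR-only stream is assumed to contain a picture for every position, and the time-slice start positions are assumed to be known to the IP encapsulator.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Picture:
    """Hypothetical model of one coded picture."""
    index: int      # position of the picture in decoding order
    kind: str       # "IDR" or "P"
    payload: bytes  # coded data (placeholder content in this sketch)


def splice(normal: Sequence[Picture],
           idr_only: Dict[int, Picture],
           slice_starts: Sequence[int]) -> List[Picture]:
    """Return a stream in which every time-slice starts with an IDR picture.

    A picture of the normal stream is replaced only when it is the first
    picture of a time-slice and is not already an IDR picture.
    """
    starts = set(slice_starts)
    spliced: List[Picture] = []
    for pic in normal:
        if pic.index in starts and pic.kind != "IDR":
            # Take the co-located picture from the IDR-only stream.
            spliced.append(idr_only[pic.index])
        else:
            spliced.append(pic)
    return spliced


# Toy example: 10 pictures, one IDR at the start, time-slices of 4 pictures.
normal = [Picture(i, "IDR" if i == 0 else "P", b"n%d" % i) for i in range(10)]
idr_only = {i: Picture(i, "IDR", b"idr%d" % i) for i in range(10)}
spliced = splice(normal, idr_only, slice_starts=[0, 4, 8])
kinds = [p.kind for p in spliced]
```

In this toy example the pictures at positions 4 and 8 are swapped for IDR pictures, so a receiver that tunes in at any time-slice can start decoding from its first picture. The pictures that follow a replaced picture were predicted from the original reference rather than from the IDR picture, which is the source of the mismatch error that the paper analyzes.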
Examples of the format used in transmission include an elementary self-contained bit stream format, a packet stream format, or one or more coded media bit streams encapsulated into a container file. The content encoder and the server may reside on the same physical device or may be included in separate devices. The content encoder and the server may operate with live real-time content, in which case the coded media bit stream may not be stored permanently, but rather buffered for short periods of time in the content encoder and/or in the server to smooth out variations in processing delay, transfer delay, and coded media bit rate. The content encoder may also operate considerably earlier than when the bit stream is transmitted from the server. In such a case, the system may include a content database, which may reside on a separate device or on the same device as the content encoder and/or the server. The server may be an IP multicast server using real-time media transport over the Real-time Transport Protocol (RTP). The server is configured to encapsulate the coded media bit stream into RTP packets according to an RTP payload format. Although not shown in Figure 1, the system may contain more than one server. The server is connected to an IP encapsulator, also referred to as a multi-protocol