VIDEO ENCODING AND SPLICING FOR TUNE-IN TIME REDUCTION IN IP DATACASTING (IPDC) OVER DVB-H

Mehdi Rezaei¹, Miska M. Hannuksela², Moncef Gabbouj³
¹,³Tampere University of Technology; ²Nokia Research Center

ABSTRACT

A novel video encoding and splicing method is proposed that minimizes the tune-in time of “channel zapping”, i.e. changing from one audiovisual service to another, in IPDC over Digital Video Broadcasting for Handheld terminals (DVB-H). DVB-H uses a time-sliced transmission scheme to reduce the power consumed by radio reception. Tune-in time in DVB-H refers to the time between the start of the reception of a broadcast signal and the start of media rendering. One significant component of tune-in time is the time from the start of media decoding to the start of correct decoder output, which is minimized when a time-slice starts with a random access point picture, such as an instantaneous decoding refresh (IDR) picture in H.264/AVC. In IPDC over DVB-H, encapsulation into time-slices is performed independently of encoding, in a network element called the IP encapsulator. At encoding time, time-slice boundaries are not known exactly, so it is impossible to control where IDR pictures fall relative to time-slices. It is proposed that an additional stream consisting only of IDR pictures be transmitted to the IP encapsulator, which replaces pictures in the normal bitstream with IDR pictures according to time-slice boundaries in order to achieve the minimum tune-in time. It must be ensured that the “spliced” stream produced by the IP encapsulator complies with the Hypothetical Reference Decoder (HRD) specification of H.264/AVC. A video encoding and rate control system is proposed to satisfy the HRD requirements for the spliced stream. Simulation results show that, in addition to achieving HRD compliance, the method provides good average quality of decoded video with minimum tune-in time.

1. INTRODUCTION

DVB-H (Digital Video Broadcasting for Handheld terminals) is an ETSI standard specification for bringing broadcast services to battery-powered handheld receivers [1]. DVB-H is largely based on the successful DVB-T specification for digital terrestrial television, adding a number of features designed to account for the limited battery life of small handheld devices and the particular environments in which such receivers must operate.

In a conventional IPDC system over DVB-H, a content encoder receives a source signal and encodes it into a coded media bitstream. The coded media bitstream is transferred to a server, typically a normal IP multicast server using real-time media transport over RTP, which encapsulates the bitstream into RTP packets. The server is connected to an IP Multi-Protocol Encapsulator (IP encapsulator), which encapsulates IP packets into Multi-Protocol Encapsulation (MPE) sections; these are in turn encapsulated into MPEG-2 Transport Stream (TS) packets. The IP encapsulator optionally applies MPE Forward Error Correction (MPE-FEC) based on Reed-Solomon (RS) codes. An IPDC system over DVB-H also includes a radio transmitter, which is not essential to the operation of the proposed encoding and splicing system and is not discussed further. To reduce power consumption in handheld terminals, the service data is time-sliced and sent into the channel in bursts at a significantly higher bitrate than the bitrate of the audiovisual service. Time-slicing allows a receiver to stay active for only a fraction of the time, while receiving the bursts of the requested service. Finally, the system includes one or more recipients, typically capable of receiving, demodulating, decapsulating, decoding, and rendering the transmitted signal, resulting in uncompressed media streams.
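As a rough illustration of the time-slicing and splicing mechanisms described above, the following Python sketch computes the share of each time-slicing cycle during which a receiver is on, performs the IDR substitution at time-slice starts, and runs a buffer check on the result. It is not the authors' implementation: the helper names `on_time_fraction`, `splice`, and `cpb_compliant` are hypothetical, the buffer check is a simplified constant-bitrate leaky bucket rather than the full H.264/AVC HRD of Annex C, and all bitrates, picture sizes, and buffer parameters are made-up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    frame_num: int   # index in decoding order (assumed equal to output order here)
    is_idr: bool
    size_bits: int


def on_time_fraction(service_bps: float, burst_bps: float) -> float:
    """Share of each time-slicing cycle during which the receiver is on.

    A burst carries the service data of one cycle of length T, so it lasts
    T * service_bps / burst_bps. Overheads such as MPE-FEC parity and
    receiver wake-up/synchronization time are ignored.
    """
    if not 0 < service_bps <= burst_bps:
        raise ValueError("need 0 < service bitrate <= burst bitrate")
    return service_bps / burst_bps


def splice(normal: list[Picture], idr_only: dict[int, Picture],
           slice_starts: set[int]) -> list[Picture]:
    """Replace each picture that starts a time-slice with its IDR version, if one exists."""
    return [idr_only.get(p.frame_num, p) if p.frame_num in slice_starts else p
            for p in normal]


def cpb_compliant(pictures: list[Picture], bitrate_bps: float, fps: float,
                  cpb_bits: float, initial_delay_s: float) -> bool:
    """Simplified constant-bitrate leaky-bucket check, not the full Annex C HRD.

    Bits enter the coded picture buffer (CPB) at bitrate_bps; picture i is
    removed instantaneously at initial_delay_s + i / fps.
    """
    fullness = bitrate_bps * initial_delay_s  # bits received before the first removal
    for pic in pictures:
        if fullness > cpb_bits:          # overflow just before this removal
            return False
        if pic.size_bits > fullness:     # underflow: picture not fully received in time
            return False
        fullness -= pic.size_bits
        fullness += bitrate_bps / fps    # bits arriving until the next removal
    return True


# Illustrative numbers only: 8 kbit P pictures, 40 kbit IDR pictures, 15 fps.
P_BITS, IDR_BITS = 8_000, 40_000
normal = [Picture(n, n == 0, IDR_BITS if n == 0 else P_BITS) for n in range(30)]
idr_only = {n: Picture(n, True, IDR_BITS) for n in range(30)}
spliced = splice(normal, idr_only, slice_starts={0, 15})

print(on_time_fraction(384_000, 10_000_000))            # about 0.0384
print(cpb_compliant(normal, 128_000, 15, 60_000, 0.4))   # True
print(cpb_compliant(spliced, 128_000, 15, 60_000, 0.4))  # False: underflow at frame 15
```

With these illustrative numbers, the normal stream passes the simplified buffer check, but the spliced stream underflows at frame 15 because the inserted IDR picture is several times larger than the P picture it replaces. This is the kind of violation that the encoding and rate control system proposed in the abstract is meant to prevent.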
1-4244-0367-7/06/$20.00 ©2006 IEEE, ICME 2006

Tune-in time (or tune-in delay) in DVB-H refers to the time between the start of the reception of a broadcast signal and the start of media rendering. For newly joined recipients, the tune-in delay consists of several parts: (i) the delay until the start of the desired time-slice; (ii) the reception duration of a complete time-slice or MPE-FEC frame; (iii) the delay to compensate for the size variation of MPE-FEC frames; (iv) the delay needed to synchronize the associated streams (e.g. audio and video) of the streaming session; and (v) the delay until the media decoder is refreshed by a random access point and produces correct output samples. One of the critical factors is the last of these, which can be minimized if each MPE-FEC frame starts with a random access point such as an IDR picture in H.264/AVC. Note that if the decoder started decoding, immediately upon receiving a time-slice, from an IDR picture that is not at the beginning of that time-slice, the pictures from the IDR picture onward would cover less playback time than the interval until the next time-slice arrives; the decoder input buffer would therefore drain before the next time-slice, causing a gap in video playback. In IPDC over DVB-H, content encoding and encapsulation into MPE-FEC frames are performed independently, and it is hard to govern the exact location of