Rate-distortion analysis of SP and SI frames

Eric Setton, Prashant Ramanathan and Bernd Girod
Information Systems Laboratory, Department of Electrical Engineering
Stanford University, Stanford, CA 94305-9510, USA
{esetton, pramanat, bgirod}@stanford.edu

Abstract—SP and SI frames in the H.264 video coding standard can be used for error resilience, bitstream switching or random access. Despite widespread interest in these new types of frames, no work so far has systematically investigated their rate-distortion efficiency. In this paper, we propose a model for the rate-distortion performance of SI and SP frames. A comparison to experimental results, obtained with our implementation of an SP encoder, confirms its validity. The model predicts how the relative sizes of SP and SI frames can be traded off. We analyze, both theoretically and experimentally, how this can be used to minimize the transmitted bit-rate when SP frames are used for video streaming with packet losses.

I. INTRODUCTION

The design of the latest video coding standard, H.264 [1], reflects the increasing need for video streaming solutions which can adapt to varying network conditions. In addition to achieving superior coding efficiency, H.264 uses network-friendly syntax and incorporates several new encoding features which can be exploited when designing flexible and adaptive streaming systems. The new picture types SP and SI are among these features.

SP/SI pictures are new types of predictively/intra coded pictures. Based on the seminal work by Färber et al. [2], they were proposed in 2001 by Karczewicz and Kurceren as a solution for error resilience, bitstream switching and random access [3], [4]. They are now part of the Extended Profile of H.264. Their main advantage is that they can be reconstructed exactly by using different sets of predictors or no predictor at all.
This allows drift-free bitstream switching applications, such as refreshing a prediction chain or switching between streams of different quality, as depicted in Fig. 1 and Fig. 2.

Fig. 1. SI frames share the instant refresh properties of I frames but are only sent after a frame is lost.

Fig. 2. Switching SP frames allow switching between streams using predictive frames only.

Despite widespread interest in SP and SI frames, no work so far has addressed the question of how efficient SI and SP frames are, and how their relative sizes can be traded off. This is, in part, due to the fact that no reference implementation of an SP encoder has been provided to the community. The purpose of this work is to address these questions by proposing a model for the rate-distortion functions of SP and SI frames. The model is used to analyze the properties of these pictures and derive optimal settings for their encoding.

In the next section, we describe the encoding of SP and SI frames. In Section III, we propose a model of the rate-distortion performance of SP and SI frames and compare it to experimental results. The model predicts how the relative sizes of SP and SI frames can be traded off. We analyze, in Section IV, both theoretically and experimentally, how this can be used to minimize the transmitted bit-rate when SP frames are used for video streaming with packet losses.

II. ENCODING OF SP AND SI FRAMES

Predictively encoded P frames can only be reconstructed exactly when their set of predictors is decoded correctly. To relax this requirement, a non-switching (also called primary) SP frame may be inserted in the bitstream, as shown at the top of Fig. 1. Along with this non-switching SP frame, a corresponding SI frame may be created. The SI frame can be decoded without any predictor and will correspond exactly to the initial primary SP frame. Switching SP frames have also been included in the H.264 standard.
They allow a primary SP frame to be reconstructed exactly from a different set of reference frames; they will be analyzed in more detail in a future paper.

¹ We call intra-frame encoder the combination of a spatial transform followed by quantization. In Fig. 3, the symbol shown next to this block (either QPSP or QPSP2) represents the value of the quantizer.

The diagram of a primary SP frame encoder is shown in Fig. 3. It is mainly composed of a traditional video encoder followed by an additional intra-frame encoder¹ which oper-