Efficient Real-Time Implementation of MPEG-4 Audiovisual Decoder Using DSP and RISC Chips

Byeong-Doo Choi*, Kang-Sun Choi**, Sung-Jea Ko*, Senior Member, IEEE, and Aldo W. Morales***, Senior Member, IEEE
* Department of Electronics Engineering, Korea University, Seoul, Korea
** Dalitech Co., Seoul, Korea
*** Department of Electrical Engineering, Penn State University, Harrisburg, PA
Email: sjko@dali.korea.ac.kr

Abstract: This paper presents a real-time embedded implementation of an MPEG-4 visual simple profile and AAC decoder using DSP and RISC chips. We optimize the MPEG-4 modules using a seamless double-buffering memory structure, the dual MAC applied to half-pixel motion compensation (MC), and a fast floating-to-integer conversion algorithm in AAC.

I. Introduction

The latest MPEG-4 standard defines a standardized framework for different types of multimedia applications. The MPEG-4 video compression standard incorporates several error resilience tools in its simple profile to enable detection, containment, and concealment of errors [1]. Thus, MPEG-4 simple profile video is being used mainly for mobile video communications [2]. Advanced audio coding (AAC), on which most parts of the MPEG-4 audio coder are based, is a modern audio coding algorithm equipped with a number of coding tools, including joint stereo coding and different kinds of predictive coding, and it is widely used.

This paper presents an optimized real-time embedded implementation of an MPEG-4 simple profile and AAC decoder using a recent DSP chip, the TMS320C5510, and an ARM9 RISC core [3]. To improve the computational efficiency of the MPEG-4 decoding system, we optimize the MPEG-4 modules with various assembly-level optimization techniques.

II. System Architecture

MPEG-4 visual simple profile and AAC decoding require a significant amount of memory, computation, and internal data transfer, all of which impact the price and battery life of an embedded application.
The latest DSP and RISC chips not only provide high performance but are also optimized for power efficiency. Fig. 1 shows the block diagram of our MPEG-4 audiovisual decoder and the developed prototype board. In Fig. 1(a), the ARM9 RISC processor in the management module receives the bitstream from the transmission module and transfers it into the shared memory. The TMS320C5510 in the MPEG-4 decoding module decodes the MPEG-4 video and audio and displays the decoded data. For communication between the DSP and the RISC, a part of the DSP data memory is defined as registers and shared with the RISC. Through the host-port interface (HPI) of the DSP, the RISC can directly access the DSP's memory space [4].

Fig. 1. Proposed embedded MPEG-4 audiovisual decoder. (a) Block diagram, with blocks labeled ARM9, LAN Interface Unit, DSP (C5510), Flash, HPI, and SDRAM, grouped into the transmission, management, and MPEG-4 decoder modules. (b) Hardware implementation.

To speed up MPEG-4 audiovisual decoding, operations should use data held in internal (on-chip) memory. We use a double-buffering structure that moves data and executes MPEG-4 decoding at the same time without the two conflicting, as shown in Fig. 2. At time t1, the DMA moves a horizontally aligned group of macroblock (MB) data from the external RAM into the internal RAM buffer InBuff B, while the DSP core simultaneously decodes the previously transferred MB data in InBuff A. To use the full capacity of the DSP, the DMA moves OutBuff A into external memory at time t2, while the CPU core decodes macroblocks from InBuff B.

Half-pixel motion compensation (MC) is performed for 16x16 and 8x8 vectors, as well as for 16x8 field motion vectors in the case of interlaced video. Fig. 3 shows the bilinear interpolation scheme used to find the values of the midpoints l and m and the center point n. This procedure requires three additions and three multiplications corresponding to six cy-