New Macroblock Engine Architecture for Video Processing

Trio ADIONO (1), Dani Fitriyanto (1), Akhmad Mulyanto (1), Sumek Wisayataksin (2), Kazumasa Takeichi (2), Dongju Li (2), Tati R. Mengko (1), Hiroaki KUNIEDA (2)

(1) Department of Electrical Engineering, Bandung Institute of Technology, Indonesia
(2) Department of Communications and Integrated Systems, Tokyo Institute of Technology, Japan
email: tadiono@paume.itb.ac.id

Abstract

A new engine for macro-block based video processing is introduced in this paper. This engine increases the efficiency, flexibility and extensibility of data generation for macro-block based video processing systems. The proposed system introduces a new specific instruction set that can access data as a pixel, line, block, macro-block or frame within a clock cycle, which increases the efficiency of the video processing system. Additionally, the programmability of data access enables dynamic scheduling for various video processing applications. This greatly reduces processing time while also reducing control complexity. The architecture is also scalable to different image sizes and expandable to new macro-block based processors. Implementation in a typical video compression application shows high performance and easy system implementation.

1. Introduction

Most video processing algorithms process a pixel together with its neighboring pixels. In video compression, in order to exploit temporal and spatial redundancy, most algorithms process groups of pixels, known as block and macro-block processing. In conventional systems, dedicated hardware is usually designed for each macro-block processing unit, such as DCT, IDCT, ME, MC, IQ and Q. This approach requires a large number of logic gates, which increases design complexity. Moreover, a dedicated design offers very little flexibility for scalability and expandability.
On the other hand, general-purpose processor based video coding implementations incur computation overhead for macro-block based memory data access. Consequently, a high clock frequency and high power consumption are required. These are not suitable for typical video processing applications, which demand low power consumption and involve long computation times.

To overcome the problems mentioned above, we introduce a new engine that generates data for many types of image processing modules. The proposed engine provides easy access to data inside the frame store memory of video processing systems. A processor can access data as a macro-block, block or pixel. Because the engine is programmable, we can also program the processing element function according to the required processing, which may vary with the content of the video image data. Thus, we can employ content-based dynamic scheduling for the video coding system, which may increase computation efficiency.

2. Macroblock Engine System Architecture

In order to obtain a high performance system architecture, the frame store memory addressing method is first optimized to provide easy access to memory data. Then the instruction set is designed to enable data read/write in various ways. Finally, considering both the addressing method and the instruction set, the system architecture is designed.

2.1 Addressing method

To simplify address generation and increase its speed, we arrange data inside memory on a macro-block basis. We use the pixel position inside the block (p_x and p_y), block number (b_x and b_y), macro-block number (mb_x and mb_y) and frame number (fr) as the address value, as illustrated in Figure 2.1 and Figure 2.2. As a result, we can directly map the real macro-block address from its position parameters (fr, mb_y, mb_x, b_y, b_x, p_y, p_x), as shown in Figure 2.2.
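The addressing scheme described above can be illustrated with a minimal Python sketch that concatenates the position parameters (fr, mb_y, mb_x, b_y, b_x, p_y, p_x) into one address word. The field widths are assumptions, not values taken from the paper: 8x8 pixel blocks (3 bits each for p_y and p_x), 2x2 blocks per 16x16 macro-block (1 bit each for b_y and b_x), and 5 bits each for mb_y and mb_x. The names MbAddress, pack, unpack and from_raster are hypothetical and serve only this illustration.

```python
from typing import NamedTuple

# Assumed field widths in bits (not specified in the text).
P_BITS = 3   # pixel position inside an 8x8 block
B_BITS = 1   # block position inside a 16x16 macro-block (2x2 blocks)
MB_BITS = 5  # macro-block index (up to 32 macro-blocks per axis)

# Widths of the fields below fr, ordered from most to least significant:
# mb_y, mb_x, b_y, b_x, p_y, p_x
WIDTHS = (MB_BITS, MB_BITS, B_BITS, B_BITS, P_BITS, P_BITS)


class MbAddress(NamedTuple):
    fr: int
    mb_y: int
    mb_x: int
    b_y: int
    b_x: int
    p_y: int
    p_x: int


def pack(addr: MbAddress) -> int:
    """Concatenate the fields into one address; fr takes the top bits."""
    value = addr.fr
    for field, width in zip(addr[1:], WIDTHS):
        if not 0 <= field < (1 << width):
            raise ValueError("field value does not fit its assumed width")
        value = (value << width) | field
    return value


def unpack(value: int) -> MbAddress:
    """Split an address back into its fields (inverse of pack)."""
    fields = []
    for width in reversed(WIDTHS):
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return MbAddress(value, *reversed(fields))


def from_raster(fr: int, x: int, y: int) -> MbAddress:
    """Derive the fields from raster pixel coordinates (x, y) in frame fr."""
    return MbAddress(fr, y >> 4, x >> 4, (y >> 3) & 1, (x >> 3) & 1, y & 7, x & 7)


if __name__ == "__main__":
    a = from_raster(0, 21, 10)
    print(a, pack(a))
```

Under these assumed widths, the 64 pixels of one block occupy 64 consecutive addresses and the four blocks of one macro-block occupy 256 consecutive addresses, so a block or macro-block access needs only a base address derived directly from its position fields.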
Figure 2.1 Pixel and block position inside the macro-block addressing

MVA2005 IAPR Conference on Machine Vision Applications, May 16-18, 2005, Tsukuba Science City, Japan
3-13