7.4-5

Abstract—We present a quantitative performance evaluation of several components of a video format conversion algorithm, referred to as Natural Motion (NM). The implementation platform is a new programmable media-processor, the TM3270, combined with dedicated hardware support. The performance of two compute-intensive NM components, Motion Estimation (ME) and Temporal Up-conversion (TU), is evaluated. The impact of new TM3270 features, such as new video-processing operations and data prefetching, is quantified. We show that a real-time implementation of the ME and TU algorithms is achievable in a fraction of the available compute performance when operating on standard definition video.

I. INTRODUCTION

The Natural Motion (NM) video format conversion algorithm addresses the increasing need for video format conversion (Figure 1). Both spatial and temporal format conversions may be required to convert the source video stream format to that of the display. Until recently, these conversions had a dedicated hardware implementation. However, the increased performance of media-processors has enabled software implementations [1][2]. Software implementations provide flexibility, which can be exploited in different ways: multiple algorithms can be mapped onto the same implementation platform, algorithmic changes can be made without modifying the implementation platform, and the algorithm can adapt more closely to the video content. These advantages have led to a co-existence of dedicated hardware and software implementation platforms.

In this paper we evaluate the performance of two compute-intensive components of the NM algorithm on the new TM3270 media-processor.

II.
ALGORITHMS

Motion Estimation (ME) and Temporal Up-conversion (TU) are two components of the NM algorithm that can benefit from a software implementation: algorithmic innovations in both areas have led to continuous quality improvements, and a software implementation allows us to apply these innovations with a short time-to-market. It is not the intent of this paper to present best-in-class ME and TU algorithms, but rather to evaluate the performance of an algorithm that is representative of its class.

For ME, we use the 3-D Recursive Search (3DRS) block-based motion estimator [3]. Our version of the 3DRS estimator performs a motion search using a set of 11 candidate motion vectors for each 8x8 block of image pixels. It provides a high quality result at a low computational complexity. The horizontal search range spans the full width of the video image, and the vertical search range is [-40, 39¾] pixels.

For TU, we use an enhanced version of the motion-compensated cascaded median up-converter [4]. Our version performs up-conversion on 4x4 blocks of image pixels. The 4x4 block motion vectors are derived through block erosion from the 8x8 block motion vectors determined by the ME algorithm.

Both algorithms operate at ¼-pixel accuracy; pixel values at non-integer positions are calculated through bi-linear interpolation, using the fractional pixel position offsets as weighting factors.

III. SOFTWARE IMPLEMENTATION PLATFORM

The ME and TU algorithms were implemented on the TM3270 media-processor. The TM3270 is a five-issue-slot VLIW processor; i.e., every cycle, up to five independent operations can be started. Using the SIMD capabilities of the 32-bit datapaths, a single operation is performed on either one 32-bit, two 16-bit, or four 8-bit sub-operands. In other words, in each cycle up to 5 * 4 = 20 8-bit operations can be performed (five issue slots, 8-bit SIMD parallelism).
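To make the sub-operand model concrete, the following is a minimal sketch that emulates one packed operation on four 8-bit lanes of a 32-bit word. It is written in plain Python for illustration only. The function names and the choice of a per-lane absolute difference are assumptions, not TM3270 operations; only the lane count and the number of issue slots come from the text.

```python
ISSUE_SLOTS = 5   # from the text: five-issue-slot VLIW
LANES_8BIT = 4    # from the text: four 8-bit sub-operands per 32-bit word


def unpack_u8x4(word):
    """Split a 32-bit word into four unsigned 8-bit lanes (lane 0 = least significant byte)."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES_8BIT)]


def pack_u8x4(lanes):
    """Pack four unsigned 8-bit lanes back into one 32-bit word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def abs_diff_u8x4(a, b):
    """Hypothetical packed operation: per-lane |a - b| on four 8-bit lanes."""
    return pack_u8x4([abs(x - y) for x, y in zip(unpack_u8x4(a), unpack_u8x4(b))])


def peak_8bit_ops_per_cycle():
    """Upper bound on 8-bit operations per cycle: one packed operation per issue slot."""
    return ISSUE_SLOTS * LANES_8BIT
```

In this emulation one call to `abs_diff_u8x4` stands for a single SIMD operation, so it counts as four 8-bit operations in the peak figure.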
At a 450 MHz operating frequency, this peak rate of 20 8-bit operations per cycle results in a maximum computational performance of 450 MHz * 20 = 9 Gops/sec (note that these operations can be relatively complex, e.g. 3-tap median operations). The processor has a 64 Kbyte instruction cache and a 128 Kbyte data cache.

The TM3270 has several new features when compared to other media-processors. Most notable are data prefetching and new three- and four-input video-processing operations. Data prefetching anticipates the future use of data by retrieving it from main memory into the processor's data cache before its actual use by the processor. As a result, the data is available

Motion Estimation and Temporal Up-Conversion on the TM3270 Media-Processor
Jan-Willem van de Waerdt 1,2, Stamatis Vassiliadis 2, Erwin Bellers 1 and Johan G.W.M. Janssen 1
1 Philips Semiconductors, San Jose, CA, USA
2 Delft University of Technology, Delft, The Netherlands

[Figure 1: block diagram of the NATURAL MOTION algorithm from "video in" to "video out". Labels recovered from the figure, in extraction order: MOTION-COMPENSATED DE-INTERLACING, FILM DETECTION, MOTION ESTIMATION, MOTION COMPENSATION, MOTION-COMPENSATED TEMPORAL UP-CONVERSION, DE-INTERLACING & SCALING. Legend: DEDICATED HARDWARE, SOFTWARE.]
Fig. 1. Natural Motion (NM) algorithm, and its main components.
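The ¼-pixel bi-linear interpolation described in Section II can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function name `sample_quarter_pel`, the round-to-nearest step, and the clamping of coordinates at the image border are all hypothetical choices that the text does not specify. Positions are given in quarter-pixel units, so the two low bits of each coordinate are the fractional offset used as the weighting factor.

```python
def sample_quarter_pel(img, x4, y4):
    """Bi-linear sample of img (list of rows) at position (x4 / 4, y4 / 4).

    x4 and y4 are in quarter-pixel units. Rounding and border clamping
    are assumptions made for this sketch.
    """
    height, width = len(img), len(img[0])

    # Integer pixel position and fractional offset (0..3 quarter pixels).
    ix, fx = x4 >> 2, x4 & 3
    iy, fy = y4 >> 2, y4 & 3

    def pixel(x, y):
        # Assumed border policy: clamp coordinates to the image.
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    p00, p10 = pixel(ix, iy), pixel(ix + 1, iy)
    p01, p11 = pixel(ix, iy + 1), pixel(ix + 1, iy + 1)

    # The fractional offsets act as weights; the four weights sum to 16.
    acc = ((4 - fx) * (4 - fy) * p00 + fx * (4 - fy) * p10
           + (4 - fx) * fy * p01 + fx * fy * p11)

    # Assumed rounding: round to nearest before dividing by 16.
    return (acc + 8) >> 4
```

For example, with `img = [[0, 16], [32, 48]]`, sampling at `x4 = 2, y4 = 0` lands halfway between the two top pixels.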