Image and Video Processing on FPGAs: An Exploration Framework for Real-Time Applications

Paulo da Cunha Possa*, Zied El Hadhri**, Carlos Valderrama***

*University of Mons, Department of Electronics and Microelectronics, Mons, 7000 Belgium (Tel: +32-6537-4222; e-mail: Paulo.Possa@umons.ac.be)
**(e-mail: Zied.ELHADHRI@student.umons.ac.be)
***(e-mail: Carlos.Valderrama@umons.ac.be)

Abstract: This work presents a design framework for real-time image and video processing that enables the exploration and evaluation of different processing techniques. The goal of our educational approach is to develop a flexible and easily customizable environment for prototyping processing techniques on Field Programmable Gate Arrays (FPGAs), targeting specific applications. In this paper we give an overview of the requirements and techniques of video processing on FPGAs. Three real-time video processing algorithms were combined to demonstrate the advantages and characteristics of our approach. Within the framework, modules running in parallel can be easily swapped at run-time according to application-specific needs.

Keywords: Video processing, field programmable gate array, embedded systems, hardware description language, object tracking, pattern detection.

1. INTRODUCTION

Innovations in video/image processing and the fast evolution of video standards, such as digital cinema and HDTV, are now taking place. Progress in image capture and display resolution, advanced compression techniques, and smart cameras are the main factors pushing the limits of technology and processing power (Altera, 2007; Bodba, 2007). Video resolution requirements have risen significantly over the last few years. Moving from standard definition (SD) to high definition (HD), with a resolution exceeding 2 MP (megapixels) and a refresh rate between 30 and 60 frames per second, represents a 5.5x increase in processed data.
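The magnitude of the SD-to-HD jump can be checked with a back-of-the-envelope pixel-rate calculation. The sketch below is illustrative only: the exact ratio depends on which SD and HD format pair is compared (the PAL SD and HD formats chosen here are assumptions, not taken from the paper), but the result is of the same order as the 5.5x figure quoted above.

```python
def pixel_rate(width, height, fps):
    """Pixels to process per second for a given video format."""
    return width * height * fps

sd = pixel_rate(720, 576, 25)        # PAL standard definition
hd720 = pixel_rate(1280, 720, 60)    # 720p at 60 frames/s
hd1080 = pixel_rate(1920, 1080, 30)  # 1080p at 30 frames/s

print(f"SD:      {sd:>11,} px/s")
print(f"720p60:  {hd720:>11,} px/s ({hd720 / sd:.1f}x SD)")
print(f"1080p30: {hd1080:>11,} px/s ({hd1080 / sd:.1f}x SD)")
```

Running this gives roughly 10.4 Mpx/s for SD against 55-62 Mpx/s for the two HD formats, i.e. a 5-6x increase in raw pixel throughput before any compression or multi-frame processing is considered.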
The new video surveillance standards impose a change from the Common Intermediate Format (CIF, 352 × 288 pixels per frame) to the D1 format (704 × 576), with some industrial cameras providing HD at 1280 × 720. Other video applications, such as military surveillance, medical imaging, and machine vision, also process very high resolution images (Altera, 2007; Poynton, 2003). The new generation of advanced compression techniques improves streaming capability and the compression rate for a given quality, and reduces latency. With improved resolution and increased compression ratios, there is a crucial need for processing power, while keeping the architecture flexible enough to follow the latest revision of each standard (Altera, 2007).

Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), and Graphics Processing Units (GPUs) are the platforms commonly used to implement image and video processing algorithms requiring simultaneous computations on multiple pixels or frames. Looking at the architectures, a typical TI DSP processor may have two Arithmetic Logic Units (ALUs) to carry out Multiply-Accumulate (MAC) operations. Comparatively, an FPGA can have more than 200 MAC blocks processing pixels in parallel, and some FPGAs now include dedicated hard-core DSP/MAC blocks for even higher processing power (Altera, 2007; Kalomiros and Lygouras, 2008). FPGAs hold a clear advantage over conventional DSPs for digital signal processing: their scalability (the capacity to replicate functions as required) and inherent parallelism. As FPGAs matured into a high-volume, cost-effective technology, they met the growing need for faster systems (Kalomiros and Lygouras, 2008; Serrano, 2008). Latency becomes critical in real-time processing applications such as video or television signal processing.
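The throughput gap between a 2-ALU DSP and an FPGA with 200 replicated MAC blocks can be sketched with a simple cycle-count model. This is a deliberately idealized back-of-the-envelope model (it ignores memory bandwidth, pipelining, and clock-rate differences, and the 200-tap filter size is a hypothetical example), meant only to illustrate why replicating MAC units shortens the critical loop.

```python
import math

def cycles_per_output(macs_needed, parallel_mac_units):
    """Cycles to produce one output sample when macs_needed
    multiply-accumulate operations are spread across
    parallel_mac_units hardware units (idealized model)."""
    return math.ceil(macs_needed / parallel_mac_units)

# Hypothetical 200-tap FIR filter: a DSP with 2 MAC-capable ALUs
# versus an FPGA instantiating 200 parallel MAC blocks.
dsp_cycles = cycles_per_output(200, 2)     # 100 cycles per sample
fpga_cycles = cycles_per_output(200, 200)  # 1 cycle per sample
print(dsp_cycles, fpga_cycles)
```

Under these assumptions the FPGA produces one filtered sample per cycle while the DSP needs 100 cycles, which is the scalability-through-replication argument made above.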
Therefore, in addition to embedded hardware multipliers and a larger number of memory blocks, computationally demanding functions (e.g. convolution filters, motion estimators, two-dimensional Discrete Cosine Transforms (2D DCTs), and Fast Fourier Transforms (FFTs)) are best provided as specialized hardware components, as found in modern embedded multimedia architectures. Such components are also available as Intellectual Property (IP) cores for developing FPGA-based applications (Kalomiros and Lygouras, 2008). Nowadays, students should evolve from software development to architecture design in order to satisfy such requirements. They must master algorithmic selection and