Proc. EOS Conference on Industrial Imaging and Machine Vision (Munich, Germany, June 13-15), European Optical Society, 2005, 39-49.

Architecture Study for Smart Cameras

Harry Broers 1, Wouter Caarls 2, Pieter Jonker 2, Richard Kleihorst 3

1 Philips Applied Technologies, Vision, Optics & Sensors, Eindhoven, 5600 MD, The Netherlands
2 Delft University of Technology, Quantitative Imaging, Delft, 2628 CJ, The Netherlands
3 Philips Research, Digital Design and Test, Eindhoven, 5656 AA, The Netherlands

email: harry.broers@philips.com

Summary

Embedded real-time image processing is widely used in many applications, including industrial inspection, robot vision, photo-copying, traffic control, automotive control, surveillance, security systems and medical imaging. In these applications, the images can be very large, the available processing time is often short, and real-time constraints must be met. Starting in the 1980s, many parallel hardware architectures for low-level image processing have been developed. They range from frame-grabbers with attached Digital Signal Processors (DSPs), to systolic pipelines, square and linear single-instruction multiple-data stream (SIMD) systems, SIMD pyramids, PC-clusters, and, in recent years, smart cameras. As processors become faster, smaller, cheaper, and more efficient, new opportunities arise to integrate them into a wide range of CMOS processor devices. Since there are so many different applications, no single processor meets the requirements of all of them. The processing done on a smart camera has very specific characteristics. On one hand, low-level image processing operations such as interpolation, segmentation and edge enhancement are local, regular, and require vast amounts of bandwidth. On the other hand, high-level operations like classification, path planning, and control may be irregular, consuming less bandwidth.
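The locality and regularity of such low-level operations can be illustrated with a minimal sketch (a hypothetical 3x3 sharpening kernel, not taken from the paper): every output pixel depends only on a fixed neighbourhood, and the same arithmetic is repeated at every position, which is exactly what SIMD architectures exploit.

```python
# Sketch of a local, regular low-level operation: 3x3 edge enhancement
# (sharpening) on a 2-D list of 8-bit pixel values. Each output pixel
# depends only on its 3x3 neighbourhood, and the same computation is
# applied everywhere -- a natural fit for data-level parallelism.

KERNEL = [[ 0, -1,  0],
          [-1,  5, -1],
          [ 0, -1,  0]]  # identity plus negated Laplacian

def edge_enhance(img):
    """Apply the 3x3 kernel; border pixels are copied unchanged,
    results are clamped to the 0..255 pixel range."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += KERNEL[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = max(0, min(255, acc))
    return out

# A flat region is unchanged (the kernel sums to 1); a step edge is
# exaggerated, which is the desired enhancement effect.
flat = [[10] * 4 for _ in range(4)]
print(edge_enhance(flat)[1][1])  # 10
```

On a SIMD array, the two outer loops disappear: each processing element holds one pixel (or column) and executes the same kernel accumulation in lockstep, exchanging only border values with its neighbours.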
In this paper we focus on the range of smart cameras developed at the Philips laboratories and their software support for easy programming, which were partly developed in close co-operation with the SmartCam [1] project.

1 Introduction

The SmartCam project investigates how a set of application-specific processors can be generated for intelligent cameras using design space exploration (the hardware framework), and how the inherent data and task parallelism in an application can be scheduled such that a balance is found between the data-parallel and task-parallel parts of the application software (the software framework). The resulting schedule is optimal for a given architecture description; to select the best architecture in combination with the best schedule, one can iterate between design space exploration and scheduling. Developing embedded parallel image processing applications is usually a very hardware-dependent process, requiring deep knowledge of the processors used. Parallelism can be explored along three axes: data-level parallelism (DLP), instruction-level parallelism (ILP) and task-level parallelism (TLP). Consequently one