AN OPTIMIZED LABEL-BROADCAST PARALLEL ALGORITHM FOR CONNECTED COMPONENTS LABELING

João Marcelo Xavier Natário Teixeira, Bernardo Reis, Veronica Teichrieb, Judith Kelner

Virtual Reality and Multimedia Research Group, Informatics Center - Federal University of Pernambuco
Av. Prof. Moraes Rego S/N, Prédio da Positiva, 1º Andar, Cidade Universitária, 50732-970, Recife-PE Brasil
{jmxnt, bfrs, vt, jk}@cin.ufpe.br

ABSTRACT

This paper presents a simple and fast algorithm for labeling connected components in binary images, based on a parallel label-broadcast paradigm. A grid of processing units (called spiders) is used, and each element is responsible for updating its label value during a specific number of iterations. We describe the design and implementation of an embedded architecture for real-time labeling of black and white images based on FPGA technology. Since the image is divided and processed independently by processing elements, the proposed algorithm can run on an FPGA platform attached to an image sensor, yielding a circuit that resembles a focal plane processor.

1. INTRODUCTION

A labeling algorithm is a procedure for assigning a unique label to each object (a group of connected components) in an image [1]. Its output is used by any subsequent analysis procedure and for distinguishing and referencing the labeled objects. Labeling is an indispensable part of nearly all applications in Pattern Recognition and Computer Vision.

The efficiency of the connected component labeling algorithm is critical for many image processing and machine vision applications that require real-time response. Advances in the areas of parallel processing and VLSI (Very Large Scale Integration) technology can be exploited in designing hardware algorithms for high-speed data throughput.

In this work, a fast algorithm for labeling connected components in binary images, based on a parallel label-broadcast paradigm, is proposed.
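The label-broadcast idea can be illustrated in software with a minimal sketch. It is not the authors' hardware design or their exact update rule: the function name `label_broadcast`, the choice of 4-connectivity, the "keep the largest neighboring label" rule, and the early stop when nothing changes are all assumptions made for illustration. Each foreground pixel plays the role of one processing element and all elements update synchronously.

```python
def label_broadcast(image, max_iters=None):
    """Synchronous label propagation on a binary image (list of rows of 0/1).

    Hypothetical sketch: 4-connectivity and max-label propagation are
    assumptions, not details taken from the paper.
    """
    h, w = len(image), len(image[0])
    # Each foreground pixel starts with a unique non-zero label; background is 0.
    labels = [[r * w + c + 1 if image[r][c] else 0 for c in range(w)]
              for r in range(h)]
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-connectivity
    if max_iters is None:
        max_iters = h * w  # a label never needs more steps than there are pixels

    for _ in range(max_iters):
        new_labels = [row[:] for row in labels]  # all elements update at once
        changed = False
        for r in range(h):
            for c in range(w):
                if labels[r][c] == 0:
                    continue  # background elements never take a label
                best = labels[r][c]
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] > best:
                        best = labels[rr][cc]
                if best != labels[r][c]:
                    new_labels[r][c] = best
                    changed = True
        labels = new_labels
        if not changed:
            break  # every component has converged to a single label
    return labels


if __name__ == "__main__":
    demo = [[1, 1, 0, 0],
            [0, 1, 0, 1],
            [0, 0, 0, 1]]
    for row in label_broadcast(demo):
        print(row)
```

In this sketch every connected component ends up carrying the largest initial label found among its pixels, so two pixels share a label exactly when they belong to the same component. The number of synchronous steps grows with the longest path a label has to travel inside a component, which is why the connectivity between processing elements matters for speed.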
Since the labeling problem possesses both local and global features, there are many obstacles to overcome before a fully parallel approach can be created. In this work, a grid with as many processing units as the image has pixels is used, and each element is responsible for updating its label value during a specific number of iterations. This label-broadcast model was previously adopted by [2] and [3], but due to their connectivity scheme and processing, it takes too long to process a single image. Our implementation reduces this problem, since the processing elements (PEs) are directly connected.

Our processing elements perform exactly the same computation, at the same time. The architecture and PEs have low complexity and can be implemented, for example, as a special purpose VLSI chip. The algorithm has a time complexity of 1, with a multiplicative factor of 0.5. Theoretically, using a clock cycle of 15 ns, the proposed hardware implementation is able to process a 128 × 128 image in 0.122865 milliseconds (8191 clock cycles), using a grid of 128 × 128 processors. In order to validate the proposed algorithm, we performed two different implementations on an FPGA platform.

This paper is organized as follows. Section 2 presents research related to this work. The proposed algorithm is introduced in Section 3. Its FPGA implementation is described in Section 4. Strong and weak points of the algorithm are stated in Section 5. Finally, Section 6 draws some conclusions and points to new directions for this work.

2. RELATED WORK

In general, sequential techniques are inefficient in terms of space and time requirements, whereas parallel algorithms are based on expensive general purpose parallel computation models. Sequential labeling approaches have reached their best performance results with Chang's and Wu's algorithms [1] [4].
Based on the local and global features of labeling, different algorithmic techniques have been proposed to exploit such properties. Alnuweiri and Prasanna [5] characterize and survey various parallel architectures and computation models that implement these techniques.

Crookes and Benkrid [6] describe an architecture based on a serial, recursive algorithm for labeling. The algorithm iteratively scans the input image, performing a non-zero maximum neighborhood operation divided into two passes: a forward pass and an inverse one. The main disadvantage of this technique is that the time to label a whole image depends on
978-1-4244-6311-4/10/$26.00 ©2010 IEEE