Published in IET Image Processing
Received on 15th November 2009
Revised on 7th May 2010
doi: 10.1049/iet-ipr.2009.0374
ISSN 1751-9659

New method for the fusion of complementary information from infrared and visual images for object detection

I. Ulusoy 1  H. Yuruk 1,2
1 Computer Vision and Intelligent Systems Research Laboratory, Electrical and Electronics Engineering Department, METU, 06531 Ankara, Turkey
2 ASELSAN Inc., P.K. 1, 06172 Yenimahalle, Ankara, Turkey
E-mail: ilkay@metu.edu.tr

Abstract: Visual and infrared cameras have complementary properties, and using them together may improve the performance of object detection applications. Although the fusion of visual and infrared information yields a better recall rate than either domain alone, it always lowers the precision rate, whereas the infrared domain on its own always has higher precision. The fusion of these domains is therefore worthwhile only for the sake of a better recall rate, that is, for detecting more foreground pixels correctly. This study presents a new, computationally more efficient and simpler method for extracting the complementary information from both domains and fusing it to obtain better recall rates than those previously achieved. The method has been tested on a well-known database and on a database created for this study, and compared with earlier fusion methods.

1 Introduction

Colour or greyscale video cameras that produce visible-spectrum images need external illumination. Infrared cameras can be used both day and night to produce infrared-spectrum images, but they lack information such as texture and colour. Since visual and infrared cameras have complementary properties, using them together may increase the performance of object detection applications. However, each domain, whether infrared or visual, has its own specific issues: sudden illumination changes, the presence of shadows and poor night-time visibility cause problems in visual images.
Infrared imagery also has problems, such as a lower signal-to-noise ratio (SNR), polarity inversion and the halo effect that appears around hot or cold objects. The main issue is therefore how these domains can be fused.

Precision and recall are the metrics used to measure the performance of object detection applications. Precision is the number of object pixels detected by the method divided by the total number of pixels detected by the method; it gives the ratio of correct object pixels among all detected pixels. Recall is the number of object pixels detected by the method divided by the total number of actual object pixels; it gives the ratio of detected object pixels to actual object pixels.

The fusion of visual and infrared information is important for increasing the recall rate of object detection, but this fusion always decreases the precision rate. Since infrared images effectively provide the foreground information, most of the detected pixels are object pixels, and so precision is very high when only the infrared domain is used. Foreground detection in visual images, by contrast, is usually poor, and so its precision is very low. Therefore, when these domains are fused, the precision drops. Recall rates are very low for both the infrared and the visual domains individually; when the two are fused, however, a considerable increase in recall can be achieved. In [1], the precision and recall rates were 0.939 and 0.645, respectively, when only the infrared image was used, compared with 0.498 and 0.233 when only the visual image was used. When these domains were fused, the precision dropped below the infrared-only rate to 0.916, but the recall increased to 0.722, above the infrared-only value. The advantage of fusion is thus clearly an increase in the recall rate, and this increase can be considerable if the fusion is carried out appropriately.
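The two metrics defined above can be computed directly from binary foreground masks. The following sketch is not from the paper; it simply treats the detection result and the ground truth as boolean arrays of the same shape (the function name and conventions are illustrative):

```python
import numpy as np

def precision_recall(detected, ground_truth):
    """Pixel-level precision and recall for a binary foreground mask.

    detected     : boolean array, True where the method marks foreground.
    ground_truth : boolean array, True where a pixel is actually foreground.
    """
    detected = np.asarray(detected, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)

    # Correctly detected object pixels.
    true_positives = np.logical_and(detected, ground_truth).sum()

    # Guard against empty masks to avoid division by zero.
    precision = true_positives / max(detected.sum(), 1)
    recall = true_positives / max(ground_truth.sum(), 1)
    return precision, recall
```

With such a function, the trade-off described above is easy to reproduce numerically: a detector that marks few, mostly correct pixels (like the infrared domain) scores high precision but low recall, while one that marks many pixels scores the reverse.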
In this study, beneficial information from both visual and infrared images is combined, and a significant increase in the recall rate is achieved after fusion. Normally, the precision of the infrared domain is very high, which means that most of the detected foreground pixels are object pixels. However, when a foreground object is in front of a background object with a similar thermal value, some of the pixels of the foreground object are not detected. For example, in Fig. 1, the thermal image values in the abdominal region of the people are similar to that of the road, which is why these parts of the people are not detected as foreground (Figs. 1b and d). Such undetected regions can be completed if the foreground is

IET Image Process., 2011, Vol. 5, Iss. 1, pp. 36–48
© The Institution of Engineering and Technology 2011
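The paper's own fusion method is developed in the following sections. As a point of reference only, the simplest conceivable fusion of two foreground masks is a pixelwise OR, which already illustrates the trade-off described above: pixels missed by the infrared detector (e.g. a person's abdomen against a road of similar thermal value) can be recovered from the visual mask, raising recall, but false detections from the visual domain are also admitted, which can lower precision. This baseline is an assumption for illustration, not the proposed method:

```python
import numpy as np

def fuse_masks_or(ir_mask, visual_mask):
    """Naive baseline fusion: pixelwise OR of two foreground masks.

    A pixel is foreground in the fused result if either the infrared
    or the visual detector marked it.  This recovers regions missed in
    one domain at the cost of admitting that domain's false positives.
    """
    ir_mask = np.asarray(ir_mask, dtype=bool)
    visual_mask = np.asarray(visual_mask, dtype=bool)
    return np.logical_or(ir_mask, visual_mask)
```

A more selective fusion, such as the one this paper proposes, aims to keep the recall gain of the union while discarding the visual domain's spurious detections.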