Journal of Imaging Science and Technology 63(2): 020503-1–020503-13, 2019. © Society for Imaging Science and Technology 2019

Multi-Hypothesis Approach for Efficient Human Detection

Hussin Ragb and Vijayan Asari
University of Dayton, Vision Lab, Electrical and Computer Eng., 300 College Park, Dayton, OH, USA 45469-0232
E-mail: ragbh1@udayton.edu

Abstract. Detection of human beings in a complex background environment is a challenging task in computer vision. Most of the time, no single feature algorithm is rich enough to capture all the relevant information available in the image. In this paper, we propose a new feature extraction technique that combines three types of visual information: shape, color, and texture. It is named the Color space Phase features with Gradient and Texture (CPGT) algorithm. The gradient concept and phase congruency in the color domain are used to localize the shape features. The Center-Symmetric Local Binary Pattern (CSLBP) approach is used to extract the texture information of the image. Fusing these complementary features captures a broad range of human appearance details, which improves the detection performance. The proposed features are formed by computing the gradient magnitude and CSLBP value for each pixel in the image with respect to its neighborhood, in addition to the phase congruency of the three color channels. Only the maximum phase congruency magnitudes are selected from the corresponding color channels. The histogram of oriented phase and gradients, as well as the histogram of CSLBP values, is computed for the local regions of the image, and these histograms are concatenated to construct the proposed descriptor. Principal Component Analysis (PCA) is performed to reduce the dimensionality of the resultant features. Several experiments were conducted to evaluate the performance of the proposed descriptor.
The experimental results show that the proposed approach yields promising performance and has lower error rates when compared to several state-of-the-art feature extraction methodologies. We observed a miss rate of 2.23% on the INRIA dataset and 2.6% on the NICTA dataset. © 2019 Society for Imaging Science and Technology. [DOI: 10.2352/J.ImagingSci.Technol.2019.63.2.020503]

IS&T Member. Received Aug. 10, 2017; accepted for publication May 11, 2018; published online Jan. 15, 2019. Associate Editor: Jia-Shing Sheu. 1062-3701/2019/63(2)/020503/13/$25.00

1. INTRODUCTION
Computer vision can be defined as ``the theory and technology for building artificial systems that obtain information from images or multi-dimensional data.'' It uses cameras attached to the computer to automatically interpret images and understand their content, similar to human vision. Computer vision methods can discover from images what objects are present in the scene (object recognition/classification), where they are (object detection), how they move (object tracking), and what their shape is (object reconstruction). Human detection is one of the most active research topics and demanding applications of computer vision. It can be stated simply as the localization of the regions in an image or video sequence that contain humans. Some of the tasks that fall under this domain are human-computer interaction, person identification, event detection, counting people in crowded regions, gender classification, automatic navigation, safety systems, etc. The fluctuating appearance of the human body, combined with occlusions, cluttered scenes, and illumination changes, makes human detection one of the most challenging categories of object detection. The human detection system consists mainly of two major procedures, feature extraction and classification, as illustrated in Figure 1.
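As a concrete illustration of the texture component named in the abstract, the following is a minimal sketch of the CSLBP operator: each pixel is encoded by thresholding the differences of the four center-symmetric pairs in its 8-neighborhood, giving a 4-bit code, and a 16-bin histogram summarizes a local region. The threshold value and NumPy-based implementation details here are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def cslbp(image, threshold=3):
    """Center-Symmetric Local Binary Pattern for a grayscale image.

    Compares the 4 center-symmetric neighbor pairs of each pixel's
    8-neighborhood, yielding a 4-bit code in [0, 15]. Returns codes
    for interior pixels only (border pixels have incomplete
    neighborhoods). `threshold` is an illustrative noise-suppression
    value for near-flat regions.
    """
    img = image.astype(np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # 8-neighborhood offsets; pairs (0,4), (1,5), (2,6), (3,7)
    # are center-symmetric.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(4):
        (dy1, dx1), (dy2, dx2) = offs[i], offs[i + 4]
        a = img[1 + dy1:h - 1 + dy1, 1 + dx1:w - 1 + dx1]
        b = img[1 + dy2:h - 1 + dy2, 1 + dx2:w - 1 + dx2]
        codes |= ((a - b) > threshold).astype(np.uint8) << i
    return codes

def cslbp_histogram(codes, bins=16):
    """Normalized 16-bin histogram of CSLBP codes over a region."""
    hist = np.bincount(codes.ravel(), minlength=bins).astype(float)
    return hist / max(hist.sum(), 1e-12)
```

In the full descriptor, such per-region histograms would be concatenated with the gradient and phase-congruency histograms before PCA; only the texture part is sketched here.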
The feature extraction algorithm is used to encode image regions as low-dimensional feature vectors that support high-accuracy human/non-human decisions. These features should characterize the image sufficiently well for object detection or classification, while providing robustness and invariance to changes in illumination, viewpoint, and shifts in object contours. Such features can be based on intensities, gradients, texture, color, or combinations of several or all of these. The classifier unit uses these extracted features to determine whether the image region belongs to the object of interest or not. The human detection system is implemented in two stages: the learning phase (off-line) and the runtime phase (on-line). Building the human detector is carried out in the learning phase by extracting the features of the positive (human) and negative (non-human) samples in the training dataset. This information is converted into fixed-dimension feature vectors and used for training the classifier. At runtime, all regions in the input image are evaluated by using a sliding window that is gradually moved along different positions of the image. The features of every window region are extracted and analyzed, and a classifier then decides to which class the window region belongs. Once the region is classified, the procedure for the current window is finished, and the evaluation continues with the next region in the image. Over the past decade, significant results have been reported by numerous researchers in the area of human detection. This task has been tackled with different and diverse techniques. One of these techniques is the holistic approach, a training method proposed to classify and identify the full human body as a single detection region. The parts-based method is another approach, proposed to identify each part of the body separately; the human can be detected if some or all of these parts are present in a reasonable spatial configuration.
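The runtime phase described above can be sketched as follows. The window size, stride, and feature extractor below are illustrative placeholders (the placeholder uses a plain intensity histogram, not the paper's CPGT descriptor), and `classifier` stands in for whatever trained model the learning phase produced.

```python
import numpy as np

WIN_H, WIN_W = 128, 64   # common pedestrian window size; illustrative
STRIDE = 8               # illustrative step between window positions

def extract_features(window):
    """Placeholder feature extractor: a coarse intensity histogram.

    In the paper this step would compute the CPGT descriptor
    (phase congruency, gradient, and CSLBP histograms, reduced
    by PCA); a simple histogram keeps the sketch self-contained.
    """
    hist, _ = np.histogram(window, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1e-12)

def detect(image, classifier):
    """Runtime phase: slide a fixed-size window over the image,
    extract features for each region, and keep the regions the
    classifier labels as human (class 1)."""
    detections = []
    h, w = image.shape
    for y in range(0, h - WIN_H + 1, STRIDE):
        for x in range(0, w - WIN_W + 1, STRIDE):
            feat = extract_features(image[y:y + WIN_H, x:x + WIN_W])
            if classifier(feat) == 1:
                detections.append((x, y, WIN_W, WIN_H))
    return detections
```

In practice the scan is repeated over an image pyramid to handle multiple scales, and overlapping detections are merged, e.g., by non-maximum suppression.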
At the same time, the testing phase has also been carried out in several ways. The sliding window is one approach that is used to scan the image densely at different