HUMAN DETECTION IN IMAGES VIA L1-NORM MINIMIZATION LEARNING

Ran Xu¹, Baochang Zhang², Qixiang Ye¹, Jianbin Jiao¹
¹Graduate School of Chinese Academy of Sciences, Beijing, China
²School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
+Corresponding author: Fax: +86-10-88256278, Email: jiaojb@gucas.ac.cn

ABSTRACT

In recent years, sparse representation, which originates from compressed sensing theory in signal processing, has attracted increasing interest in the computer vision research community. However, to the best of our knowledge, no previous work uses L1-norm minimization for human detection. In this paper we develop a novel human detection system based on the L1-norm Minimization Learning (LML) method. The method is based on the observation that a human object can be represented by a few features drawn from a large feature set (a sparse representation). This sparse representation can be learned from training samples by exploiting the L1-norm minimization principle, a process that also acts as feature selection. It makes the feature representation more concise and more robust to object occlusion and deformation. A classifier is then constructed by linearly weighting the selected features and comparing the result with a calculated threshold. Experiments on two datasets validate the effectiveness and efficiency of the proposed method.

Index Terms: Human detection, L1-norm, feature selection, sparse representation

1. INTRODUCTION

Feature representation and the classifier are the two basic elements of a typical object detection algorithm. For feature representation, various global and local methods have been widely investigated for human detection. In [1], global shape-based features are exploited for body detection, with a classification rule based on the Chamfer distance. Compared with global features, local features have attracted much more attention in recent years.
In [2], the well-known dense, overlapping local descriptor, the histogram of oriented gradients (HOG), is introduced for feature representation and trained with an SVM classifier. Serre et al. [3] use cortex features built on multi-scale Gabor filter responses for object contour representation. In [4], the recently proposed covariance feature is classified on Riemannian manifolds and achieves reasonable performance. Mu et al. [5] employ improved LBP features, which tolerate color variation well, for human detection. In addition, some researchers detect human parts and combine the part features to form an overall human model [6-9]. Although these features have succeeded in some detection tasks when combined with various classifiers, feature selection, which can further improve the effectiveness and efficiency of the representation, has not been fully investigated.

For constructing the classifier, popular choices for human detection include SVM and AdaBoost. Mohan et al. [10] adopt silhouette information to represent humans and use an SVM for final classification. Viola et al. [11] employ AdaBoost with Haar-like features for face and human classification. In [12], individual detectors based on shapelet features are trained for each body part using AdaBoost. However, among these methods, SVM is relatively complex and not very efficient in terms of computation time, while AdaBoost needs extensive time to adjust every weak learner as the number of samples and the feature dimension grow [11], and it depends heavily on a large training set.

The method proposed in this paper uses L1-norm minimization both to extract a compact feature representation and, within the same framework, to design a linear classifier for human detection. Sparse representation using L1 minimization has been widely applied to signal compression [13-14], and it has also been used successfully in face recognition [15].
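For reference, the sparse-representation problems studied in the compressed sensing literature are usually written as follows. The notation is generic and assumed here, not taken from this paper: A ∈ ℝ^{m×n} is a dictionary (for example, training feature vectors as columns), b ∈ ℝ^m is an input signal, x is the coefficient vector, and λ > 0 is a regularization weight. The L1 norm serves as a convex surrogate for counting the nonzero coefficients of x.

```latex
\begin{aligned}
&\min_{x}\ \|x\|_0 \quad \text{s.t. } Ax = b
  && \text{(sparsest exact representation; combinatorial)}\\
&\min_{x}\ \|x\|_1 \quad \text{s.t. } Ax = b
  && \text{(convex relaxation; basis pursuit)}\\
&\min_{x}\ \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1
  && \text{(noisy data; Lasso form)}
\end{aligned}
```

Under suitable conditions on A, the L1 solution coincides with the sparsest one, which is what makes the relaxation useful in practice.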
The intuition is that a sparse representation obtained by L1-norm minimization is naturally discriminative, because the minimization selects the subset of features that most compactly expresses the input signals. To evaluate the proposed method, we use simple HOG descriptors as features. We first compute blocks of HOG features on the training samples and use L1-norm minimization to obtain the feature weights and the sparse representation. Then we design a simple but effective linear classifier on these weighted features. Our investigation also shows that the proposed method is, to some extent, robust to occlusion and to multiple postures.

Fig. 1. Framework of the proposed method.

3566 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010
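The training pipeline described in the introduction (sparse feature weights from L1-norm minimization, then a thresholded linear classifier on the weighted features) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for HOG block features, the L1 problem is posed as an L1-regularized least-squares (Lasso) fit of the ±1 labels and solved by iterative shrinkage-thresholding (ISTA), and the function names, λ = 0.2, and the midpoint threshold rule are all hypothetical choices, since the calculation of the threshold is not specified here.

```python
import random

def make_toy_data(n=100, d=12, informative=(0, 1, 2), shift=1.5, seed=0):
    """Hypothetical synthetic stand-in for HOG block features (not real HOG).

    Labels alternate between +1 (human) and -1 (background); only the
    'informative' columns are shifted by the label, the rest are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1.0 if i % 2 == 0 else -1.0
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for j in informative:
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y

def soft_threshold(v, tau):
    """Proximal operator of tau*|v|: shrink v toward zero, clipping to exactly 0."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def learn_sparse_weights(X, y, lam=0.2, iters=1000):
    """ISTA for min_w (1/2n)||Xw - y||^2 + lam*||w||_1 (assumed Lasso formulation)."""
    n, d = len(X), len(X[0])
    # Step 1/L with L = ||X||_F^2 / n, an upper bound on the gradient's Lipschitz constant.
    step = n / sum(v * v for row in X for v in row)
    w = [0.0] * d
    for _ in range(iters):
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w

def score(w, x):
    """Linearly weighted features; only nonzero (selected) weights contribute."""
    return sum(wj * xj for wj, xj in zip(w, x) if wj != 0.0)

def fit_threshold(X, y, w):
    """Hypothetical rule: midpoint between the mean positive and mean negative scores."""
    pos = [score(w, row) for row, yi in zip(X, y) if yi > 0]
    neg = [score(w, row) for row, yi in zip(X, y) if yi < 0]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))

def classify(w, theta, x):
    """+1 (human) if the weighted score reaches the threshold, else -1."""
    return 1.0 if score(w, x) >= theta else -1.0

X, y = make_toy_data()
w = learn_sparse_weights(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
theta = fit_threshold(X, y, w)
accuracy = sum(classify(w, theta, row) == yi for row, yi in zip(X, y)) / len(y)
print("selected features:", selected)
print("training accuracy:", accuracy)
```

Features whose learned weight is exactly zero are discarded, so at detection time only the selected columns (in the paper's setting, the selected HOG blocks) would need to be computed; this is where the efficiency of a sparse representation comes from. Soft-thresholding is what produces exact zeros rather than merely small weights.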