HUMAN DETECTION IN IMAGES VIA L1-NORM MINIMIZATION LEARNING
Ran Xu (1), Baochang Zhang (2), Qixiang Ye (1), Jianbin Jiao (1)
(1) Graduate School of Chinese Academy of Sciences, Beijing, China
(2) School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
+Corresponding Author: Fax: +86-10-88256278, Email: jiaojb@gucas.ac.cn
ABSTRACT
In recent years, sparse representation, which originates from
compressed sensing theory, has attracted increasing interest in
the computer vision research community. However, to the best
of our knowledge, no previous work has used L1-norm
minimization for human detection. In this paper we develop
a novel human detection system based on the L1-norm
Minimization Learning (LML) method. The method is based
on the observation that a human object can be represented by
a few features from a large feature set (sparse representation).
This sparse representation can be learned from the training
samples by exploiting the L1-norm minimization principle,
which can also be viewed as a feature selection procedure.
This procedure makes the feature representation more
concise and more adaptive to object occlusion and
deformation. A classifier is then constructed by linearly
weighting the features and comparing the result with a
calculated threshold. Experiments on two datasets validate
the effectiveness and efficiency of the proposed method.
Index Terms—Human detection, L1-norm, feature
selection, sparse representation
1. INTRODUCTION
Feature representation and classification are the two basic
elements of a typical object detection algorithm. With respect
to feature representation, various global and local methods
have been widely investigated for human detection.
In [1], global shape-based features are exploited for
body detection, with a classification rule that is based on
the Chamfer distance. Compared with global features, local
features have attracted much more attention in recent years.
In [2], the well-known dense, overlapping local descriptor,
the histogram of oriented gradients (HOG), is introduced for
feature representation and used to train an SVM classifier.
Serre et al. [3] use cortex features for object contour
representation, based on the multi-scale features of Gabor
filters. In [4], the covariance feature is proposed and
classified on a Riemannian manifold, achieving reasonable
performance. Mu et al. [5] employ improved LBP features,
which have good tolerance to color variation, for human
detection. In addition, some researchers detect human parts
and combine these features to form an overall human model
[6-9]. Although these features have succeeded in some
detection tasks when combined with various classifiers, the
feature selection process, which can further improve the
effectiveness and efficiency of the representation, has not
been fully investigated.
For classifier construction in human detection, popular
methods include SVM and AdaBoost. Mohan et al. [10] adopt
silhouette information to represent humans and use an SVM
for the final classification. Viola et al. [11] employ AdaBoost
for face and human classification based on Haar-like features.
In [12], individual detectors based on Shapelet features are
trained for each part using AdaBoost. However, among these
methods, the SVM is relatively complex and does little to
reduce computation time, while AdaBoost needs extensive
time to adjust every weak learner as the number of samples
and the feature dimension increase [11], and it depends
heavily on a large training set.
The method proposed in this paper extracts a compact
feature representation and, at the same time, designs a linear
classifier for human detection in a unified way via L1-norm
minimization. Sparse representation using L1 minimization
has been widely applied in the field of signal compression
[13-14], and it has been successfully used in the field of face
recognition [15]. The intuition is that the sparse
representation obtained by L1-norm minimization is naturally
discriminative, since it selects the subset that most compactly
expresses the input signals. To verify the performance of the
proposed method, we use simple HOG descriptors to extract
features. We first compute blocks of HOG features on the
training samples and use L1 minimization to obtain the
weights and the sparse representation. Then, we design a
simple but effective linear classifier on these weighted
features. We also find that the proposed method is, to some
extent, robust to occlusion and multiple postures.
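The pipeline outlined above (features, L1-based weight learning, thresholded linear classifier) can be illustrated with a minimal sketch. It is not the LML formulation itself, which is not given in this section. It assumes an L1-regularized least-squares objective solved by iterative soft thresholding (ISTA), random vectors standing in for HOG block features, and a placeholder threshold of zero. The names fit_l1_weights, classify, lam and threshold are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Shrink every entry toward zero by t; entries with |v| <= t become exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_weights(X, y, lam=0.15, n_iter=500):
    # Assumed objective: min_w (1 / 2n) * ||X w - y||^2 + lam * ||w||_1,
    # solved with ISTA (gradient step followed by soft thresholding).
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def classify(X, w, threshold=0.0):
    # Linear weighting of the features, compared against a threshold.
    return np.where(X @ w > threshold, 1, -1)

# Synthetic stand-in for HOG features: only the first 3 of 50 dimensions
# carry label information; the rest are noise.
rng = np.random.default_rng(0)
n, d = 400, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -2.0, 2.0]
y = np.sign(X @ w_true)           # labels in {-1, +1}

w = fit_l1_weights(X, y)
selected = np.flatnonzero(w)      # indices of features kept by the L1 penalty
accuracy = np.mean(classify(X, w) == y)
print("selected features:", selected)
print("training accuracy:", accuracy)
```

In this sketch the L1 penalty drives most weights to exactly zero, so the nonzero entries of w act as the selected features and the same w serves as the linear classifier. In practice the threshold would be computed from training data rather than fixed at zero.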
Fig. 1. Framework of the proposed method.
3566 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010