INCREASING VIRTUAL SAMPLES THROUGH LOSS SMOOTHNESS DETERMINATION IN
LARGE GEOMETRIC MARGIN MINIMUM CLASSIFICATION ERROR TRAINING
Tsukasa Ohashi¹, Hideyuki Watanabe², Jun’ichi Tokuno¹, Shigeru Katagiri¹, Miho Ohsaki¹,
Shigeki Matsuda², and Hideki Kashioka²

¹ Graduate School of Engineering, Doshisha University
1-3 Tatara Miyakodani, Kyotanabe-shi, Kyoto 610-0394, Japan.
² National Institute of Information and Communications Technology
3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan.
E-mail: hideyuki.watanabe@nict.go.jp
ABSTRACT
We propose a new method for automatically determining the
smoothness of the smooth classification error count loss used in the
recent Large Geometric Margin Minimum Classification Error
(LGM-MCE) training. The method builds on the Parzen-estimation-based
formalization of MCE training, and it realizes the determination
through maximum likelihood estimation of the error count risk in the
one-dimensional geometric-margin-based misclassification measure
space. In the LGM-MCE framework, an increase in the loss smoothness
directly produces virtual samples, which are expected to increase the
training robustness to unseen samples. Focusing on this point, we
also theoretically clarify the mechanism of this virtual sample
generation. Experiments demonstrate the utility of the proposed
smoothness determination method and clearly illustrate the mechanism
of producing virtual samples and its effect on robustness.
Index Terms— geometric margin, Minimum Classification Error,
loss smoothness, virtual samples, Parzen estimation.
1. INTRODUCTION
In Minimum Classification Error (MCE) training [1], the smoothness
of the smooth classification error count loss plays a key role not
only in enabling the use of handy gradient-descent-based optimization
methods but also in increasing the training robustness to unseen
samples. However, to increase the training robustness effectively, an
appropriate setting of the smoothness degree is required. For this
problem, a theoretically grounded solution was proposed for MCE
training using a conventional functional-margin (FM)-based
misclassification measure [2]. It determines the smoothness using
Parzen-kernel-based error probability estimation in the FM-based
misclassification measure space, and its utility was shown through
systematic evaluation experiments [2].
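The Parzen-kernel estimation underlying this approach can be illustrated with a small sketch (not part of the paper): the one-dimensional misclassification-measure values of the training samples are smoothed with a Gaussian kernel, and the estimated error probability is the density mass on the error side (positive measure values). The function name `parzen_pdf`, the bandwidth `h`, and the example values are illustrative assumptions, not the paper's actual procedure for choosing the smoothness.

```python
import numpy as np

def parzen_pdf(d_values, query, h):
    """1-D Parzen (kernel density) estimate with a Gaussian kernel.

    d_values: misclassification-measure values of the training samples.
    query:    points at which to evaluate the estimated density.
    h:        kernel bandwidth, playing the role of the loss smoothness.
    """
    d = np.asarray(d_values, dtype=float)
    q = np.atleast_1d(np.asarray(query, dtype=float))
    # One Gaussian kernel per training sample, averaged and scaled by h.
    diffs = (q[:, None] - d[None, :]) / h
    k = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return k.sum(axis=1) / (len(d) * h)

# Illustrative measure values: negative = correct, positive = error.
samples = [-1.2, -0.4, -0.1, 0.3]
grid = np.linspace(-4.0, 4.0, 2001)
pdf = parzen_pdf(samples, grid, h=0.5)
# Estimated error probability: density mass above zero.
dx = grid[1] - grid[0]
err_prob = pdf[grid > 0].sum() * dx
```

A larger bandwidth `h` spreads each sample's kernel more widely, which is exactly the sense in which loss smoothness controls how much the empirical error count is smeared into an error probability estimate.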
In parallel with the advent of this smoothness determination method,
a new MCE training method was developed using a geometric-margin-based
misclassification measure [3]. The geometric margin (GM) is the
distance in the sample space between a classification boundary and
its nearest training sample. This new MCE training, referred to as
Large Geometric Margin Minimum Classification Error (LGM-MCE)
training, was designed to maximize the GM as well as to minimize the
smooth classification error count loss. Its superiority to the
previous MCE, i.e., Functional-Margin MCE (FM-MCE), was also
successfully demonstrated through experiments.

This work was supported in part by Grant-in-Aid for Scientific
Research (B), No. 22300064.
It is clearly worth incorporating the smoothness determination method
into the LGM-MCE method. Motivated by this, in this paper we propose
a new training method that applies the Parzen-estimation-based loss
smoothness determination mechanism to LGM-MCE training. Importantly,
an increase in the loss smoothness directly leads to an increase in
the geometric margin, which can be interpreted as an effect of
producing virtual samples in the sample space. This effect of virtual
samples has been reported in the literature (e.g., [4] [5]). It is
therefore also worth investigating this effect in order to clarify
the mechanism of the robustness increase. We thus analyze this issue
and theoretically clarify how loss smoothing in the one-dimensional
GM-based misclassification measure space produces virtual training
samples, which simulate future unseen samples, in a usually
high-dimensional sample space.
Through comparative experiments, the effect of the proposed
smoothness determination method is clearly demonstrated, and the
effect of producing virtual samples through loss smoothing is also
revealed.
2. PARZEN-KERNEL-BASED LOSS SMOOTHNESS
DETERMINATION
2.1. LGM-MCE Formalization Based on Parzen Estimation of
Error Count Risk
First, we newly introduce a formalization of LGM-MCE using the
Parzen estimation of error count risk.
We consider the task of classifying an input pattern x ∈ X as one of
the J classes C_j (j = 1, ..., J), where X denotes the input pattern
sample space. As in the previous MCE framework, LGM-MCE training
adopts the following classification decision rule based on
discriminant functions:

    C(x) = C_k   iff   k = argmax_j g_j(x; Λ),    (1)

where g_j(x; Λ) is the discriminant function of C_j that indicates
the degree to which x belongs to C_j. Λ denotes the trainable
parameter set of the classifier, and each g_j(x; Λ) (j = 1, ..., J)
is assumed to be differentiable in x and Λ.
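The decision rule of Eq. (1) can be sketched in a few lines. The linear form of the discriminant functions below (weights `W` and biases `b`) is an assumption made only for this illustration; the rule itself applies to any differentiable g_j.

```python
import numpy as np

def decide(x, W, b):
    """Decision rule of Eq. (1): assign x to the class whose
    discriminant value is largest. Here g_j(x) = w_j . x + b_j,
    a linear form assumed purely for illustration."""
    g = W @ np.asarray(x, dtype=float) + b  # vector of g_j(x; Lambda)
    return int(np.argmax(g))

# Two linear classes in a 2-D sample space; the boundary is x1 = 0.
W = np.array([[ 1.0, 0.0],
              [-1.0, 0.0]])
b = np.zeros(2)

c0 = decide([ 2.0, 1.0], W, b)  # lands on the class-0 side
c1 = decide([-3.0, 0.5], W, b)  # lands on the class-1 side
```

For this linear sketch, the Euclidean distance from a sample to the boundary (the geometric margin discussed next) is simply |g_0(x) − g_1(x)| / ||w_0 − w_1||.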
Assume here that x, which belongs to C_y, is a correctly classified
training sample near the classification boundary. LGM-MCE training
focuses on the Euclidean distance r between x and the boundary, which
is the geometric margin (GM). Increasing the r

978-1-4673-0046-9/12/$26.00 ©2012 IEEE    ICASSP 2012