INCREASING VIRTUAL SAMPLES THROUGH LOSS SMOOTHNESS DETERMINATION IN LARGE GEOMETRIC MARGIN MINIMUM CLASSIFICATION ERROR TRAINING

Tsukasa Ohashi 1, Hideyuki Watanabe 2, Jun'ichi Tokuno 1, Shigeru Katagiri 1, Miho Ohsaki 1, Shigeki Matsuda 2, and Hideki Kashioka 2

1 Graduate School of Engineering, Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe-shi, Kyoto 610-0394, Japan.
2 National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan.
E-mail: hideyuki.watanabe@nict.go.jp

ABSTRACT

We propose a new method for automatically determining the smoothness of the smooth classification error count loss for the recent Large Geometric Margin Minimum Classification Error (LGM-MCE) training. The method uses the Parzen-estimation-based formalization of MCE training, and it realizes the determination through maximum likelihood estimation of the error count risk in the one-dimensional geometric-margin-based misclassification measure space. In the LGM-MCE framework, an increase in the loss smoothness directly produces virtual samples, which are expected to increase the training robustness to unseen samples. Focusing on this point, we also theoretically clarify the mechanism of this virtual sample generation. Through experiments, the utility of the proposed smoothness determination method is demonstrated, and the mechanism of producing virtual samples and its effect on robustness are also clearly illustrated.

Index Terms: geometric margin, Minimum Classification Error, loss smoothness, virtual samples, Parzen estimation.

1. INTRODUCTION

In Minimum Classification Error (MCE) training [1], the smoothness of the smooth classification error count loss plays a key role not only in enabling the use of handy gradient-descent-based optimization methods but also in increasing the training robustness to unseen samples.
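As a concrete illustration of the smooth error count loss discussed above, the hard 0-1 error count is commonly replaced in MCE training by a sigmoid of a misclassification measure d (d > 0 meaning an error), with a smoothness parameter controlling how closely the sigmoid approximates the hard count. The sketch below is our own; the function name and the parameter `alpha` are illustrative, not notation from this paper:

```python
import numpy as np

def smooth_error_loss(d, alpha):
    """Sigmoid-smoothed 0-1 loss of a misclassification measure d.

    d > 0 corresponds to a misclassification; larger alpha makes the
    loss sharper (closer to the hard 0-1 error count), while smaller
    alpha makes it smoother and thus easier for gradient descent.
    """
    return 1.0 / (1.0 + np.exp(-alpha * d))

d = np.array([-2.0, -0.5, 0.5, 2.0])
hard = (d > 0).astype(float)           # hard error count: [0, 0, 1, 1]
print(smooth_error_loss(d, alpha=50.0))  # nearly the hard counts
print(smooth_error_loss(d, alpha=1.0))   # a smooth, differentiable surrogate
```

Because the sigmoid is differentiable everywhere, its gradient with respect to classifier parameters is well defined even for samples near the decision boundary, which is what enables gradient-based optimization.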
However, to increase the training robustness effectively, an appropriate setting of the smoothness degree is required. For this problem, a theoretically grounded solution was proposed for MCE training using a conventional functional-margin (FM)-based misclassification measure [2]. It determines the smoothness using Parzen-kernel-based error probability estimation in the FM-based misclassification measure space, and its utility was shown through systematic evaluation experiments [2].

In parallel with the advent of this smoothness determination method, a new MCE training method was developed using a geometric-margin-based misclassification measure [3]. The geometric margin (GM) is the distance between a classification boundary and its nearest training sample in the sample space. This new MCE training, referred to as Large Geometric Margin Minimum Classification Error (LGM-MCE) training, was designed to maximize the GM as well as to minimize the smooth classification error count loss. Its superiority to the previous MCE, i.e., Functional-Margin MCE (FM-MCE), was also successfully demonstrated through experiments.

It is clearly worth incorporating the smoothness determination method into the LGM-MCE method. Motivated by this concern, in this paper, we propose a new training method that applies the Parzen-estimation-based loss smoothness determination mechanism to LGM-MCE training. Importantly, an increase in the loss smoothness directly leads to an increase in the geometric margin, which can be considered an effect of producing virtual samples in the sample space. The effect of virtual samples is found in the literature (e.g., [4] [5]). It is therefore also worth investigating this effect, aiming to clarify the mechanism of the robustness increase.

This work was supported in part by Grant-in-Aid for Scientific Research (B), No. 22300064.
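The Parzen-kernel-based error probability estimation mentioned above can be sketched generically: a one-dimensional Gaussian-kernel density estimate is formed over the misclassification-measure values of the training samples, and the error probability is the mass of that density on the positive half-line. The kernel bandwidth plays the role of the loss smoothness. The code below is a minimal sketch under these assumptions; the function name, the bandwidth value, and the synthetic data are our own, not taken from [2]:

```python
import numpy as np

def parzen_density(d_train, bandwidth, grid):
    """1-D Gaussian-kernel Parzen estimate of the density of the
    misclassification measure d, evaluated on a grid of d values."""
    diffs = (grid[:, None] - d_train[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs**2).sum(axis=1)
    dens /= len(d_train) * bandwidth * np.sqrt(2.0 * np.pi)
    return dens

rng = np.random.default_rng(0)
d_train = rng.normal(-1.0, 1.0, size=200)   # mostly correct samples (d < 0)
grid = np.linspace(-5.0, 5.0, 1001)
dens = parzen_density(d_train, bandwidth=0.5, grid=grid)
# Estimated error probability: density mass over d > 0 (misclassified side)
err = dens[grid > 0.0].sum() * (grid[1] - grid[0])
```

A larger bandwidth smooths the estimated density, spreading each training sample's contribution over a wider range of d, which is exactly the smoothing whose degree the proposed method determines by maximum likelihood.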
In the paper, we thus analyze this issue and theoretically clarify how loss smoothness in the one-dimensional GM-based misclassification measure space produces virtual training samples, which simulate future unseen samples, in a usually high-dimensional sample space. Through comparative experiments, the effect of the proposed smoothness determination method is clearly demonstrated, and the effect of producing virtual samples through loss smoothness is also examined.

2. PARZEN-KERNEL-BASED LOSS SMOOTHNESS DETERMINATION

2.1. LGM-MCE Formalization Based on Parzen Estimation of Error Count Risk

First, we newly introduce a formalization of LGM-MCE using the Parzen estimation of the error count risk. We consider the task of classifying an input pattern x ∈ X into one of J classes (C_j; j = 1, ..., J), where X denotes the input pattern sample space. As in the previous MCE framework, LGM-MCE training adopts the following classification decision rule based on discriminant functions:

C(x) = C_k iff k = arg max_j g_j(x; Λ),   (1)

where g_j(x; Λ) is the discriminant function of C_j that indicates the degree to which x belongs to C_j. Λ denotes the trainable parameter set of the classifier, and each g_j(x; Λ) (j = 1, ..., J) is assumed to be differentiable in x and Λ.

Assume here that x, which belongs to C_y, is a correctly classified training sample near the classification boundary. LGM-MCE training focuses on the Euclidean distance r between x and the boundary, which is the geometric margin (GM). Increasing the r

978-1-4673-0046-9/12/$26.00 ©2012 IEEE, ICASSP 2012