Multilogistic Regression by Product Units
P. A. Gutiérrez
University of Córdoba, Spain
C. Hervás
University of Córdoba, Spain
F. J. Martínez-Estudillo
INSA – ETEA, Spain
M. Carbonero
INSA – ETEA, Spain
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Multi-class pattern recognition has a wide range of
applications including handwritten digit recognition
(Chiang, 1998), speech tagging and recognition (Atha-
naselis, Bakamidis, Dologlou, Cowie, Douglas-Cowie
& Cox, 2005), bioinformatics (Mahony, Benos, Smith &
Golden, 2006) and text categorization (Massey, 2003).
This chapter presents a comprehensive and competitive
study in multi-class neural learning which combines
different elements, such as multilogistic regression,
neural networks and evolutionary algorithms.
The Logistic Regression model (LR) has been widely
used in statistics for many years and has recently been
the object of extensive study in the machine learning
community. Although logistic regression is a simple and
useful procedure, it poses problems when is applied
to a real-problem of classifcation, where frequently
we cannot make the stringent assumption of additive
and purely linear effects of the covariates. A technique
to overcome these diffculties is to augment/replace
the input vector with new variables, basis functions,
which are transformations of the input variables, and
then to use linear models in this new space of derived
input features. Methods like sigmoidal feed-forward
neural networks (Bishop, 1995), generalized additive
models (Hastie & Tibshirani, 1990), and PolyMARS
(Kooperberg, Bose & Stone, 1997), which is a hybrid
of Multivariate Adaptive Regression Splines (MARS)
(Friedman, 1991) specifcally designed to handle clas-
sifcation problems, can all be seen as different non-
linear basis function models. The major drawback of
these approaches is stating the typology and the optimal
number of the corresponding basis functions.
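The basis-function idea described above can be sketched in a few lines. The mapping below is a hypothetical illustration (the function name and the choice of polynomial terms are ours, not the chapter's): derived features are built from the raw covariates, and a linear logistic model would then be fitted in this new space.

```python
import numpy as np

# Sketch: augment/replace the input vector with basis functions and keep the
# model linear in the derived space. Polynomial terms are used here as one
# concrete choice; sigmoidal or spline bases would plug in the same way.
def polynomial_basis(X):
    """Map each row (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
Z = polynomial_basis(X)   # a linear logistic model is then fitted on Z
print(Z.shape)            # (2, 5)
```

Because the model stays linear in the derived features, standard logistic-regression fitting applies unchanged; the difficulty the chapter highlights is choosing which basis functions, and how many, to include.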
Logistic regression models are usually fit by maxi-
mum likelihood, where the Newton-Raphson algorithm
is the traditional way to compute the maximum likeli-
hood estimates of the parameters. Typically, the algorithm
converges, since the log-likelihood is concave. It is
important to point out, however, that the
Newton-Raphson algorithm becomes computationally
prohibitive when the number of variables is large.
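A minimal numpy sketch of the Newton-Raphson iteration (iteratively reweighted least squares) for the binary case makes the cost concrete; the function name and toy data are our own illustration, not from the chapter. Each step solves a k × k linear system built from the Hessian, which is exactly what becomes prohibitive as the number of variables k grows.

```python
import numpy as np

# Sketch of Newton-Raphson (IRLS) for binary logistic regression.
# Each step solves H @ delta = grad, where H = X^T W X with W = diag(p(1-p)).
def fit_logistic_newton(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current class probabilities
        W = p * (1.0 - p)                     # Newton weights
        H = X.T @ (X * W[:, None])            # k x k (negated) Hessian
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        beta += np.linalg.solve(H, grad)      # Newton step: O(k^3) per iteration
    return beta

# Toy, non-separable data: an intercept column plus one covariate.
X = np.column_stack([np.ones(6), [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta = fit_logistic_newton(X, y)
```

Because the log-likelihood is concave, the iteration typically converges in a handful of steps on well-conditioned data, but the O(k^3) solve per step dominates when k is large.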
Product Unit Neural Networks (PUNNs), introduced
by Durbin and Rumelhart (1989), are an
alternative to standard sigmoidal neural
networks and are based on multiplicative nodes instead
of additive ones.
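The contrast between the two node types can be sketched as follows (the function names are ours, for illustration): a standard sigmoidal node squashes a weighted sum, while a product unit raises each input to a learnable real-valued exponent and multiplies the results.

```python
import numpy as np

# Additive node vs. multiplicative product unit (illustrative sketch).
def additive_node(x, w):
    # Standard sigmoidal unit: squashed weighted sum of the inputs.
    return np.tanh(np.dot(w, x))

def product_unit(x, w):
    # Product unit: prod_i x_i ** w_i, equivalently exp(sum_i w_i * log x_i).
    # Inputs are assumed positive so real-valued exponents are well defined.
    return np.prod(x ** w)

x = np.array([2.0, 3.0])
w = np.array([2.0, 1.0])
print(product_unit(x, w))   # 2^2 * 3^1 = 12.0
```

The exponents act as learnable weights, so a single product unit can express higher-order interactions among inputs that an additive node would need several hidden units to approximate.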
BACKGROUND
In the classification problem, measurements x_i, i =
1, 2,..., k, are taken on a single individual (or object), and
the individuals are to be classified into one of J classes
on the basis of these measurements. It is assumed that J
is finite, and the measurements x_i are random observa-
tions from these classes. A training sample D = {(x_n, y_n);
n = 1, 2,..., N} is available, where x_n = (x_{1n},..., x_{kn}) is the
vector of measurements taking values in Ω ⊂ R^k, and
y_n is the class level of the nth individual. In this chapter,
we will adopt the common technique of representing the
class levels using a "1-of-J" encoding vector y = (y^(1),
y^(2),..., y^(J)), such that y^(l) = 1 if x corresponds to an example
belonging to class l and y^(l) = 0 otherwise. Based on the
training sample, we wish to find a decision function
C : Ω → {1, 2,..., J} for classifying the individuals. In
other words, C provides a partition, say D_1, D_2,..., D_J, of
Ω, where D_l corresponds to the lth class, l = 1, 2,..., J,