Multilogistic Regression by Product Units
P. A. Gutiérrez
University of Córdoba, Spain
C. Hervás
University of Córdoba, Spain
F. J. Martínez-Estudillo
INSA – ETEA, Spain
M. Carbonero
INSA – ETEA, Spain
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Multi-class pattern recognition has a wide range of
applications including handwritten digit recognition
(Chiang, 1998), speech tagging and recognition (Atha-
naselis, Bakamidis, Dologlou, Cowie, Douglas-Cowie
& Cox, 2005), bioinformatics (Mahony, Benos, Smith &
Golden, 2006) and text categorization (Massey, 2003).
This chapter presents a comprehensive and competitive
study in multi-class neural learning which combines
different elements, such as multilogistic regression,
neural networks and evolutionary algorithms.
The Logistic Regression model (LR) has been widely
used in statistics for many years and has recently been
the object of extensive study in the machine learning
community. Although logistic regression is a simple and
useful procedure, it poses problems when is applied
to a real-problem of classifcation, where frequently
we cannot make the stringent assumption of additive
and purely linear effects of the covariates. A technique
to overcome these diffculties is to augment/replace
the input vector with new variables, basis functions,
which are transformations of the input variables, and
then to use linear models in this new space of derived
input features. Methods like sigmoidal feed-forward
neural networks (Bishop, 1995), generalized additive
models (Hastie & Tibshirani, 1990), and PolyMARS
(Kooperberg, Bose & Stone, 1997), which is a hybrid
of Multivariate Adaptive Regression Splines (MARS)
(Friedman, 1991) specifcally designed to handle clas-
sifcation problems, can all be seen as different non-
linear basis function models. The major drawback of
these approaches is stating the typology and the optimal
number of the corresponding basis functions.
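The basis-function idea described above can be sketched in a few lines. The mapping below is a hypothetical illustration (the function name and the choice of polynomial terms are ours, not the chapter's): derived features are built from the raw covariates, and a linear logistic model would then be fitted in this new space.

```python
import numpy as np

# Sketch: augment/replace the input vector with basis functions and keep the
# model linear in the derived space. Polynomial terms are used here as one
# concrete choice; sigmoidal or spline bases would plug in the same way.
def polynomial_basis(X):
    """Map each row (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
Z = polynomial_basis(X)   # a linear logistic model is then fitted on Z
print(Z.shape)            # (2, 5)
```

Because the model stays linear in the derived features, standard logistic-regression fitting applies unchanged; the difficulty the chapter highlights is choosing which basis functions, and how many, to include.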
Logistic regression models are usually fit by maxi-
mum likelihood, where the Newton-Raphson algorithm
is the traditional way to compute the maximum likeli-
hood estimates of the parameters. Typically, the algorithm
converges, since the log-likelihood is concave. It is
important to point out, however, that the
Newton-Raphson algorithm becomes computationally
prohibitive when the number of variables is large.
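A minimal numpy sketch of the Newton-Raphson iteration (iteratively reweighted least squares) for the binary case makes the cost concrete; the function name and toy data are our own illustration, not from the chapter. Each step solves a k × k linear system built from the Hessian, which is exactly what becomes prohibitive as the number of variables k grows.

```python
import numpy as np

# Sketch of Newton-Raphson (IRLS) for binary logistic regression.
# Each step solves H @ delta = grad, where H = X^T W X with W = diag(p(1-p)).
def fit_logistic_newton(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current class probabilities
        W = p * (1.0 - p)                     # Newton weights
        H = X.T @ (X * W[:, None])            # k x k (negated) Hessian
        grad = X.T @ (y - p)                  # gradient of the log-likelihood
        beta += np.linalg.solve(H, grad)      # Newton step: O(k^3) per iteration
    return beta

# Toy, non-separable data: an intercept column plus one covariate.
X = np.column_stack([np.ones(6), [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta = fit_logistic_newton(X, y)
```

Because the log-likelihood is concave, the iteration typically converges in a handful of steps on well-conditioned data, but the O(k^3) solve per step dominates when k is large.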
Product Unit Neural Networks (PUNNs), introduced
by Durbin and Rumelhart (1989), are an
alternative to standard sigmoidal neural
networks and are based on multiplicative nodes instead
of additive ones.
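The contrast between the two node types can be sketched as follows (the function names are ours, for illustration): a standard sigmoidal node squashes a weighted sum, while a product unit raises each input to a learnable real-valued exponent and multiplies the results.

```python
import numpy as np

# Additive node vs. multiplicative product unit (illustrative sketch).
def additive_node(x, w):
    # Standard sigmoidal unit: squashed weighted sum of the inputs.
    return np.tanh(np.dot(w, x))

def product_unit(x, w):
    # Product unit: prod_i x_i ** w_i, equivalently exp(sum_i w_i * log x_i).
    # Inputs are assumed positive so real-valued exponents are well defined.
    return np.prod(x ** w)

x = np.array([2.0, 3.0])
w = np.array([2.0, 1.0])
print(product_unit(x, w))   # 2^2 * 3^1 = 12.0
```

The exponents act as learnable weights, so a single product unit can express higher-order interactions among inputs that an additive node would need several hidden units to approximate.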
BACKGROUND
In the classification problem, measurements x_i, i =
1, 2,..., k, are taken on a single individual (or object), and
the individuals are to be classified into one of J classes
on the basis of these measurements. It is assumed that J
is finite, and the measurements x_i are random observa-
tions from these classes. A training sample D = {(x_n, y_n);
n = 1, 2,..., N} is available, where x_n = (x_{1n},..., x_{kn}) is the
vector of measurements taking values in Ω ⊂ R^k, and
y_n is the class level of the nth individual. In this chapter,
we will adopt the common technique of representing the
class levels using a "1-of-J" encoding vector y = (y^(1),
y^(2),..., y^(J)), such that y^(l) = 1 if x corresponds to an example
belonging to class l and y^(l) = 0 otherwise. Based on the
training sample, we wish to find a decision function
C : Ω → {1, 2,..., J} for classifying the individuals. In
other words, C provides a partition, say D_1, D_2,..., D_J, of
Ω, where D_l corresponds to the lth class, l = 1, 2,..., J,