November 13, 2003 16:5 WSPC/INSTRUCTION FILE gini05

International Journal of Pattern Recognition and Artificial Intelligence
© World Scientific Publishing Company

MULTI-CLASS CLASSIFIER FROM A COMBINATION OF LOCAL EXPERTS: TOWARD DISTRIBUTED COMPUTATION FOR REAL-PROBLEM CLASSIFIERS

CHRISTOPH KÖNIG, GIUSEPPINA GINI, MARIAN CRACIUN
Dipartimento di Elettronica e Informazione, Politecnico di Milano,
Piazza Leonardo Da Vinci, 32, 20133, Milano, Italy
chr koenig@gmx.de, gini@elet.polimi.it, mcraciun@ugal.ro

EMILIO BENFENATI
Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche "Mario Negri",
Via Eritrea, 62, 20157, Milano, Italy
benfenati@marionegri.it

Abstract. In many real-world applications simple classifiers are too weak to have predictive power. Ensemble techniques, or mixtures of experts, are a possible solution. We illustrate why mixtures of experts are a natural choice in domains such as the prediction of environmental toxicity for chemicals, when a structural approach is pursued. The real data used here derive from peer-reviewed experiments and are publicly available, but they are difficult to model. We used them to predict aquatic toxicity for fish. Chemical information was coded into a set of about 160 descriptors; after reducing the dimensions of the feature vector through different techniques, we developed multivariate regression to build a model of the toxic effects of the chemicals. Defining toxicity as a category, as in European Union (EU) regulations, we extended the study to predict the toxicity class. The poor predictive power of this simple approach obliged us to reconsider the problem from a more theoretical angle. We respected a locality criterion and built different local classifiers, one for each chemical class, to achieve better results. Then we combined the classifiers into a complete system to predict any chemical in the chemical classes studied.
Keywords: mixture of experts; classification from regression; QSAR.

1. Introduction

Research in the past decade has shown that, for classification and regression problems, ensembles are often much more accurate than the individual base learners that form them [21]. There are different ways to use several classifiers in a recognition problem; at least two main streams derive from the introduction of "ensembling" highly correct classifiers that disagree as much as possible, and of "mixtures of experts" [17, 19], built on the idea of training individual networks on a subtask and then combining their predictions with a gating function that depends on the input. In this paper we shall use the term combination of experts, in order to include all the aspects of
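The mixture-of-experts idea described above can be sketched as follows: each expert produces its own prediction, and an input-dependent gate assigns convex weights to the experts before their outputs are combined. This is a minimal illustrative sketch, not the authors' implementation; the linear gating parameters `W` and the toy experts are assumptions chosen only for demonstration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfExperts:
    """Combine expert predictions with a gating function of the input."""
    def __init__(self, experts, gate_weights):
        self.experts = experts   # list of callables: x -> class-probability vector
        self.W = gate_weights    # (n_features, n_experts) linear gating parameters

    def predict(self, x):
        g = softmax(x @ self.W)                          # gate: one weight per expert
        preds = np.stack([e(x) for e in self.experts])   # (n_experts, n_classes)
        return g @ preds                                 # gated combination

# Toy usage: two "local" experts, each confident about one class.
expert_a = lambda x: np.array([1.0, 0.0])
expert_b = lambda x: np.array([0.0, 1.0])
moe = MixtureOfExperts([expert_a, expert_b],
                       gate_weights=np.array([[5.0, -5.0]]))

out_pos = moe.predict(np.array([1.0]))    # gate favors expert_a
out_neg = moe.predict(np.array([-1.0]))   # gate favors expert_b
```

Because the gate outputs softmax weights, the combined prediction is always a convex mixture of the experts' outputs, so it remains a valid probability vector whenever the experts' outputs are.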