November 13, 2003 16:5 WSPC/INSTRUCTION FILE gini05

International Journal of Pattern Recognition and Artificial Intelligence
© World Scientific Publishing Company

MULTI-CLASS CLASSIFIER FROM A COMBINATION OF LOCAL EXPERTS: TOWARD DISTRIBUTED COMPUTATION FOR REAL-PROBLEM CLASSIFIERS

CHRISTOPH KÖNIG, GIUSEPPINA GINI, MARIAN CRACIUN
Dipartimento di Elettronica e Informazione, Politecnico di Milano,
Piazza Leonardo Da Vinci, 32, 20133, Milano, Italy
chr koenig@gmx.de, gini@elet.polimi.it, mcraciun@ugal.ro

EMILIO BENFENATI
Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche "Mario Negri",
Via Eritrea, 62, 20157, Milano, Italy
benfenati@marionegri.it

Abstract. In many real-world applications simple classifiers are too weak to have predictive power. Ensemble techniques, or mixtures of experts, are a possible solution. We illustrate why mixtures of experts are a natural choice in domains such as the prediction of environmental toxicity for chemicals, when a structural approach is pursued. The real data used here derive from peer-reviewed experiments and are publicly available, but they are difficult to model. We used them to predict aquatic toxicity for fish. Chemical information was coded into a set of about 160 descriptors; after reducing the dimensions of the feature vector through different techniques, we developed multivariate regression to build a model of the toxic effects of the chemicals. Defining toxicity as a category, as in European Union (EU) regulations, we extended the study to predict the toxicity class. The poor predictive power of this simple approach obliged us to reconsider the problem from a more theoretical angle. We respected a locality criterion and built different local classifiers, one for each chemical class, to achieve better results. Then we combined the classifiers into a complete system to predict any chemical in the chemical classes studied.
Keywords: mixture of experts; classification from regression; QSAR.

1. Introduction

Research in the past decade has shown that, for classification and regression problems, ensembles are often much more accurate than the individual base learners that form them [21]. There are different ways to use several classifiers in a recognition problem; at least two main streams derive from the introduction of "ensembling" highly correct classifiers that disagree as much as possible, and of "mixtures of experts" [17, 19], built on the idea of training individual networks on a subtask and then combining their predictions with a gating function that depends on the input. In this paper we shall use the term combination of experts, in order to include all the aspects of
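The mixture-of-experts idea described above can be sketched as follows: each expert produces its own prediction, and an input-dependent gate assigns convex weights to the experts before their outputs are combined. This is a minimal illustrative sketch, not the authors' implementation; the linear gating parameters `W` and the toy experts are assumptions chosen only for demonstration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfExperts:
    """Combine expert predictions with a gating function of the input."""
    def __init__(self, experts, gate_weights):
        self.experts = experts   # list of callables: x -> class-probability vector
        self.W = gate_weights    # (n_features, n_experts) linear gating parameters

    def predict(self, x):
        g = softmax(x @ self.W)                          # gate: one weight per expert
        preds = np.stack([e(x) for e in self.experts])   # (n_experts, n_classes)
        return g @ preds                                 # gated combination

# Toy usage: two "local" experts, each confident about one class.
expert_a = lambda x: np.array([1.0, 0.0])
expert_b = lambda x: np.array([0.0, 1.0])
moe = MixtureOfExperts([expert_a, expert_b],
                       gate_weights=np.array([[5.0, -5.0]]))

out_pos = moe.predict(np.array([1.0]))    # gate favors expert_a
out_neg = moe.predict(np.array([-1.0]))   # gate favors expert_b
```

Because the gate outputs softmax weights, the combined prediction is always a convex mixture of the experts' outputs, so it remains a valid probability vector whenever the experts' outputs are.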