A new notion of weakness in classification theory

Igor T. Podolak, Adam Roman

Institute of Computer Science, Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, Kraków, Poland
uipodola@theta.uoks.uj.edu.pl, roman@ii.uj.edu.pl

Summary. The notion of a weak classifier, as one which is "a little better" than a random one, was first introduced for 2-class problems [1]. Extensions to K-class problems are known; all are based on relative activations for the correct and incorrect classes and do not take into account the final choice of the answer. A new understanding and definition is proposed here, which takes into account only the final classification choice that must be made. It is shown that for a K-class classifier to be called "weak", it needs to achieve a risk value lower than 1/K. This approach considers only the probability of the final answer choice, not the actual activations.

1 Introduction

The classification task for a given problem may be solved using various machine learning approaches. One of the most effective is to train several simple classifiers and combine their responses [2, 3, 4, 5]. The training of subsequent classifiers may depend on how well the already trained classifiers perform. This is the background of the boosting approach, introduced by Schapire and further developed in [6, 7, 8]. Boosting algorithms train a sequence h_1, h_2, ... of simple classifiers, each performing slightly better than a random one. After training h_i, a boosting algorithm, e.g. AdaBoost, changes the probability distribution with which examples are chosen for further training, putting more attention to incorrectly recognized examples. The final classifier is a weighted sum of all individual classifiers h_i. The algorithm is well defined for a 2-class problem, with an extension to K-class (K > 2) problems through a so-called pseudo-loss function [7].
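The reweighting step described above can be sketched as follows. This is a minimal illustration of the standard 2-class AdaBoost update (labels in {-1, +1}), not the authors' method; the function name and the toy inputs are our own.

```python
import numpy as np

def adaboost_reweight(y_true, y_pred, w):
    """One AdaBoost round: compute the classifier weight alpha and the
    updated example distribution that emphasizes misclassified examples.
    y_true, y_pred: arrays of labels in {-1, +1}; w: current distribution."""
    # weighted error of the current weak classifier h_i
    eps = np.sum(w * (y_pred != y_true))
    # classifier weight: larger when the weak learner beats random guessing
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    # misclassified examples (y_true * y_pred = -1) get their weight raised,
    # correctly classified ones (product = +1) get it lowered
    w_new = w * np.exp(-alpha * y_true * y_pred)
    return alpha, w_new / w_new.sum()  # renormalize to a distribution

# toy round: one of four equally weighted examples is misclassified
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, -1, -1, -1])
alpha, w = adaboost_reweight(y_true, y_pred, np.full(4, 0.25))
# the misclassified example now carries half of the total weight
```

With a weighted error of 0.25, the single misclassified example ends up with weight 1/2 and each correct one with 1/6, so the next weak classifier concentrates on the hard example; the final classifier is then sign(sum_i alpha_i h_i(x)).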
Using the pseudo-loss approach, a classifier is defined to be weak if the expected activation for the true class is higher than the mean activation for the incorrect classes. The authors have worked on a so-called Hierarchical Classifier (HC) [9]. The HC is built of several simple classifiers which divide the input space