Improving Supervised Learning by Feature Decomposition

Oded Maimon and Lior Rokach
Tel-Aviv University, Department of Industrial Engineering
Ramat Aviv, Tel Aviv 69978, Israel
email: maimon@eng.tau.ac.il, liorr@eng.tau.ac.il

Abstract. This paper presents the Feature Decomposition Approach for improving supervised learning tasks. While in Feature Selection the aim is to identify a representative set of features from which to construct a classification model, in Feature Decomposition the goal is to decompose the original set of features into several subsets. A classification model is built for each subset, and all generated models are then combined. This paper presents theoretical and practical aspects of the Feature Decomposition Approach. A greedy procedure, called DOT (Decomposed Oblivious Trees), is developed to decompose the input feature set into subsets and to build a classification model for each subset separately. The results of an empirical comparison with well-known learning algorithms (such as C4.5) indicate the superiority of the feature decomposition approach in learning tasks that contain a high number of features and a moderate number of tuples.

1 Introduction and Motivation

Supervised learning is one of the most important tasks in knowledge discovery in databases (KDD). In supervised problems, the induction algorithm is given a set of training instances with their corresponding class labels and outputs a classification model. The classification model takes an unlabeled instance and predicts its class. Classification techniques can be applied in a variety of domains, such as marketing, finance, and manufacturing. Fayyad et al. (see [13]) claim that an explicit challenge for the KDD research community is to develop methods that facilitate the use of data mining algorithms on real-world databases. One characteristic of real-world databases is high volume.
The difficulties in applying classification algorithms as-is to high-volume databases derive from two sources: the increase in the number of records in the database, and the increase in the number of features or attributes in each record (high dimensionality). A high number of records primarily creates difficulties in storage and computational complexity. Approaches for dealing with a high number of records include sampling, massively parallel processing, and efficient storage methods. High dimensionality, however, increases the size of the search space exponentially, and thus increases the chance that the algorithm will find spurious models that are not valid in general.
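To make the decomposition idea from the abstract concrete, the following is a minimal, hedged sketch: the feature set is split into disjoint subsets, one model is built per subset, and the models are combined by majority vote. The nearest-centroid base learner and the synthetic data here are illustrative assumptions; the paper's DOT procedure builds oblivious decision trees and chooses the subsets greedily rather than at random.

```python
# Illustrative sketch of feature decomposition (NOT the DOT algorithm:
# base models and the random subset split are simplifying assumptions).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: 200 instances, 12 features;
# class 1 instances are shifted by +1 in every feature.
n, d = 200, 12
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + y[:, None]

# Decompose the 12 features into 3 disjoint subsets of 4 features each.
subsets = np.array_split(rng.permutation(d), 3)

# Build one (nearest-centroid) classification model per feature subset.
models = []
for idx in subsets:
    centroids = np.stack([X[y == c][:, idx].mean(axis=0) for c in (0, 1)])
    models.append((idx, centroids))

def predict(X_new):
    """Combine the per-subset models by majority vote."""
    votes = np.stack([
        np.argmin(np.linalg.norm(X_new[:, idx][:, None, :] - c[None],
                                 axis=2), axis=1)
        for idx, c in models
    ])                                      # shape: (n_models, n_instances)
    return (votes.sum(axis=0) >= 2).astype(int)

print("training accuracy:", (predict(X) == y).mean())
```

Each base model sees only its own 4-dimensional projection of the data, so the per-model search space is much smaller than the original 12-dimensional one; the vote recombines the partial views into a single prediction.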