Learning Flexible Concepts from Uncertain Data

Mohamed Quafafou
IRIN, University of Nantes, 2 rue de la Houssiniere, BP 92208 44322, Nantes Cedex 03, France.

Abstract. Learning from examples consists of inducing knowledge from training examples using an inductive learning algorithm. In practice, a preprocessing phase is necessary to transform numerical attributes into intervals (discretization), to deal with missing values, noisy data, and so forth. Most research efforts have proposed methods for data preprocessing, but no special attention has been devoted to modeling and handling the uncertainty inherent in real-world data. In this paper we introduce a preprocessing task that models uncertainty, which is then taken into account throughout the learning process. In this context we formalize a notion of flexible concepts, in contrast with "sharp" concepts usually represented by crisp sets. Next, we discuss an inductive learning approach based on rough set theory and propose a method which allows the user to control both the granularity of knowledge resulting from the partition of the universe and the consistency of the rules to be learned. Our proposed method is the basis of the system Alpha, which is run on real-world datasets.

1 Introduction

The learning-from-examples problem has been widely studied by the machine learning community. The following schema (Gams et al. 1987) is generally used by a learning algorithm: (1) preprocessing of the training set, to deal with discretization of numerical attributes, treatment of missing values, etc.; (2) induction, to infer knowledge, that is, information on a higher level than the training set; and (3) test, to use the learned knowledge to deal with new and unseen events, for instance, classification of sick persons according to their symptoms. In this paper we focus only on points (1) and (2), and we use a 5-fold cross-validation protocol to estimate the accuracy of our system, called Alpha.
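The evaluation protocol mentioned above can be illustrated with a minimal sketch, assuming the usual k-fold reading of cross-validation with k = 5. The function names `k_fold_indices` and `cross_validate`, as well as the `train` and `predict` callables, are hypothetical placeholders for the induction and classification steps; they are not part of Alpha.

```python
import random


def k_fold_indices(n_examples, k=5, seed=0):
    """Split range(n_examples) into k disjoint test folds of near-equal size."""
    if n_examples < k:
        raise ValueError("need at least k examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(examples, labels, train, predict, k=5, seed=0):
    """Return the mean accuracy over k train/test splits.

    `train(xs, ys)` and `predict(model, x)` are hypothetical callables
    standing in for the induction step and the classification step.
    """
    folds = k_fold_indices(len(examples), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(examples)) if i not in held_out]
        model = train([examples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, examples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy learner used only to exercise the protocol: always predict the
# most frequent training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model
```

Each example is used for testing exactly once and for training in the remaining k - 1 rounds, so the reported accuracy is an average over disjoint held-out sets.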
On the one hand, we use fuzzy sets to deal with continuous attributes and we consider that values of both qualitative and quantitative attributes are represented by membership functions. Fuzzy sets are also used to represent flexible concepts, in contrast with "sharp" concepts, which are usually represented by "crisp" sets. We introduce a new uncertainty modeling task in the preprocessing phase: a teacher must specify the fuzzy sets used to define the semantics of the values of a given attribute. On the other hand, we focus our discussion on learning algorithms based on rough set theory, which generally use approximations of concepts during their induction processes. Three main phases