Abstract—General requirements for knowledge representation in the form of logic rules, applicable to the design and control of industrial processes, are formulated. The characteristic behavior of decision trees (DTs) and rough sets theory (RST) in extracting rules from recorded data is discussed and illustrated with simple examples. The significance of the models’ drawbacks is evaluated using simulated and industrial data sets. It is concluded that the performance of DTs may be considerably poorer than that of RST in several important respects, particularly when not only a characterization of a problem is required but also detailed and precise rules suited to the actual, specific problems to be solved.

Keywords—Knowledge extraction, decision trees, rough sets theory, industrial processes.

M. Perzyk is with the Institute of Manufacturing Technologies, Warsaw University of Technology, Warsaw, Poland (corresponding author, phone and fax: +48228499797; e-mail: M.Perzyk@wip.pw.edu.pl). A. Soroczynski – affiliation as above (e-mail: asoroczy@wip.pw.edu.pl). This work was supported by grant N R07 0015 04 from the Ministry of Science and Higher Education, Poland.

I. INTRODUCTION

In recent years, an increasing interest in data mining (DM) applications in industrial enterprises can be observed. Large amounts of collected data, related to designs, manufacturing processes, materials and equipment, can potentially be used to improve the quality and economics of production. A comprehensive and insightful characterization of the problems in manufacturing enterprises, as well as the potential benefits of applying DM in this area, is presented in [1]. Examples and general characteristics of problems related to the use of data mining techniques and systems in the manufacturing environment can be found in several review papers [2]–[4]. Substantial progress in the development of complex DM systems for manufacturing organizations can also be observed [5]–[10].

DM techniques can provide various types of information. Most frequently, methods of automated knowledge extraction from recorded past data are used, with the knowledge expressed as logic rules of the type ‘IF (conditions) THEN (decision class)’. Other types of information may also be important for industrial applications, such as the relative significance of input variables (usually process parameters) [11], prediction of continuous-type output (usually process results), and grouping (clustering) of variables.

In principle, any classification system or model can be used to extract logic rules from data. Typical learning algorithms include direct rule induction, decision trees (DTs), the naïve Bayesian classifier and algorithms based on rough sets theory (RST). Detailed information on these methods can be found in [12] and the literature quoted there. Artificial neural networks have also been successfully used for logic rule extraction [13]–[16], often involving fuzzy numbers. This approach facilitates the processing of continuous-valued variables, the handling of uncertainties in the data and the use of linguistic variables.

For manufacturing problems, DTs are probably the most frequently used tools for rule extraction from data (e.g. [4], [9], [10], [17]–[19]), whereas the RST-based methods seem to be their newer alternative (e.g. [12], [20]–[22]). Both algorithms are relatively simple, especially compared to neural or fuzzy-neural systems, and easy for users to interpret. Both treat the data in a natural way; however, they are based on completely different principles and algorithms. The practical aspects of applying these tools also differ. The computation times of DTs are generally short, and the interpretation of rules obtained from DTs can be facilitated by the graphical representation of the trees.
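To make the rule form ‘IF (conditions) THEN (decision class)’ concrete, the following minimal sketch shows one common way of reading rules off a decision tree: each path from the root to a leaf becomes one rule. It is an illustration only, not the procedure used in this paper. The tree, the attribute names (`pouring_temperature`, `mould_moisture`), the thresholds and the decision classes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    decision: str  # decision class assigned at this leaf


@dataclass
class Node:
    attribute: str   # input variable tested at this node
    threshold: float  # go left if value <= threshold, otherwise right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def tree_to_rules(tree, conditions=()):
    """Return one 'IF (...) THEN (...)' string per root-to-leaf path."""
    if isinstance(tree, Leaf):
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF ({body}) THEN ({tree.decision})"]
    go_left = conditions + (f"{tree.attribute} <= {tree.threshold}",)
    go_right = conditions + (f"{tree.attribute} > {tree.threshold}",)
    return tree_to_rules(tree.left, go_left) + tree_to_rules(tree.right, go_right)


# Hypothetical tree: attribute names, thresholds and classes are invented
# for illustration and do not come from the paper's data sets.
example_tree = Node(
    "pouring_temperature", 1400.0,
    Leaf("defective"),
    Node("mould_moisture", 3.5, Leaf("good"), Leaf("defective")),
)

rules = tree_to_rules(example_tree)
for rule in rules:
    print(rule)
```

In this sketch the number of rules equals the number of leaves, and every rule necessarily starts with a condition on the root attribute, which is one reason a tree can be drawn and read graphically.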
RST may require long computation times and may yield a much larger number of rules than DTs if detailed information is sought from the knowledge system. It should be noted that, whereas DTs are widely covered both in handbooks and in commercially available DM software, RST is seldom found outside the scientific literature.

Making the right choice of rule extraction algorithm is important, particularly when constructing DM systems. However, very few comparative studies are available that show the advantages and weaknesses of individual tools [12], [20]. The purpose of the present paper is to show important differences in the performance of the two algorithms mentioned above, i.e. DT-based and RST-based, chiefly from the standpoint of industrial manufacturing processes.

Comparative Study of Decision Trees and Rough Sets Theory as Knowledge Extraction Tools for Design and Control of Industrial Processes
Marcin Perzyk and Artur Soroczynski

World Academy of Science, Engineering and Technology, International Journal of Industrial and Manufacturing Engineering, Vol:4, No:1, 2010
International Scholarly and Scientific Research & Innovation 4(1) 2010, scholar.waset.org/1307-6892/7119
International Science Index, Industrial and Manufacturing Engineering Vol:4, No:1, 2010, waset.org/Publication/7119