8th International Conference on DEVELOPMENT AND APPLICATION SYSTEMS
Suceava, Romania, May 25–27, 2006

TOWARDS INTEGRATING DECISION TREE WITH XML TECHNOLOGIES

Diana GOREA¹, Sabin Corneliu BURAGA²
"Al. I. Cuza" University of Iasi
str. G-ral Berthelot nr. 16, RO-700483 Iasi
¹dgorea@infoiasi.ro, ²busaco@infoiasi.ro

Abstract. The paper proposes a method for efficiently storing collections of multi-purpose decision trees within a native distributed XML database. The predictive information for building the XML decision trees is gathered through Web mining techniques and methodologies. In order to share data from heterogeneous sources, the model employs semantic Web languages to describe and represent data sources. The use of a native XML database system provides robust storage and manipulation of XML decision trees according to a logical model mapping. Real data can be classified by issuing queries over the XML decision trees, using specific XML-based query processing capabilities.

Keywords: decision tree, XML, distributed XML database system, semantics.

Introduction

Decision and regression trees are a hierarchical approach to decision-support methods and have been used successfully in many areas, such as medical diagnosis, agent learning, risk assessment, radar signal classification, commercial and banking applications, strategy games, policy assessment, expert systems and speech recognition, to name only a few. Usually, decision trees are built to support one or more decisions. In the latter case, we may also consider the continuous exploitation of a decision tree by various beneficiaries. As a consequence, in a distributed system we may consider a frequently revised, distributed repository of multi-purpose decision trees. In the context of the World-Wide Web, a decision-making system can function as a group of Web services that can be invoked by other Web applications.
In this case, an XML-based approach to storing decision trees could be more flexible and useful than classical representations (the XML format of decision trees can be viewed as a serialization mechanism for the information or knowledge exchanged by decision-making components, e.g. Web agents or services). A more interesting approach is to use decision trees in semantic Web applications. Here, a distributed native XML-based decision-making system can play an important role, because it can offer semantic Web services for making decisions within complex Web applications, such as multi-agent systems or Grids. The decision rules embodied by decision trees form a layer above the current semantic Web layers (metadata, schema, and ontology layers) [11] and can be easily expressed by XML constructs.

After providing some details regarding the formal model of decision trees and their use in classification, in Section 3 we propose an XML-based format for storing decision trees within an XML database system, together with extensions of this format that incorporate various metadata and ontological assertions. Section 4 presents how we build XQuery expressions to query XML decision trees.

Decision Tree Classifier Model

First, we present some general information regarding the model of a decision tree classifier. Let X be a q-dimensional vector, called a pattern, whose components are called features or attributes. The instances (samples) of X are represented by attribute-value pairs. If the features of X are elements of a totally ordered set, X is called an ordered or numerical pattern; otherwise, it is called a categorical pattern. In the case of an ordered pattern, the features may have continuous or discrete values.
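To make the pattern definitions above concrete, the following Python sketch represents one instance of X as attribute-value pairs and labels each feature, and then the pattern as a whole, as ordered or categorical. It is an illustration under stated assumptions, not the authors' model: the attribute names and values are hypothetical, and Python numbers stand in for elements of a totally ordered set (integers as discrete, other reals as continuous), while strings and booleans are treated as categorical labels even though strings could be ordered lexicographically. A mixed instance is reported as categorical, following a literal reading of "otherwise".

```python
from numbers import Real

# Hypothetical instance of a pattern X with q = 4 features,
# written as attribute-value pairs (names and values are illustrative only).
sample = {"age": 42, "temperature": 38.5, "blood_type": "A", "smoker": "no"}


def feature_kind(value):
    """Classify one feature value.

    Assumption: Python numbers stand in for a totally ordered domain
    (int -> discrete, other reals -> continuous); everything else,
    including booleans, is treated as categorical.
    """
    if isinstance(value, bool):  # bool is a subclass of int; treat it as a label
        return "categorical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, Real):
        return "continuous"
    return "categorical"


def pattern_kind(instance):
    """Return "ordered" if every feature is ordered, else "categorical"."""
    ordered = all(feature_kind(v) in ("discrete", "continuous")
                  for v in instance.values())
    return "ordered" if ordered else "categorical"


if __name__ == "__main__":
    q = len(sample)  # dimension of the pattern vector X
    for name, value in sample.items():
        print(name, value, feature_kind(value))
    print(q, pattern_kind(sample))
```

For this sample, the pattern is categorical because blood_type and smoker are labels; keeping only age and temperature yields an ordered pattern with one discrete and one continuous feature.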