Interactive Use of Inductive Approach for Analyzing and Developing Conceptual Structures

Ilze Birzniece
Department of System Theory and Design
Riga Technical University
Riga, Latvia
ilze.birzniece@rtu.lv

Abstract- Inductive learning algorithms learn a classification from training examples and use the induced classifier to deal with new instances. Using conceptual data structures as classifier input makes this task more complicated, and the classifier may have difficulty predicting the class. To broaden the applicability of classifiers based on inductive learning, a collaborative approach between the system and a human expert would be useful. Under uncertain conditions, the proposed interactive system can ask for human advice and extend its knowledge base with a rule derived from this interaction. An interactive classification system based on inductive learning is proposed to help compare university study courses semi-automatically.

Keywords- inductive learning; human-computer interaction; study course comparison; semi-structured documents

I. MOTIVATION

Data mining is usually defined as the process of discovering patterns in large amounts of data [1]. However, the difficulty of knowledge discovery does not always lie in an overwhelming volume of data. In some domains, patterns and underlying rules have to be found within a limited set of examples. Moreover, if these examples are represented as conceptual structures, the task becomes even more complicated.

There is a wide range of machine learning techniques for classification. Inductive learning methods in the form of decision trees and rules are highly valued for their interpretability; therefore, they are suitable for classification tasks whose results must be understood by a human. In general, inductive learning algorithms are restricted by the assumption that data are represented in the form of database records [2].
However, in the real world information is organized in much vaguer or more complicated forms, such as plain text, semi-structured text, graphs, and heterogeneous kinds of sources. To perform inductive learning on data other than database records, appropriate pre-processing is needed.

For plain text classification, text categorization methods can be used. Semi-structured text includes tags, headings, or other markers that assign a more specific meaning to the section following the label. Typical text classification approaches do not take this additional knowledge into consideration. A semi-structured text carries more structural information than a plain text document, and the relations among semi-structured documents are harder to exploit fully [3]. Therefore, a conceptual structure describing the semi-structured text should be created so that the information can be processed to its full value. Data in conceptual structures can have a significantly richer and more complex structure than a table of rows and columns [4].

In the process of extracting structure from text, some information can be lost or mapped inaccurately. This leads to an incomplete classifier that does not generalize well over the problem domain and may be unable to make predictions for all new, unseen instances when it is applied. There are several methods for dealing with this problem. Usually a default rule [5] is used, which predicts the most common class in the particular data set. This approach is easy to understand, but it does not work well in all domains, e.g., if the data set contains many classes and all of them occur equally frequently.

However, leaving the decision to a predefined algorithm is not the only option. Some machine learning systems attempt to eliminate the need for human interaction, while others adopt a collaborative approach between human and machine.
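The contrast between a default rule and a collaborative fallback can be illustrated with a minimal sketch. It is not the system proposed in this paper: the function names, the attribute-value representation of a course, and the way an expert answer is stored as a new rule are all assumptions made for illustration only.

```python
from collections import Counter


def default_rule(training_labels):
    """Return the most common class label in the training set."""
    return Counter(training_labels).most_common(1)[0][0]


def classify(instance, rules, default_class=None, ask_expert=None):
    """Classify `instance` with the first matching rule, else fall back.

    `rules` is a list of (predicate, label) pairs. If no rule fires and
    `ask_expert` is given, the expert's label is returned and remembered
    as a new, very specific rule. Assumption of this sketch: the new rule
    only matches instances equal to the one the expert labelled.
    Otherwise the default class is returned.
    """
    for predicate, label in rules:
        if predicate(instance):
            return label
    if ask_expert is not None:
        label = ask_expert(instance)
        rules.append((lambda x, seen=dict(instance): x == seen, label))
        return label
    return default_class


# Hypothetical course descriptions reduced to attribute-value pairs.
rules = [(lambda c: c["topic"] == "databases" and c["credits"] >= 4, "compatible")]
training_labels = ["compatible", "compatible", "incompatible"]
fallback = default_rule(training_labels)

known = {"topic": "databases", "credits": 6}
unseen = {"topic": "ontologies", "credits": 3}

by_rule = classify(known, rules, default_class=fallback)
by_default = classify(unseen, rules, default_class=fallback)
by_expert = classify(unseen, rules, ask_expert=lambda c: "incompatible")
# The same instance is now covered by the rule learned from the expert.
again = classify(unseen, rules, default_class=fallback)
```

In this sketch the default rule labels the uncovered course with the majority class regardless of its content, whereas the expert-backed call both returns the expert's label and adds a rule to the list. When all classes are equally frequent, the majority class chosen by `default_rule` is effectively arbitrary, which is the weakness noted above.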
When a new instance cannot be unequivocally classified, a collaborative approach between the machine and a human expert would be useful.

A domain that fully fits this type of problem is study course compatibility analysis in curriculum management. Globalization has led to the need to compare study programmes and courses in order to analyze their compatibility. The Bologna process aims at creating a unified European higher education area, and one of its key goals is to facilitate the mobility of students. Hence the need arises to find and compare curricula. Given the number of different education institutions operating within the global knowledge provision space, this is a time-consuming task if performed only manually.

The presented PhD research addresses three main points, namely (1) creation of conceptual structures from semi-structured texts, (2) use of data from conceptual structures as input for an inductive learning based classifier, and (3) design of a classification system that deals with unclassified instances. The first of these is an independent and broad subject of study that incorporates information extraction from documents, and it will not be developed further within my PhD. However, given the complex data types of the problem domain, i.e., study course descriptions, the results of research in the information extraction area will be investigated and applied.

978-1-4577-1938-7/12/$26.00 ©2011 IEEE