Interactive Use of Inductive Approach for Analyzing
and Developing Conceptual Structures
Ilze Birzniece
Department of System Theory and Design
Riga Technical University
Riga, Latvia
ilze.birzniece@rtu.lv
Abstract—Inductive learning algorithms learn a classification
from training examples and use the induced classifier to deal
with new instances. Using conceptual data structures as the
classifier's input makes this task more complicated, and the
classifier may encounter difficulties in class prediction. To broaden
the applicability of inductive learning based classifiers, a
collaborative approach between the system and a human expert
would be useful. Under uncertain conditions, the proposed
interactive system can ask for human advice and improve its
knowledge base with the rule derived from this interaction. An
interactive inductive learning based classification system is
proposed to help compare university study courses semi-automatically.
Keywords- inductive learning; human-computer interaction; study
course comparison; semi-structured documents
I. MOTIVATION
Data mining is usually defined as the process of discovering
patterns in large amounts of data [1]. However, sometimes the
difficulty of knowledge discovery does not lie in inexhaustible
data sources. In some domains, patterns and underlying rules
must be found within a limited example set. Moreover, if these
examples are represented as conceptual structures, the task
becomes even more complicated.
There is a wide range of machine learning techniques for
classification purposes. Inductive learning methods in the form of
decision trees and rules are highly valued for their
interpretability; therefore, they are suitable for classification
tasks whose results must be understood by a human.
In general, inductive learning algorithms are restricted by
the assumption that data are represented in the form of database
records [2]. However, in the real world information is organized in
much vaguer or more complicated forms such as plain text,
semi-structured text, graphs, heterogeneous sources, etc. To
perform inductive learning on data other than database
records, appropriate pre-processing is needed.
For plain text classification, text categorization methods
can be used. Semi-structured text includes tags,
headings or other markers that assign a more specific meaning
to the section that follows the label. Typical text classification
approaches would not take this additional knowledge into
consideration. A semi-structured text has more structured
information than a plain text document, and the
relations among semi-structured documents are harder to
utilize fully [3]. Therefore, a conceptual structure describing the
semi-structured text should be created so that its information
can be processed in full. Data in conceptual structures can have a
significantly richer and more complex structure than a table of
rows and columns [4].
In the process of extracting structure from the text, some
information can be lost or mapped inaccurately. This leads to
the creation of an incomplete classifier that does not generalize
the problem domain well and probably will not be able to make
predictions for all new, unseen instances when the classifier is
applied. There are several methods to deal with this problem.
Usually a default rule [5] is used that predicts the most
common class in the particular data set. This approach is
comprehensible, but it does not work well in all domains, e.g., if
the data set contains many classes and all of them occur equally
frequently. However, leaving the decision to some
predefined algorithm is not the only option. Some machine
learning systems attempt to eliminate the need for human
interaction, while others adopt a collaborative approach
between human and machine. In a situation where a new instance
cannot be unequivocally classified, a collaborative approach
between machine and human expert would be useful.
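The contrast between a default rule and a collaborative fallback can be illustrated with a minimal Python sketch. It is not the system proposed in this paper: the rule format (attribute-value conditions paired with a class label), the function names, and the choice to store the expert's answer as a maximally specific new rule are all assumptions made for illustration only.

```python
from collections import Counter


def default_class(labels):
    """Default rule: predict the most common class in the training set.

    If all classes occur equally often, the winner is arbitrary, which is
    the weakness described in the text.
    """
    return Counter(labels).most_common(1)[0][0]


def classify(instance, rules, ask_expert):
    """Classify an instance, asking a human when the rules are not decisive.

    rules      -- list of (condition, label) pairs; a condition is a dict of
                  attribute -> required value (hypothetical rule format)
    ask_expert -- callable standing in for the human expert (assumption)
    Returns a (label, source) pair, where source is "rules" or "expert".
    """
    matched = {
        label
        for condition, label in rules
        if all(instance.get(attr) == value for attr, value in condition.items())
    }
    if len(matched) == 1:
        # Exactly one class is supported by the fired rules.
        return matched.pop(), "rules"
    # No rule fires, or the fired rules disagree: ask instead of guessing.
    label = ask_expert(instance)
    # Assumption: remember the answer as a new, maximally specific rule.
    rules.append((dict(instance), label))
    return label, "expert"
```

In this sketch the expert is consulted only when the induced rules give no answer or a conflicting one, and each answer extends the rule list so that an identical instance can later be classified without further interaction.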
A domain that fully corresponds to this type of
problem is study course compatibility analysis in curriculum
management. Globalization has led to the need to compare study
programmes and courses in order to analyze their
compatibility. The Bologna process aims at creating a unified
European higher education area. One of its key goals is to
facilitate the mobility of students. Hence the need arises
to find and compare curricula. Considering
the number of different educational institutions
operating inside the global knowledge provision space, this is
a time-consuming task if performed only manually.
The presented PhD research touches on three main research
points, namely (1) creation of conceptual structures from
semi-structured texts, (2) using data from conceptual structures as
input for an inductive learning based classifier, and (3) design of
a classification system that deals with unclassified instances. The
first of these is an independent and broad subject of study which
involves information extraction from documents and will
not be extended within my PhD. However, given the
complex data types of the problem domain, i.e.,
study course descriptions, research results in the information
extraction area will be investigated and applied.
978-1-4577-1938-7/12/$26.00 ©2011 IEEE