Proc. of The Second Intl. Conf. On Advances In Computing, Control And Networking - ACCN 2015
Copyright © Institute of Research Engineers and Doctors, USA. All rights reserved.
ISBN: 978-1-63248-073-6 doi: 10.15224/978-1-63248-073-6-16
Ontology Model Development Combined with
Bayesian Network
Ionia Veritawati, Ito Wasito, T. Basaruddin
Abstract— Recently, development of methods in extracting
knowledge from text collections is still being explored. In this work,
the proposed approach utilizes important words, or key words,
that represent a text domain. The key words may be related to
one another, and the related key words in the text
domain can be organized into an ontology model that serves as
domain knowledge. The proposed method for forming the
knowledge represented by the text consists of three stages.
First, a Vector Space Model (VSM) of key words from the text is
clustered using a bottom-up approach, and each cluster of data is
categorized to serve as input to structure learning in a Bayesian
network. In the second stage, the structure of
each cluster is learned using the Markov Chain Monte Carlo
(MCMC) method, such that the key words, as nodes, are related
to each other in the form of a directed acyclic graph (DAG). The
structure learning process for each cluster produces a
clustered DAG. The same learning process is also applied to the
original data, producing a general DAG. The third stage is
an analysis process in which rules are applied to the clustered DAGs
and the general DAG to determine connector nodes. A
connector node is located in one clustered DAG and has a
relation (edge) to a node in another clustered DAG. The connector
nodes join the clustered DAGs into a union graph called an Ontology
Model, which represents the knowledge of the text domain. The data in
this work consist of simulated data using a small number of
key words from natural science. The resulting ontology model is
evaluated manually, and it shows that the knowledge in the text can
be represented visually. Several challenges in ontology
development remain to be addressed.
Keywords— bottom-up clustering, MCMC, Connector Node,
Ontology
I. Introduction
Ontology development has been studied in many
application domains, such as the automotive industry, to
develop knowledge of services [1]; bioinformatics, to
represent protein-protein interaction [2]; medicine, to
analyze diagnosis requirements [3]; and the semantic web, to
make links between web pages [4]. Approaches to
ontology development vary and include formal methods
[5] and machine learning [6]. These approaches may be
applied manually, semi-automatically,
or automatically, which means that a domain expert is sometimes
needed to develop a relevant ontology as knowledge of the
domain [5]. The proposed method of ontology development
is an automatic process that does not require a domain expert. It can be
applied to text data derived from any domain.
Ionia Veritawati
Department of Informatics
Pancasila University
Indonesia
Ito Wasito; T. Basaruddin
Faculty of Computer Science
University of Indonesia
Indonesia
I. Ontology and Text
Ontology is broadly defined as “a formal, explicit
specification of a shared conceptualization” [7]. Generally,
domain ontology representations span a spectrum
ranging from lightweight ontologies, whose structure is
represented by a taxonomy (tree or graph), to formal
ontologies represented by a relational database [5].
Text as a data collection consists of meaningful words, that is,
key words or key phrases, and stop words, which are
meaningless words and are usually removed. The use of
text data in a machine learning approach begins by
extracting only the frequencies of meaningful words from the
data and arranging the frequencies in a vector space
(a table of key words versus documents) [8].
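The construction of such a table can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses a hypothetical stop-word list and two hypothetical documents; none of these come from the experiments in this paper.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "to"}


def build_vsm(documents):
    """Return (key words, table) with one row per key word
    and one column per document; entries are raw frequencies."""
    # Count the meaningful words of each document, dropping stop words.
    counts = [
        Counter(w for w in doc.lower().split() if w not in STOP_WORDS)
        for doc in documents
    ]
    # The vocabulary is the sorted union of key words over all documents.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for a key word that a document does not contain.
    table = [[c[word] for c in counts] for word in vocab]
    return vocab, table


# Hypothetical documents, for illustration only.
docs = [
    "the cell is a unit of life",
    "energy and mass in the cell",
]
vocab, table = build_vsm(docs)
for word, row in zip(vocab, table):
    print(word, row)
```

Each row of the printed table is the frequency vector of one key word across the documents.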
A collection of meaningful words from a domain
represents knowledge of the domain itself. This knowledge can be
made more specific by determining the relations among
the meaningful words. A set of meaningful words that are related to each
other is called an ontology [9]. In this work, text data
models are created and used to develop an ontology model
using the proposed methodology (Fig. 1).
II. Methodology
Fig. 1 shows the proposed methodology for ontology
development. The text data are numerical values for a collection of key
words from documents, arranged in a Vector Space Model (VSM).
Three types of DAG are defined: the modeled DAG,
the clustered DAG, and the general DAG. A modeled DAG is
specified manually as a model, whereas the clustered DAGs and
the general DAG result from structure learning. In the data
modeling step, the data are modeled by manually creating
two or more modeled directed acyclic graphs (DAGs),
with a key word as the label of each node and relations
(edges) between the nodes. Each modeled DAG represents a
cluster of key words and their relations (a cluster of
knowledge). The modeled DAGs are then sampled using a
Bayesian network approach. The samples from all
modeled DAGs are combined into a table of categorical data.
The categorical data are then converted
to real numbers to form vector data. This vector data model is
the input data. Preprocessing is applied to the data
using tf-idf and normalization.
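The data modeling and preprocessing steps can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the three-node DAG, its binary states, and its conditional probabilities are hypothetical; the categorical values are converted to real numbers by taking them directly as 0 or 1; the tf-idf weight is assumed to be tf * log(N / df); and normalization is assumed to mean scaling each row to unit length.

```python
import math
import random

# Hypothetical modeled DAG: each node lists its parents.
# The dict is written in topological order (parents before children).
PARENTS = {"energy": [], "heat": ["energy"], "temperature": ["heat"]}

# Assumed probabilities P(node = 1 | parent values), keyed by parent states.
CPT = {
    "energy": {(): 0.6},
    "heat": {(0,): 0.1, (1,): 0.8},
    "temperature": {(0,): 0.2, (1,): 0.9},
}


def sample_dag(parents, cpt, n_samples, rng):
    """Ancestral sampling: draw each node after its parents are drawn."""
    rows = []
    for _ in range(n_samples):
        state = {}
        for node in parents:  # relies on the topological dict order
            key = tuple(state[p] for p in parents[node])
            state[node] = 1 if rng.random() < cpt[node][key] else 0
        rows.append([state[node] for node in parents])
    return rows


def tfidf_normalize(rows):
    """Weight each column by log(N / df), then scale rows to unit length."""
    n = len(rows)
    n_cols = len(rows[0])
    # Document frequency: number of rows in which the key word occurs.
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_cols)]
    idf = [math.log(n / d) if d else 0.0 for d in df]
    out = []
    for r in rows:
        weighted = [x * w for x, w in zip(r, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        # Rows with no weighted entries stay all-zero.
        out.append([v / norm if norm else 0.0 for v in weighted])
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = sample_dag(PARENTS, CPT, 200, rng)   # categorical table (0/1)
vectors = tfidf_normalize(samples)             # real-valued vector data
print(samples[:3])
print(vectors[:3])
```

With several modeled DAGs, the sampled tables would be combined before preprocessing; the sketch uses a single DAG to keep the example short.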
Hierarchical clustering is applied to the vector data
to separate the data elements into clusters. The
clustered data are categorized and serve as the input
to the structure learning process in the Bayesian network. A
scoring function is applied to each cluster of data to predict a
graph structure