Proc. of The Second Intl. Conf. On Advances In Computing, Control And Networking - ACCN 2015
Copyright © Institute of Research Engineers and Doctors, USA. All rights reserved.
ISBN: 978-1-63248-073-6 doi: 10.15224/978-1-63248-073-6-16

Ontology Model Development Combined with Bayesian Network

Ionia Veritawati, Ito Wasito, T. Basaruddin

Abstract: Methods for extracting knowledge from a text collection are still being explored. In this work, the proposed approach utilizes important words, or key words, that represent a text domain. The key words may be related to one another, and the related key words in the text domain can be organized into an ontology model that serves as domain knowledge. The proposed method for forming the knowledge represented by the text consists of three stages. First, a Vector Space Model (VSM) of key words from the text is clustered using a bottom-up approach, and each cluster of data is categorized to serve as input to structure learning in a Bayesian network. In the next stage, the structure of each cluster is learned using the Markov Chain Monte Carlo (MCMC) method, so that the key words, as nodes, are related to each other in the form of a directed acyclic graph (DAG). Structure learning on each cluster produces a clustered DAG. The same learning process is also applied to the original data, producing a general DAG. The third stage is an analysis that applies a set of rules to the clustered DAGs and the general DAG to determine connector nodes. A connector node is located in one clustered DAG and has a relation (edge) to a node in another clustered DAG. Connector nodes join the clustered DAGs into a union graph, called an Ontology Model, which represents the knowledge of the text domain. The data in this work are simulation data using a small number of key words from natural science. The resulting ontology model is evaluated manually, and it shows that the knowledge of a text can be represented visually.
The ontology development experiment still leaves some challenges to be addressed.

Keywords: bottom-up clustering, MCMC, Connector Node, Ontology

I. Introduction

Ontology development has been studied in many application domains, such as in the automotive industry to develop knowledge of services [1], in bioinformatics to represent protein-protein interactions [2], in medicine to analyze diagnosis requirements [3], and in the semantic web to create links between web pages [4]. Approaches to ontology development vary and include formal methods [5] and machine learning [6]. An ontology may be developed manually, semi-automatically or automatically, which means that a domain expert is sometimes needed to develop a relevant ontology as knowledge of the domain [5]. The proposed method of ontology development is an automatic process that requires no domain expert. It can be applied to text data derived from any domain.

Ionia Veritawati
Department of Informatics, Pancasila University, Indonesia

Ito Wasito; T. Basaruddin
Faculty of Computer Science, University of Indonesia, Indonesia

I. Ontology and Text

Ontology is broadly defined as "a formal, explicit specification of a shared conceptualization" [7]. Generally, domain ontology representations span a spectrum ranging from a lightweight ontology, whose structure is represented by a taxonomy (tree or graph), to a formal ontology represented by a relational database [5].

Text as a data collection consists of meaningful words, in the form of key words or key phrases, and stop words, which are meaningless words and are usually removed. Using text data in a machine learning approach begins by extracting only the frequencies of meaningful words from the data and arranging the frequencies in a vector space (a table of key words versus documents) [8]. The collection of meaningful words from a domain represents knowledge of the domain itself.
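The vector space arrangement described above can be illustrated with a minimal sketch. The toy documents, the stop-word list and the function name `build_vsm` are all hypothetical and are not taken from the paper; the sketch only shows how a table of key words versus documents might be built from raw term frequencies.

```python
from collections import Counter

# Hypothetical toy documents and stop-word list (assumptions, not the paper's data).
DOCUMENTS = [
    "the plant absorbs water and the plant needs light",
    "water and light are needed by the leaf",
]
STOP_WORDS = {"the", "and", "are", "by"}


def build_vsm(documents, stop_words):
    """Return (key_words, matrix); matrix[i][j] is the frequency of key_words[i] in documents[j]."""
    # Count only the meaningful words of each document (stop words are dropped).
    counts = [
        Counter(w for w in doc.lower().split() if w not in stop_words)
        for doc in documents
    ]
    # The rows of the table are the key words seen in any document, in sorted order.
    key_words = sorted(set().union(*counts))
    matrix = [[c[w] for c in counts] for w in key_words]
    return key_words, matrix


key_words, matrix = build_vsm(DOCUMENTS, STOP_WORDS)
for word, row in zip(key_words, matrix):
    print(f"{word:<8} {row}")
```

Each row of the table is one key word and each column is one document, so a key word is described by its vector of frequencies across the documents.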
The collection of meaningful words can be arranged more specifically by determining the relations among the words. A set of meaningful words that are related to each other is called an ontology [9]. In this work, text data models are created and used to develop an ontology model with the proposed methodology (Fig. 1).

II. Methodology

Fig. 1 shows the proposed methodology for ontology development. The text data consist of collections of key words from documents, represented in a Vector Space Model (VSM). Three types of DAG are defined: the modeled DAG, the clustered DAG and the general DAG. A modeled DAG is specified manually as a model, whereas the clustered DAGs and the general DAG result from structure learning.

In the data modeling step, the data are modeled by manually creating two or more modeled directed acyclic graphs (DAGs), including key-word labels for each node and relations (edges) between the nodes. Each modeled DAG represents a cluster of key words and their relations (a cluster of knowledge). The modeled DAGs are then sampled using a Bayesian network approach. The samples from all modeled DAGs are combined into a table of categorical data. The categorical data are then converted to real numbers to form vector data, which serve as the input data.

Preprocessing with tf-idf and normalization is applied to the data. Hierarchical clustering is then applied to the vector data to separate the data elements into clusters. The clustered data are categorized and used as input to the structure learning process of the Bayesian network. A scoring function is applied to each cluster of data to predict a graph structure