Dr. S. Siva Sathya 2, Associate Professor, Dept. of Computer Science, Puducherry, India

A Methodology for Incorporation of Domain Ontology in Knowledge Discovery Process for Interpretation and Improvement of Mining Results

Abstract: We live in a world where vast amounts of data are collected every hour of the day; this era is therefore often called the data or information age. Analyzing this vast collection of data is essential for many decision-making processes, and this need led to the birth of data mining. Data mining is the process of extracting potentially useful knowledge from raw data. However, machine-understandable knowledge is still rarely applied in data mining, which has been recognized as a gap in traditional data mining practice. Ontology-based data mining can play an important role in closing this gap. Millions of Indian schools face shortages of developmental materials and resources because the Government is unaware of the conditions in those schools. This paper aims to incorporate domain knowledge into the data mining process, which can help school authorities identify the urgent needs of individual schools and frame policies accordingly. We use the School Dataset of Assam as input to evaluate our methodology.

Keywords: Data Mining, Domain Ontology, Simple K-Means, Cluster Analysis, Taxonomy Similarity, Classification

I. INTRODUCTION

In artificial intelligence, an ontology formally represents the concepts of a particular domain and the relationships among those concepts. It is a formal naming and definition of the types, properties, and interrelationships of the entities that fundamentally exist for a particular domain of discourse [20]. Its basic components are concepts, relationships, instances, and the rules and axioms that constrain their interpretation [1].
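As a rough illustration of how these four components fit together, the following Python sketch encodes a tiny ontology built from the disease/organ example used in this section. It is not the representation used in this paper: the `Ontology` class and its methods are hypothetical, the placement of `disease_linked_to_genes` under `disease` and of `spleen` under `organ` is assumed, and the domain/range check is only one simple form an axiom could take.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical minimal container for the four ontology components."""
    # Taxonomy relationship: child concept -> parent concept (None for a root).
    parents: dict = field(default_factory=dict)
    # Associative relationships: name -> (domain concept, range concept).
    relations: dict = field(default_factory=dict)
    # Instances stored as (subject, relation, object) triples.
    triples: list = field(default_factory=list)

    def add_concept(self, name, parent=None):
        self.parents[name] = parent

    def is_a(self, concept, ancestor):
        """Walk up the taxonomy tree from `concept` looking for `ancestor`."""
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parents.get(concept)
        return False

    def add_relation(self, name, domain, range_):
        self.relations[name] = (domain, range_)

    def add_triple(self, subject, relation, obj):
        """Axiom-style check: subject and object must fit the relation's
        domain and range before the instance is accepted."""
        domain, range_ = self.relations[relation]
        if not (self.is_a(subject, domain) and self.is_a(obj, range_)):
            raise ValueError(f"({subject} {relation} {obj}) violates domain/range")
        self.triples.append((subject, relation, obj))


onto = Ontology()
onto.add_concept("disease")
onto.add_concept("organ")
onto.add_concept("disease_linked_to_genes", parent="disease")  # assumed parent
onto.add_concept("sickle_cell", parent="disease_linked_to_genes")
onto.add_concept("spleen", parent="organ")  # assumed placement
onto.add_relation("p_affects", domain="disease", range_="organ")
onto.add_triple("sickle_cell", "p_affects", "spleen")
```

In this sketch the taxonomy is a simple parent map, so a query such as `onto.is_a("sickle_cell", "disease")` follows the hierarchy upward, while `p_affects` links concepts across the tree. A reversed triple such as `(spleen p_affects sickle_cell)` would be rejected by the domain/range check.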
A concept is a class of entities in a domain; for example, organ is a concept in the medical domain. Relationships represent the interactions between concepts or their properties. For example, in "disease p_affects organ", disease and organ are concepts and p_affects is a relationship. There are two types of relationships: taxonomy relationships and associative relationships. A taxonomy relationship organizes concepts into a hierarchical concept tree (for example, sickle_cell is a type of disease_linked_to_genes), whereas an associative relationship relates concepts across the tree structure (for example, p_affects). Instances are instantiations of concepts which, together with the taxonomies and relationships, make up the domain knowledge. For example, an instance can be represented as (sickle_cell p_affects spleen). Axioms constrain the values of classes or instances (for example, a disease can affect a number of organs, and a particular organ can be affected by a number of diseases).

According to [11], two kinds of knowledge are involved in a knowledge discovery process: data mining knowledge and domain knowledge. Data mining knowledge covers the data mining algorithms themselves, how they can be used, parameter tuning, the formats of input data, and so on. Domain knowledge refers to the details of the dataset, such as how the attributes are related (known as causal relations), their limits, and so on [11]. The use of ontologies has become increasingly popular because they allow inference and describe domains in a form understandable by both humans and application systems [11].

Sections II, III, and IV present the motivation, literature survey, and proposed methodology, respectively; Section V concludes the paper and outlines future work.

II.
MOTIVATION

Data mining has become indispensable in a variety of areas such as business, government, education, learning systems, retrieval systems, and scientific research. Different people and organizations have used various techniques for this purpose. However, those techniques do not consider the background knowledge of the data, and therefore much of the knowledge remains undiscovered. Using a domain ontology in the mining process is the key to overcoming this limitation of knowledge discovery. An ontology is often described as the specification of a conceptualization, i.e., a description of the concepts and relationships that exist for a domain or a community of domains. This methodology helps in interpreting the results of traditional data mining algorithms. The work here focuses on the incorporation of

Joachim Narzary 1, M.Tech, CSE, Dept. of Computer Science, Puducherry, India
Minisrang Basumatary 3, M.Tech, CSE, Dept. of Computer Science, Puducherry, India

Joachim Narzary et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (2), 2016, 749-754, www.ijcsit.com