Jurnal Ilmiah ILMU KOMPUTER Universitas Udayana Vol. XI, No. 1, April 2018 ISSN 1979-5661

ONTOLOGY-BASED PARAGRAPH EXTRACTION AND CAUSALITY DETECTION-BASED SIMILARITY FOR ANSWERING WHY-QUESTION

A.A.I.N. Eka Karyawati
Computer Science/Informatics Program of Mathematics and Natural Sciences Faculty, Udayana University
eka.karyawati@cs.unud.ac.id

ABSTRACT

Paragraph extraction is a central part of an automatic question answering system, especially for answering why-questions, because the answer to a why-question is usually contained in a whole paragraph rather than in one or two sentences. Paragraph extraction approaches have been studied before, but few studies involve a domain ontology as a knowledge base. Most paragraph extraction studies use keyword-based methods with only a small share of semantic techniques. As a result, the question answering system faces a problem typical of keyword-based methods: word mismatch. The main contribution of this research is a paragraph scoring method that combines TFIDF-based and causality-detection-based similarity. This research is part of an ontology-based why-question answering method in which the ontology serves as a knowledge base for each step of the method, including indexing, question analysis, document retrieval, and paragraph extraction/selection. To measure performance, the proposed method was compared with two baseline methods that did not use causality-detection-based similarity. The proposed method showed improvements over the baseline methods in MRR (95%, 0.82 vs. 0.42), P@1 (105%, 0.78 vs. 0.38), P@5 (91%, 0.88 vs. 0.46), Precision (95%, 0.80 vs. 0.41), and Recall (66%, 0.88 vs. 0.53).

Keywords: Ontology-Based Question Answering, Paragraph Retrieval, Why-Question Answering, Why-Question, Causality Detection

1. INTRODUCTION

In typical QA systems based on a document collection, a keyword-based approach is usually used to handle each step of the document retrieval process (Soricut & Brill, 2006; Higashinaka & Isozaki, 2008; Mori et al., 2008; Nakakura & Fukumoto, 2008; Verberne et al., 2010; Verberne et al., 2011; Oh et al., 2012; Oh et al., 2013). Keyword-based QA has limited ability to capture the conceptualizations associated with user needs and document contents. Thus, word mismatch often occurs because the query and the documents cannot represent the information correctly. The word mismatch problem refers to the unsuitable use of words to describe the