Dr. Arti Arya et al. / (IJCSE) International Journal on Computer Science and Engineering Vol. 02, No. 06, 2010, 2133-2140

A Knowledge Based Approach for Recognizing Textual Entailment for Natural Language Inference using Data Mining

Dr. Arti Arya 1, Vishwanath Yaligar 2, Ramya D. Prabhu 2, Ramya Reddy 2, Rohith Acharaya 2
1 Department of MCA, PESSE, VTU, Bangalore, India
2 Dept. of Computer Science, PESSE, VTU, Bangalore, India

Abstract— Recognizing Textual Entailment (RTE) is a relatively new problem that is essential to Natural Language Understanding (NLU) and Automated Knowledge Discovery. Natural Language Inference has been approached with the bag-of-words approach, formal methods (first-order logic) and pattern-based relation extraction, which usually do not give satisfactory results. In this paper, a knowledge-based approach is proposed that applies data mining concepts to large, appropriately classified text. Different lexical resources, namely WordNet, VerbNet and ConceptNet, have been integrated into a rich knowledge base that provides semantic and structural information on English words. The mined data is used by an inference system to produce the output for the problem. The complete concept presented in the paper has been implemented as a movie search engine in which the knowledge-based RTE concept is applied internally to "summaries or plots of the movies" to obtain the best possible classification of the movies. The experiments have shown encouraging results, reducing search time and providing more accurate results. To the best of our knowledge, this is the first time that the RTE concept has been applied to information search in the form of a movie search engine.

Keywords- Recognizing Textual Entailment (RTE), natural language processing, data (text) mining, knowledge base.

I. INTRODUCTION

Textual inference plays an important role in many Natural Language Processing (NLP) [16] tasks.
In recent years, Recognizing Textual Entailment (RTE) [3], which focuses on detecting semantic inference, has attracted a lot of attention and has become a focus area for the natural language processing community. The main idea behind RTE is to infer the meaning of one text from that of another, larger text. RTE explores the relationship between a hypothesis and a text: given a hypothesis in natural language (English), the system must identify whether the text entails the hypothesis, i.e., whether the hypothesis can be inferred from the text, cannot be inferred from it, or whether the data is insufficient to conclude. The entailment relationship is denoted by T ⇒ H. For instance, given H = “The actor of movie ‘Notorious’ is Craig Grant.” and T = “The movie Notorious was a superhit having Craig Grant in the main lead.”, the relation T ⇒ H holds. Successful automation of natural language applications helps machines accurately understand the underlying meaning (semantics) of texts written with different syntax [5]. This becomes challenging when different sentences use different words or phrases to express the same meaning. The exponential growth of web data creates new challenges for Information Retrieval [15]. Automated search engines based on keyword matching usually return poor-quality results when the same meaning is expressed through different linguistic expressions. In this paper, a different approach is explored for improving an inference rule collection and applying it to the task of recognizing textual entailment, and the concept is then implemented in a movie search engine. The two main issues addressed are:
• To improve the Natural Language Inference ability of the computer.
• To make use of it.
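The keyword-matching problem can be made concrete with the Notorious example. The following is a minimal sketch, assuming a simple bag-of-words "coverage" score (the fraction of the hypothesis's content words found in the text) and a tiny hand-made knowledge base, `TOY_KB`, which is hypothetical and stands in for, but is not, the paper's knowledge base built from WordNet, VerbNet and ConceptNet. Pure keyword matching misses that "main lead" implies "actor"; a single knowledge entry recovers it.

```python
import re

# Hypothetical toy knowledge base (NOT the paper's): maps a word in the text
# to words it can stand in for. Entries are invented for illustration only.
TOY_KB = {
    "lead": {"actor", "star"},
    "superhit": {"hit", "success"},
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "is", "was", "a", "in", "having"}


def content_words(sentence):
    """Lower-case, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def expand(words, kb):
    """Add every word the knowledge base says a text word can stand in for."""
    expanded = set(words)
    for w in words:
        expanded |= kb.get(w, set())
    return expanded


def coverage(text, hypothesis, kb=None):
    """Fraction of hypothesis content words found in the (optionally expanded) text."""
    t = content_words(text)
    if kb:
        t = expand(t, kb)
    h = content_words(hypothesis)
    return len(h & t) / len(h) if h else 1.0


T = "The movie Notorious was a superhit having Craig Grant in the main lead."
H = "The actor of movie Notorious is Craig Grant."

print(coverage(T, H))          # 0.8  (keyword matching alone misses "actor")
print(coverage(T, H, TOY_KB))  # 1.0  (knowledge-base expansion recovers it)
```

A real system would need far richer resources and would also have to handle word order, negation and syntax; the sketch only shows why lexical knowledge helps where plain keyword overlap falls short.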
Recognizing Textual Entailment (RTE), also known as Natural Language Inference (NLI), is a relatively new problem that is essential to Natural Language Understanding (NLU) and Automated Knowledge Discovery. NLI has been tackled with a spectrum of approaches, such as the bag-of-words approach, formal methods (first-order logic) and pattern-based relation extraction, which do not show satisfactory results. In this paper, a knowledge-based approach is proposed that applies data mining tools to large, appropriately classified text. Given a hypothesis in natural language (English), the system must recognize whether the text entails the hypothesis, i.e., whether the hypothesis can be inferred from the text, cannot be inferred from it, or whether the data is insufficient to conclude. The main objective of making a three-way decision of "YES", "NO" and "UNKNOWN" is to drive systems to make accurate informational distinctions. If the outcome for a hypothesis is "UNKNOWN" because the text is insufficient, that case must be kept separate from hypotheses whose outcome is "NO".

ISSN : 0975-3397 2133
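The three-way decision and the required separation of "UNKNOWN" from "NO" outcomes can be sketched as follows. This is a minimal illustration, assuming a hypothetical `coverage` score (for example, the lexical coverage of the hypothesis by the text) and a hypothetical `contradicted` flag; the 0.9 threshold and the rule itself are invented for illustration and are not the decision procedure used in the paper.

```python
from enum import Enum


class Label(Enum):
    """The three possible outcomes of the entailment decision."""
    YES = "YES"
    NO = "NO"
    UNKNOWN = "UNKNOWN"


def decide(coverage, contradicted, yes_threshold=0.9):
    """Hypothetical three-way rule; the threshold and flag are assumptions.

    coverage     -- fraction of the hypothesis supported by the text (0.0 to 1.0)
    contradicted -- True if the text explicitly states the opposite of the hypothesis
    """
    if contradicted:
        return Label.NO          # the text rules the hypothesis out
    if coverage >= yes_threshold:
        return Label.YES         # the text supports the hypothesis
    return Label.UNKNOWN         # not enough evidence either way


def segregate(results):
    """Group (hypothesis, label) pairs so UNKNOWN cases never mix with NO cases."""
    buckets = {label: [] for label in Label}
    for hypothesis, label in results:
        buckets[label].append(hypothesis)
    return buckets


results = [
    ("H1", decide(1.0, contradicted=False)),
    ("H2", decide(0.8, contradicted=True)),
    ("H3", decide(0.2, contradicted=False)),
]
buckets = segregate(results)
print(buckets[Label.NO])       # ['H2']
print(buckets[Label.UNKNOWN])  # ['H3']
```

Keeping the UNKNOWN bucket separate is the point of the three-way scheme: a hypothesis that the text simply does not address is not reported as a definite negative.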