International Journal of Computer Applications (0975-8887), Volume 62, No. 3, January 2013

A Novel Approach and Comparative Study of Association Rule Algorithms in Validation of Semantics of Sentences

Yamuna Devi. N
Assistant Professor (Senior Grade), Department of MCA, Coimbatore Institute of Technology, Coimbatore, India

Devi Shree J, PhD.
Assistant Professor (Senior Grade), Department of EEE, Coimbatore Institute of Technology, Coimbatore, India

ABSTRACT
Efficient Human Computer Interaction (HCI) is a necessity for many applications today. Computational linguistics supports HCI by enabling computers to understand human languages. Advanced computational models can be built with many technologies to ease communication between humans and computers. Data mining has emerged to address the problem of understanding ever-growing volumes of structured data; it is a process that extracts hidden knowledge from large amounts of data, and that knowledge can be used to build a computational model. This paper describes the use of Association Rules (AR), a data mining technique, to build effective communication between humans and computers. It compares the performance of two association rule algorithms in building a model that validates the semantics of sentences in the linguistics domain. The sequence of operations for building the model is described, with the necessary constraints at each stage. The model verifies the meaning of syntactically correct English sentences in a limited domain using the Apriori and Frequent-Pattern (FP) tree growth algorithms. As a prerequisite, the syntax of each sentence is also verified, and as a follow-up the model provides an interface for interaction between human and computer. Association rules, a data mining concept, are thus employed in semantic analysis in a distinct way.
Since natural language understanding is an open-ended task, this work opens the door to using association rules in the semantic analysis of natural language sentences in a defined domain.

General Terms
Association Rules, Human Computer Interaction

Keywords
Syntax Analysis, Semantic Analysis, Apriori Algorithm, Question Answering System.

1. INTRODUCTION
Data mining is an active field in which research is carried out for various application domains. A large amount of data can serve as the input to a data mining task that extracts knowledge from it. Linguistics, the study of the natural languages that people use for communication, is a domain with vast amounts of data. Computational linguistics draws on linguistics and computer science to build computational models of linguistic theories. Building computational models for linguistic analysis is a useful and necessary task for human-machine communication. It can be achieved to a large extent by analyzing the syntax and semantics of sentences in the natural language. As a new approach, data mining techniques are applied to natural language analysis to extract the meaning of a sentence as knowledge.

Natural language study covers various disciplines, such as phonetics, syntax, semantics, pragmatics, morphology, utterance, etc. [1] [2]. Among these, syntax and semantic analysis are used in a range of applications such as machine learning, word sense disambiguation, voice recognition systems and information retrieval. Natural languages are also analyzed computationally through Natural Language Processing (NLP), Natural Language Understanding (NLU), etc.

2. PROBLEM DESCRIPTION
Syntax analysis in NLP is the process of analyzing the structure of a sentence. It shows how the words in a sentence are related to each other [3]. Semantic analysis in NLU is the process of capturing and understanding the meaning of a sentence in a context.
Semantic analysis deserves attention today because it helps people interact with computers through natural languages. For example, information about trains, train times, etc. can be obtained by posting a natural language query through an interface [4]. Since data mining techniques are capable of handling large amounts of data, two association rule algorithms, Apriori and FP-tree growth, are applied to verify the meaning of an English sentence, and the performance of the two algorithms is compared.

One popular application of semantic analysis is the Question Answering System (QAS) [5]. Generally, queries are posted in predefined formats or through menus. It is easier and more useful if queries can be entered as natural language sentences. A meaningful natural language sentence can be converted to a formatted SQL query, which can be executed to retrieve information from a database. Providing a natural language interface thus improves human-machine interaction for users without a computing background.

Although data mining has been applied across many domains, new domains remain to be explored. In this paper, the focus is on applying the Apriori and FP-tree growth algorithms to verify the meaning of a sentence. Because meaning is verified for syntactically valid sentences, syntax analysis is also carried out. By applying the algorithms, association rules representing valid combinations of constituents are generated for verifying meaning and are stored in a semantic database for future use. Semantically valid sentences are then used to generate a formal query, which is executed to produce results for end users [5]. An interface is used to post a query in natural language.
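
To make the idea of association rules as valid combinations of constituents concrete, the following is a minimal Python sketch of Apriori-style mining over a toy corpus. It is an illustration under stated assumptions, not the implementation used in this work: the sample sentences, the representation of a sentence as an unordered set of content words, the thresholds, and the names `TRANSACTIONS`, `apriori`, `rules` and `is_semantically_valid` are all hypothetical. An in-memory dictionary stands in for the semantic database, and the FP-tree growth algorithm is not shown.

```python
from itertools import combinations

# Hypothetical training data: each transaction is the set of constituents
# (content words) of a sentence assumed to be meaningful in a toy
# railway-enquiry domain. Word order is ignored in this sketch.
TRANSACTIONS = [
    {"train", "arrives", "station"},
    {"train", "departs", "station"},
    {"train", "arrives", "platform"},
    {"passenger", "boards", "train"},
    {"passenger", "books", "ticket"},
    {"train", "arrives", "station"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = {}
    current = count({frozenset([item]) for t in transactions for item in t})
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


def is_semantically_valid(constituents, frequent):
    """Accept a sentence if its constituent set was mined as frequent."""
    return frozenset(constituents) in frequent


# The mined itemsets play the role of the stored semantic knowledge.
semantic_db = apriori(TRANSACTIONS, min_support=2)
print(is_semantically_valid({"train", "arrives", "station"}, semantic_db))
print(is_semantically_valid({"ticket", "arrives", "station"}, semantic_db))
```

In this sketch a sentence is accepted only when its whole constituent set is frequent, which is a deliberately strict criterion; a rule-based check using the confidence values returned by `rules` would be an equally plausible design.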