(IJARAI) International Journal of Advanced Research in Artificial Intelligence, Vol. 2, No. 6, 2013
www.ijarai.thesai.org

A Structural Algorithm for Complex Natural Languages Parse Generation

Enikuomehin, A. O.
Dept. of Computer Science, Lagos State University, Lagos, Nigeria

Rahman, M. A.
Dept. of Computer Science, Lagos State University, Lagos, Nigeria

Ameen A. O.
Dept. of Computer Science, University of Ilorin, Ilorin, Nigeria

Abstract: In artificial intelligence, the study of how humans understand natural languages is cognitive in nature, and this science is essential to the development of modern embedded robotic systems. Such systems should be able to process natural languages and generate meaningful output. Unlike machines, humans can understand a natural language sentence because of an inborn faculty that they use to process it. Robotics requires appropriate parse systems to handle language-based operations. In this paper, we present a new method of generating parse structures for complex natural language using algorithmic processes. The paper explores the process of generating meaning via parse structures and improves on existing results using a well-established parsing scheme. The resulting algorithm was implemented in Java, and a natural language interface for parse generation is presented. The results further show that tokenizing sentences into their respective units affects the parse structure in the first instance and the semantic representation on a larger scale. Efforts were made to limit the rules used in generating the grammar, since natural language rules are almost infinite, depending on the language set.

Keywords: Natural Language; Syntax; Parsing; Meaning Representation

I. INTRODUCTION

Natural languages [1,2] are used in our everyday communication. They are commonly referred to as human languages.
Humans process natural languages easily because these have been their basic means of communication since birth. The human system has the capability to learn and use such languages and to improve on them over time. Recently, there has been renewed effort to develop systems that emulate humans, driven by increased service-rendering requirements, including several efforts in [3]. A major requirement of such systems is that they must be able to act like humans. This includes the ability to process human speech (speech recognition, an area that has received great research attention): the system receives speech signals, converts them into text, processes the text, and provides a response to the user. The user is obviously more comfortable presenting such speech in his or her natural language. However, natural language is very complex because of the high level of ambiguity in it. This is one of many factors; others include the large set of words available, which can occur in many unstructured orders. Thus, to build a functional system, these issues must be clearly addressed.

Processing natural languages involves the concepts of interpretation and generalization [4]. Interpretation involves understanding the natural language, while generalization, the step that follows interpretation, handles the representation of the interpreted language. The process of representation will only be functional if the language of presentation is understood by the system. In understanding such languages, several stages of operation are involved.
They include morphological analysis (how words are built from morphemes; a morpheme is the smallest meaningful unit in the grammar of a language), chunking (breaking sentences down into units known as tokens; a token is a symbol regarded as an individual concrete mark rather than as a class of identical symbols, and chunking is a popular alternative to full parsing), syntactic analysis (analyzing sentences to determine whether they are syntactically correct) and semantic analysis (examining their meanings).

The importance of representation in terms of morphemes can be seen in the following example. Consider the word “Unladylike”. This word consists of three morphemes and four syllables. It breaks into the morphemes un- 'not', lady '(well behaved) female adult human', and like 'having the characteristics of'. None of these morphemes can be broken up any further without losing the meaning the word is trying to convey. Lady cannot be broken up into "la" and "dy", even though "la" and "dy" are separate syllables, because neither syllable has any meaning on its own. Thus, our representational framework can be determined by the morphology of any given word.

This process can be carried out manually, but as the set of terms to be considered grows, manual interpretation becomes increasingly likely to fail. An appropriate approach is therefore to introduce algorithms that can handle such complex representations of natural language, so that the parse needed for machine translation of natural language can be generated. Such an algorithm generates syntactic structures for natural language sentences: it produces a correct syntactic analysis of any given sentence, and its output is the syntactic structure represented by a syntax tree. The syntax tree shows how words build up to form correct sentences.

Children learn language by discovering patterns and templates. We learn how to express plural or singular and how to match those forms in verbs and nouns.
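The token and morpheme stages described above, including the "unladylike" decomposition, can be illustrated with a minimal Java sketch. It is not the algorithm presented in this paper: the class name `MorphemeSketch`, the methods `tokenize` and `segment`, and the tiny prefix and suffix lists are all hypothetical and chosen only for illustration. The sketch strips at most one known prefix and one known suffix from a word, which is enough for this example but would mis-segment many real words (for instance, it would split "under" into "un" and "der").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MorphemeSketch {

    // Hypothetical toy affix lists (an assumption for illustration only);
    // a real morphological analyzer would rely on a full lexicon.
    private static final List<String> PREFIXES = Arrays.asList("un", "re", "dis");
    private static final List<String> SUFFIXES = Arrays.asList("like", "ness", "ly");

    /** Splits a sentence into lowercase word tokens on runs of non-letter characters. */
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String piece : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Strips at most one known prefix and one known suffix, keeping a non-empty stem. */
    public static List<String> segment(String word) {
        String rest = word.toLowerCase(Locale.ROOT);
        List<String> morphemes = new ArrayList<>();
        String suffix = null;

        for (String p : PREFIXES) {
            if (rest.startsWith(p) && rest.length() > p.length()) {
                morphemes.add(p);
                rest = rest.substring(p.length());
                break;
            }
        }
        for (String s : SUFFIXES) {
            if (rest.endsWith(s) && rest.length() > s.length()) {
                suffix = s;
                rest = rest.substring(0, rest.length() - s.length());
                break;
            }
        }
        morphemes.add(rest); // whatever remains is treated as the stem
        if (suffix != null) {
            morphemes.add(suffix);
        }
        return morphemes;
    }

    public static void main(String[] args) {
        System.out.println(segment("Unladylike"));               // prints [un, lady, like]
        System.out.println(tokenize("The lady is unladylike.")); // prints [the, lady, is, unladylike]
    }
}
```

Note that the sketch never splits "lady" into "la" and "dy": syllables are not in the affix lists, so only meaningful units are separated, which mirrors the distinction between morphemes and syllables drawn above.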
We learn how to put together a sentence, a question, or a command. Natural Language Processing assumes that if we can define those patterns and describe them to a computer then we can teach a machine something of how we speak and understand each other. Much of this work is