The Pacific Journal of Science and Technology –294– http://www.akamaiuniversity.us/PJST.htm
Volume 11. Number 1. May 2010 (Spring)

Grammar Induction Strategy Using Genetic Algorithm: Case Study of Fifteen Toy Languages.

Nitin S. Choubey, Ph.D.*1 and Madan U. Kharat, Ph.D.2

1 Student, P.G. Department of Computer Science, S.G.B.A. University, Amravati, Maharastra, India.
2 Professor, Department of Computer Engineering, Institute of Engineering, Bhujbal Knowledge City, Nashik, Maharastra, India.

* E-mail: nschoubey@gmail.com

ABSTRACT

Grammar Induction (also called Grammar Inference or Language Learning) is the process of learning a grammar from training data consisting of positive and negative strings of the language. This paper discusses an extended approach that uses stochastic mutation, based on an Adaptive Genetic Algorithm, to induce grammars for a set of fifteen different languages. In this approach, a proportionate share of the population is generated by the crossover and mutation operators separately. The elite members of the resulting population and of the original population are considered for inclusion in the next population.

(Keywords: evolutionary computation, genetic algorithm, automata, context free grammar, grammar induction)

INTRODUCTION

Genetic Algorithms (GAs) were invented by John Holland in the 1960s. Wyard [1] explored the impact of different grammar representations, and the experimental results show that an evolutionary algorithm using standard context-free grammars (BNF) outperformed other representations.

In conventional grammatical induction, a language acceptor is constructed to accept all the positive examples. Learning from positive examples alone is called text learning. A more powerful technique uses negative samples as well; this is called learning with an informant. In informant learning, the language acceptor is constructed so as to accept all the positive examples and reject all the negative examples.
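The difference between text learning and informant learning can be illustrated with a small consistency check. The sketch below is only an illustration, not the authors' implementation: the helper names (`is_consistent`, `accepts_anbn`, `accepts_anything`) and the sample language a^n b^n (n >= 0) are assumptions chosen for the example.

```python
from typing import Callable, Iterable


def is_consistent(accepts: Callable[[str], bool],
                  positives: Iterable[str],
                  negatives: Iterable[str]) -> bool:
    """Return True if the acceptor accepts every positive sample
    and rejects every negative sample (informant learning criterion)."""
    return (all(accepts(s) for s in positives)
            and not any(accepts(s) for s in negatives))


def accepts_anbn(s: str) -> bool:
    """Hypothetical acceptor for the toy language a^n b^n, n >= 0."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def accepts_anything(s: str) -> bool:
    """Over-general acceptor: accepts every string."""
    return True


positives = ["", "ab", "aabb", "aaabbb"]
negatives = ["a", "b", "ba", "abab", "aab"]

# Text learning: only positive samples are available, so the
# over-general acceptor cannot be told apart from the correct one.
text_ok = is_consistent(accepts_anything, positives, [])

# Informant learning: negative samples rule out the over-general acceptor.
informant_ok = is_consistent(accepts_anything, positives, negatives)
target_ok = is_consistent(accepts_anbn, positives, negatives)
```

With positive samples alone, an acceptor that accepts every string is indistinguishable from the intended one. The negative samples are what allow over-general candidates to be rejected.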
The field of evolutionary computing has been applying problem-solving techniques that are similar in intent to the recombination methods of Machine Learning. Most evolutionary computing approaches have in common that they try to find a solution to a particular problem by recombining and mutating individuals in a society of possible solutions [2].

In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form [3]

$V \rightarrow w$    (1)

where
V = a single non-terminal symbol
w = a string of terminals and/or non-terminals (possibly empty)

The term "context-free" expresses the fact that non-terminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it. The context-free languages are exactly the languages that can be recognized by a non-deterministic pushdown automaton.

This paper presents, in order, a brief overview of the Genetic Algorithm, the strategy adopted for CFG induction with the Genetic Algorithm, the details of the languages used in the authors' implementation of CFG induction, and a discussion of the results obtained.

GENETIC ALGORITHM

A simple GA works by creating a random initial population of fixed-length chromosomes. In each iteration (generation), the population evolves through selection, crossover, and mutation, which are the main genetic operators in GAs. Individuals are chosen on the basis of their fitness to act as parents of the offspring that will constitute the new generation. This process is repeated until the termination criterion is satisfied.
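The simple GA loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' grammar-induction system: the bit-counting fitness function, tournament selection, single-point crossover, and all parameter values and function names are placeholders chosen for the example.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def fitness(chrom: Chromosome) -> int:
    """Placeholder fitness (assumption): number of 1-bits in the chromosome."""
    return sum(chrom)


def tournament(pop: List[Chromosome], rng: random.Random, k: int = 3) -> Chromosome:
    """Selection: pick k random individuals and return the fittest."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a: Chromosome, b: Chromosome,
              rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover producing two offspring of the same length."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom: Chromosome, rng: random.Random, rate: float) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(length: int = 20, pop_size: int = 30, generations: int = 100,
           mutation_rate: float = 0.02, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    # Random initial population of fixed-length chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Termination criterion: optimum found (or generation limit reached).
        if fitness(best) == length:
            break
        nxt: List[Chromosome] = []
        while len(nxt) < pop_size:
            c1, c2 = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt.append(mutate(c1, rng, mutation_rate))
            nxt.append(mutate(c2, rng, mutation_rate))
        pop = nxt[:pop_size]
        # Remember the best individual seen in any generation.
        best = max(pop + [best], key=fitness)
    return best


best = run_ga()
```

In a grammar-induction setting the chromosome would encode a candidate grammar, and the fitness would measure how well that grammar separates the positive and negative training strings. The bit-counting fitness here only stands in for that measure.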