Inferring Grammars for Mildly Context Sensitive Languages in Polynomial-Time

Tim Oates 1, Tom Armstrong 1, Leonor Becerra Bonache 2, and Mike Atamas 1

1 University of Maryland Baltimore County, Baltimore, MD 21250 USA
{oates, arm1, m39}@umbc.edu
2 Rovira i Virgili University, Pl. Imperial Tarraco 1, 43005, Tarragona, Spain
leonor.becerra@estudiants.urv.es

Abstract. Natural languages contain regular, context-free, and context-sensitive syntactic constructions, yet none of these classes of formal languages can be identified in the limit from positive examples. Mildly context-sensitive languages can represent the context-sensitive constructions most common in natural languages, such as multiple agreement, crossed agreement, and duplication. These languages are attractive for natural language applications due to their expressiveness, and the fact that they are not fully context-sensitive should lead to computational advantages as well. We realize one such computational advantage by presenting the first polynomial-time algorithm for inferring Simple External Contextual Grammars, a class of mildly context-sensitive grammars, from positive examples.

1 Introduction

Despite the fact that every normal child masters his native language, the learning mechanisms that underlie this distinctly human feat are poorly understood. The ease with which children learn language belies the underlying complexity of the task. They face a number of theoretical challenges, including apparently insufficient data from which to learn lexical semantics (Quine’s “gavagai” problem [1]) or syntax (Chomsky’s “argument from the poverty of the stimulus” [2]). For example, it is known that many classes of formal languages, such as the regular and context-free languages, cannot be learned solely from positive examples, i.e., strings that are in the (regular or context-free) language to be learned [3]. This is problematic because children either do not receive negative examples (i.e., strings that are not in the language to be learned) or pay little attention when such examples are presented [4].

There are a few standard ways of avoiding these theoretical obstacles to learning syntax from positive examples. One is to assume the existence of information in addition to the positive examples that comprise the training data. For example, most algorithms for learning context-free grammars from positive examples assume that each example is paired with its unlabeled derivation tree, which is the parse tree for the string from which the non-terminal labels on the interior nodes have been removed.
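To make the constructions named in the abstract concrete, the following is a minimal Python sketch (not part of the paper) of membership tests for the three canonical mildly context-sensitive patterns: multiple agreement (a^n b^n c^n), crossed agreement (a^n b^m c^n d^m), and duplication (ww). None of these languages is context-free, which is why grammar classes beyond context-free are needed to capture them.

import re

def is_multiple_agreement(s: str) -> bool:
    # Membership in {a^n b^n c^n : n >= 1} -- multiple agreement.
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

def is_crossed_agreement(s: str) -> bool:
    # Membership in {a^n b^m c^n d^m : n, m >= 1} -- crossed agreement.
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return (m is not None
            and len(m.group(1)) == len(m.group(3))
            and len(m.group(2)) == len(m.group(4)))

def is_duplication(s: str) -> bool:
    # Membership in {ww : w non-empty} -- duplication (copying).
    half, rem = divmod(len(s), 2)
    return rem == 0 and half > 0 and s[:half] == s[half:]

if __name__ == "__main__":
    assert is_multiple_agreement("aabbcc") and not is_multiple_agreement("aabbc")
    assert is_crossed_agreement("aabccd") and not is_crossed_agreement("abdc")
    assert is_duplication("abab") and not is_duplication("abba")

Each of these membership checks runs in linear time; the problem addressed in this paper is, of course, harder: inferring a grammar for such languages from positive strings alone.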