Natural Language Engineering 14 (4): 457–469. © 2007 Cambridge University Press
doi:10.1017/S1351324907004676 Printed in the United Kingdom

Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

SHULY WINTNER
Department of Computer Science, University of Haifa, 31905 Haifa, Israel
e-mail: shuly@cs.haifa.ac.il

(Received 12 January 2007; revised 31 May 2007; accepted 9 October 2007; first published online 6 December 2007)

Abstract

Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.

1 Introduction

Finite-state technology (FST) denotes the use of finite-state devices, including automata and transducers, in natural language processing (NLP). Since the early works that demonstrated the applicability of this technology to linguistic representation (Johnson 1972; Koskenniemi 1983; Kaplan and Kay 1994), FST is considered adequate for describing the phonological and morphological processes of the world’s languages (Roche and Schabes 1997; Beesley and Karttunen 2003). Even nonconcatenative processes, such as circumfixation, root-and-pattern morphology, or reduplication, were shown to be in principle implementable in FST (Beesley 1998; Cohen-Sygal and Wintner 2006). The utility of FST for NLP was emphasized by the implementation of several toolboxes that provide extended regular expression languages and compilers that convert expressions to finite-state automata and transducers. These include INTEX