Proceedings of the 13th European Workshop on Natural Language Generation (ENLG), pages 263–266, Nancy, France, September 2011. © 2011 Association for Computational Linguistics

University of Illinois System in HOO Text Correction Shared Task

Alla Rozovskaya  Mark Sammons  Joshua Gioja  Dan Roth
Cognitive Computation Group
University of Illinois at Urbana-Champaign
Urbana, IL 61801
{rozovska,mssammon,gioja,danr}@illinois.edu

Abstract

In this paper, we describe the University of Illinois system that participated in Helping Our Own (HOO), a shared task in text correction. We target several common errors, such as article, preposition, word choice, and punctuation errors, and we describe the approaches taken to address each error type. Our system is based on a combination of classifiers, together with adaptation techniques for article and preposition detection. We ranked first in all three evaluation metrics (Detection, Recognition, and Correction) among the six participating teams. We also present type-based scores on preposition and article error correction and demonstrate that our approach achieves the best performance in each task.

1 Introduction

The Text Correction task addresses the problem of detecting and correcting mistakes in text. This task is challenging, since many errors are not easy to detect, such as context-sensitive spelling mistakes that involve confusing valid words in a language (e.g., "there" and "their"). Recently, text correction has taken an interesting turn by focusing on context-sensitive errors made by English as a Second Language (ESL) writers. The HOO shared task (Dale and Kilgarriff, 2011) focuses on writing mistakes made by non-native writers of English in the context of the Natural Language Processing community. This paper presents our entry in the HOO shared task.
We target several common types of errors using a combination of discriminative and probabilistic classifiers, together with adaptation techniques for article and preposition detection. Our system ranked first in all three evaluation metrics (Detection, Recognition, and Correction). The description of the evaluation schema and the results of the participating teams can be found in Dale and Kilgarriff (2011). We also evaluate the performance of the two system components that target article and preposition errors (Sec. 2) and compare them to the performance of the other teams (Sec. 3).

2 System Components

Our system comprises components that address article and preposition mistakes, word choice errors, and punctuation errors. Table 1 lists the error types that our system targets and shows sample errors from the pilot data.¹

2.1 Article and Preposition Classifiers

We submitted several versions of article and preposition classifiers that build on elements of the systems described in Rozovskaya and Roth (2010b) and Rozovskaya and Roth (2010c). The systems are trained on the ACL Anthology corpus, which contains 10 million article and 5 million preposition examples²; some versions also use additional data from English Wikipedia and the New York Times section of the Gigaword corpus (Linguistic Data Consortium, 2003). Our experiments on the pilot data showed a significant performance gain when training on the ACL Anthology corpus,

¹ The shared task data are split into pilot and test sets. Each part consists of text fragments from 19 documents, with one fragment from each document included in the pilot set and one in the test set.
² We consider the top 17 English prepositions.
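To give a concrete sense of the general technique behind article classifiers of this kind, the following is a minimal illustrative sketch, not the authors' actual system: a probabilistic (Naive Bayes) selector over a confusion set of articles, using word n-gram features from a small window around the article slot. The feature templates, toy training data, and all names here are our own assumptions; the real system is trained on millions of examples with richer features.

```python
# Illustrative sketch only: a Naive Bayes article selector over local
# context features. Not the paper's implementation; features and data
# are hypothetical stand-ins.
import math
from collections import defaultdict

CONFUSION_SET = ["a", "an", "the", "NONE"]  # NONE = article omitted

def context_features(left, right):
    """Word n-gram features from a small window around the article slot."""
    return [
        "L1=" + (left[-1] if left else "<s>"),
        "R1=" + (right[0] if right else "</s>"),
        "L2L1=" + " ".join(left[-2:]),
        "R1R2=" + " ".join(right[:2]),
    ]

class NaiveBayesSelector:
    """Multinomial Naive Bayes with add-one smoothing over string features."""

    def __init__(self):
        self.label_counts = defaultdict(int)
        self.feat_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (left_context, right_context, correct_label)
        for left, right, label in examples:
            self.label_counts[label] += 1
            for f in context_features(left, right):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, left, right):
        total = sum(self.label_counts.values())
        best, best_score = None, float("-inf")
        for label in CONFUSION_SET:
            if self.label_counts[label] == 0:
                continue  # label never observed in training
            score = math.log(self.label_counts[label] / total)
            n = sum(self.feat_counts[label].values())
            for f in context_features(left, right):
                score += math.log(
                    (self.feat_counts[label][f] + 1) / (n + len(self.vocab))
                )
            if score > best_score:
                best, best_score = label, score
        return best

# Toy training examples standing in for native-text corpus statistics.
train_data = [
    (["i", "saw"], ["movie", "yesterday"], "a"),
    (["she", "ate"], ["apple"], "an"),
    (["he", "read"], ["book"], "a"),
    (["close"], ["door"], "the"),
    (["open"], ["door", "please"], "the"),
]
clf = NaiveBayesSelector()
clf.train(train_data)
print(clf.predict(["shut"], ["door"]))  # "door" contexts were seen with "the"
```

In a realistic setting the confusion-set decision would be combined with the writer's original choice and with much larger feature sets; this sketch only shows the core selection step.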