Lang Res Eval
DOI 10.1007/s10579-006-9011-7

ORIGINAL PAPER

Efficient corpus development for lexicography: building the New Corpus for Ireland

Adam Kilgarriff · Michael Rundell · Elaine Uí Dhonnchadha

Received: 21 July 2005 / Accepted: 16 October 2006
© Springer Science+Business Media B.V. 2006

A. Kilgarriff (✉) · M. Rundell
Lexicography MasterClass Ltd, Brighton, UK
e-mail: adam@lexmasterclass.com

E. Uí Dhonnchadha
Trinity College, Dublin, Ireland

Abstract In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed, and the solutions to problems encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.

Keywords Corpus linguistics · Lexicography · Computational linguistics · Natural language processing · Dictionaries · Irish · Gaelic · Hiberno-English · Language technology

1 Introduction

In this paper we describe the development of the New Corpus for Ireland (NCI), a substantial lexicographic corpus in two parts, one being Irish (the Celtic language of Ireland), the other Hiberno-English (the variety of English that is spoken in Ireland). We describe its design, collection, and encoding.

A corpus is of optimal use to lexicographers if it is loaded into a corpus query tool which supports them in finding collocational and grammatical patterns. To that end the corpus must be grammatically analyzed. While suitable tools were available for English, they were not for Irish, so we extended work on an Irish lemmatizer, and developed a part-of-speech tagger and set of grammatical relation definitions for Irish.