THE ITALIAN PAROLE CORPUS: AN OVERVIEW

RITA MARINELLI, LISA BIAGINI, REMO BINDI, SARA GOGGI, MONICA MONACHINI, PAOLA ORSOLINI, EUGENIO PICCHI, SERGIO ROSSI, NICOLETTA CALZOLARI, ANTONIO ZAMPOLLI

Abstract - The PAROLE project (Preparatory Action for Linguistic Resources Organization for Language Engineering) has produced a set of harmonized corpora and lexicons for a large number of European languages. Each corpus, made up of 20 million words, was built as a reference corpus for Human Language Technology applications. It is intended to provide full information about a large variety of text types in the language considered, to represent the use of contemporary language, and to become the first nucleus of an electronic text library. The texts have been stored in a common format that follows the standards recommended in the CES (Corpus Encoding Standard), according to criteria of flexibility and multifunctionality. The texts belong to a wide range of media and genres, selected in proportions intended to reflect their prominence within society, and are classified according to medium, genre, topic and time of production.

Keywords - textual resources, corpus design, corpus representation, corpus annotation

1. INTRODUCTION

PAROLE was one of the major projects launched by the EC for the construction of Language Resources (LR) in the field of written language. Over the last fifteen years there has been growing interest within the NLP (Natural Language Processing) community in the development of large reusable language data resources. The lack of large computational lexicons and the non-homogeneity of existing resources have been a hindrance to the progress of NLP applications. The LE-PAROLE project aims to build large, generic, reusable and uniformly structured textual and lexical databases for the European languages.