Improving the performance of a Named Entity Extractor by applying a Stacking Scheme

José A. Troyano, Víctor J. Díaz, Fernando Enríquez and Luisa Romero
Department of Languages and Computer Systems
University of Seville
Av. Reina Mercedes s/n 41012, Sevilla (Spain)
troyano@lsi.us.es

Abstract. In this paper we investigate how to improve the performance of a Named Entity Extraction (NEE) system by applying machine learning techniques and corpus transformation. The main resources used in our experiments are the publicly available tagger TnT and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We split the NEE task into two subtasks: 1) Named Entity Recognition (NER), which involves identifying the group of words that make up the name of an entity, and 2) Named Entity Classification (NEC), which determines the category of a named entity. We have focused our work on improving the NER task, generating four different taggers from the same training corpus and combining them using a stacking scheme. We improve the baseline of the NER task (F_{β=1} value of 81.84) up to a value of 88.37. When a NEC module is added to the NER system, the performance of the whole NEE task is also improved: a value of 70.47 is achieved from a baseline of 66.07.

1 Introduction

Named Entity Extraction involves the identification of the words that make up the name of an entity, and the classification of this name into a set of categories. For example, in the following text, the words “Juan Antonio Samaranch” are the name of a person, the word “COI” is an organization name, “Río de Janeiro” is the name of a place and, finally, “Juegos Olímpicos” is an event name:

El presidente del COI, Juan Antonio Samaranch, se sumó hoy a las alabanzas vertidas por otros dirigentes deportivos en Río de Janeiro sobre la capacidad de esta ciudad para acoger unos Juegos Olímpicos.

To implement a system that extracts named entities from plain text we must address two distinct problems: the recognition of a named entity and its classification. Named Entity Recognition (NER) is the identification of the word sequence that forms the name of an entity, and Named Entity Classification (NEC) is the subtask that decides which category to assign to a previously recognized entity.
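
The split into NER and NEC can be illustrated with a minimal Python sketch. It assumes category-free BIO tags (B opens an entity, I continues it, O is outside any entity) for the recognition step, and it uses a toy lookup table as a stand-in for the classification step. The function names, the table, and the category labels (ORG, PER, MISC) are hypothetical and chosen only for illustration; this is not the TnT-based system described in this paper.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets [start, end)


def recognize(tags: List[str]) -> List[Span]:
    """NER step: turn a sequence of B/I/O tags into entity spans."""
    spans: List[Span] = []
    start = None  # index where the currently open entity begins
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity starts; close any entity that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I" and start is not None:
            # The open entity continues.
            continue
        else:
            # "O" (or an "I" with no open entity) ends the current entity.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical lookup table standing in for a trained NEC module.
TOY_CATEGORIES: Dict[str, str] = {
    "COI": "ORG",
    "Juan Antonio Samaranch": "PER",
}


def classify(name: str) -> str:
    """NEC step: assign a category to an already recognized name."""
    return TOY_CATEGORIES.get(name, "MISC")


def extract(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """NEE = NER followed by NEC."""
    names = [" ".join(tokens[s:e]) for s, e in recognize(tags)]
    return [(name, classify(name)) for name in names]


tokens = ["El", "presidente", "del", "COI", ",", "Juan", "Antonio",
          "Samaranch", ",", "se", "sumó", "hoy"]
tags = ["O", "O", "O", "B", "O", "B", "I", "I", "O", "O", "O", "O"]

print(extract(tokens, tags))
# [('COI', 'ORG'), ('Juan Antonio Samaranch', 'PER')]
```

Keeping the two steps in separate functions mirrors the decomposition above: the recognition step can be improved or replaced without touching the classifier, as long as it still produces spans.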