PSO-Tagger: A New Biologically Inspired Approach to the Part-of-Speech Tagging Problem

Ana Paula Silva 1, Arlindo Silva 1, and Irene Rodrigues 2

1 Escola Superior de Tecnologia do Instituto Politécnico de Castelo Branco
{dorian,arlindo}@ipcb.pt
2 Universidade de Évora
ipr@uevora.pt

Abstract. In this paper we present an approach to the part-of-speech tagging problem based on particle swarm optimization. Part-of-speech tags are a key input feature for several other natural language processing tasks, such as phrase chunking and named entity recognition. A tagger is a system that receives a text, made up of sentences, and returns the same text with each of its words associated with the correct part-of-speech tag. The task is not straightforward: a large percentage of words have more than one possible part-of-speech tag, and the right choice is determined by the tags of the surrounding words, which can also have more than one possible tag. In this work we investigate the possibility of using a particle swarm optimization algorithm, supported by a set of disambiguation rules, to solve the part-of-speech tagging problem. The results we obtained on two different corpora are amongst the best published for those corpora.

Keywords: Part-of-speech Tagging, Disambiguation Rules, Evolutionary Algorithms, Particle Swarm Optimization, Natural Language Processing.

1 Introduction

The words in most languages can assume different roles in a sentence, depending on how they are used. These roles are normally designated by part-of-speech (POS) tags or word classes, such as nouns, verbs, adjectives and adverbs. The process of classifying words into their POS, and labeling them accordingly, is known as POS tagging, or simply tagging.
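The following sketch makes the search problem from the abstract concrete. Each word admits several tags, and the right tag sequence depends on neighbouring tags. The code enumerates every tag sequence that a tiny lexicon allows and keeps the one preferred by a few context rules. The lexicon, the rule weights, and the function names (candidate_taggings, score, best_tagging) are hypothetical illustrations. They are not the disambiguation rules or the PSO search used in this paper. Tags follow Penn Treebank conventions, and exhaustive search is used only because the example sentences are tiny.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to every tag it may take
# (Penn Treebank tags). Not the lexicon used in the paper.
LEXICON = {
    "men": ["NNS"],
    "i": ["PRP"],
    "like": ["VBP", "VB", "IN"],
    "to": ["TO", "IN"],
    "smoked": ["VBD", "VBN", "JJ"],
    "fish": ["NN", "VB"],
}

# Hypothetical context rules: (previous tag, current tag) -> bonus.
# Real disambiguation rules would be learned from a tagged corpus.
RULES = {
    ("NNS", "VBP"): 2,
    ("PRP", "VBP"): 2,
    ("VBP", "TO"): 1,
    ("TO", "VB"): 2,
    ("IN", "NN"): 1,
    ("VBP", "JJ"): 1,
    ("JJ", "NN"): 2,
}


def candidate_taggings(words):
    """Yield every tag sequence permitted by the lexicon (Cartesian product)."""
    return product(*(LEXICON[w] for w in words))


def score(tags):
    """Sum the rule bonuses over each pair of adjacent tags."""
    return sum(RULES.get(pair, 0) for pair in zip(tags, tags[1:]))


def best_tagging(words):
    """Exhaustive search for the highest-scoring sequence (tiny inputs only)."""
    return max(candidate_taggings(words), key=score)


if __name__ == "__main__":
    verb_use = ["men", "like", "to", "fish"]
    noun_use = ["i", "like", "smoked", "fish"]
    print(len(list(candidate_taggings(verb_use))))  # 1*3*2*2 = 12
    print(best_tagging(verb_use))  # ('NNS', 'VBP', 'TO', 'VB')
    print(best_tagging(noun_use))  # ('PRP', 'VBP', 'JJ', 'NN')
```

The same word, fish, receives VB in one sentence and NN in the other purely because of its context. If each of n words admits k tags, there are k^n candidate sequences. Enumeration therefore stops being feasible for realistic sentences, and a heuristic search such as PSO becomes attractive in a space like this.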
Tagging is a very important task in natural language processing (NLP), because it is a necessary step in many more complex processes, such as phrase chunking, named entity recognition, parsing, machine translation, information retrieval and speech recognition. In fact, it is the second step in the typical NLP pipeline, following tokenization. The role of a word in a sentence is determined by its surrounding words (its context). For instance, the word fish can function as a verb, as in "Men like to fish.", or as a noun, as in "I like smoked fish.", depending on how it is used in a sentence. This means that in order to assign to each word of a
M. Tomassini et al. (Eds.): ICANNGA 2013, LNCS 7824, pp. 90–99, 2013. © Springer-Verlag Berlin Heidelberg 2013