Enriching Thesauri with Hierarchical Relationships by Pattern Matching in Dictionaries ⋆

Lourdes Araujo and José R. Pérez-Agüera
lurdes@sip.ucm.es, jose.aguera@fdi.ucm.es
Departamento de Sistemas Informáticos y Programación. Universidad Complutense de Madrid. Madrid 28040. Spain.

Abstract. This paper proposes a pattern matching method applied to dictionaries to identify hierarchical relationships between terms. We focus on this type of relationship because we use it in the automatic generation of thesauri, which are used to improve information retrieval tasks. However, the method can also be applied to identify other semantic relationships. We distinguish two kinds of patterns: structural patterns, composed of a sequence of part-of-speech tags, and key patterns, typical of dictionary entries, composed of some key terms along with some part-of-speech tags. These patterns are automatically extracted from the dictionary entries by means of stochastic techniques. The thesaurus, which has been partially constructed beforehand, is then extended with the new relationships obtained by applying the patterns to a dictionary. We have based the system evaluation on the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). The results of these experiments reveal a clear improvement in performance.

Keywords: automatic thesaurus extraction, information retrieval, query expansion, pattern matching, dictionary

1 Introduction

Information retrieval (IR) techniques aim at providing fast and effective access to large amounts of information. Over the last decades, IR has extended its application area from textual documents in static collections to the Internet and the Web.
Nowadays, IR methods include document indexing, document classification and categorization, etc., most of which try to improve the response to a search query on the Internet, probably the task most commonly performed everywhere and at all times. The performance of an IR system is usually proportional to the size of the query [14]. Long queries typically provide enough information for the system

⋆ Supported by the Ingeniería del Software e Inteligencia Artificial group, ref. 910494, and project TIC2003-09481-C04