Automatic Construction of the Knowledge Base of an Onomasiological Dictionary

Gerardo Sierra *, Laura Hernández
Language Engineering Group, Engineering Institute
Universidad Nacional Autónoma de México, Ciudad Universitaria, México

ABSTRACT

For almost 14 years the Language Engineering Group has worked on a wide variety of Natural Language Processing (NLP) problems, and it was among the earliest groups to create and operate onomasiological dictionaries. During that time we focused on improving the dictionary search engine, but recently our aim has been a methodology for creating specialized onomasiological dictionaries in a semi-automatic way.

Automating the creation of onomasiological dictionaries necessarily implies automating the processes used to populate the dictionary's knowledge base. Due to the nature of these dictionaries, the definitions included in the knowledge base must be both normative and colloquial.

In this paper we present a proposal for semi-automatically populating the knowledge base of these dictionaries.

1 INTRODUCTION

An onomasiological dictionary works in the opposite direction from a “regular” or semasiological dictionary. Users of an onomasiological dictionary already know the definition of a concept, but they do not know or have forgotten the name for it (the latter problem is commonly known as having a word on the tip of the tongue) (Zock et al, 2011).

Onomasiological dictionaries have been classified into visual dictionaries, reverse dictionaries, thesauri and synonym dictionaries. These dictionaries were created to solve the tip-of-the-tongue problem, but people still have difficulty using them because they require the user to know either the precise words that describe the term or its classification (e.g.,
when using a reverse dictionary to find the word potato, you might have to know that a potato is a tuber, and that tubers are a kind of plant). With visual dictionaries there is the additional problem that not every concept has a visual image to represent it. For these reasons it has been suggested that free-text searching, also known as natural language searching, is a viable option for solving this problem (Lancaster, 1972), since it allows users to describe their idea of the concept the way they would explain it to another person.

Creating onomasiological dictionaries that resolve queries written in natural language improves the user experience, but it creates some major challenges that developers must handle (Dutoit et al, 2002; Bilac et al, 2004). The most demanding task may be the one arising from the different ways in which a person can express the same concept, together with the fact that user definitions might not match the formal definitions found in conventional dictionaries.

* GSierraM@iingen.unam.mx

In short, natural language onomasiological dictionaries need a rich knowledge base that includes not only formal but also informal definitions. Knowledge bases can be obtained from ontologies, as in the projects Genoma KB (Cabré et al, 2004) and ONTODIC (Alcina, 2009). However, given the main goal of onomasiological dictionaries, for this work we decided to extract their knowledge bases from definitions written in texts. These definitions, in turn, can be used not only to populate the knowledge base but also to create ontologies (Sierra, 2008).

2 DEBO

DEBO is the first onomasiological dictionary developed in the Language Engineering Group, and it works with user queries given in natural language.
DEBO is a specialized dictionary. It was originally built as a dictionary of natural disasters, but its structure and search engine have since been extended to other areas such as Linguistics, Metrology, Veterinary, and Sexuality.

2.1 The search method

The dictionary works with a search engine developed by Sierra (1999) and later improved by Hernández (2011). This engine comprises:

• A set of terms from an area of specialization, which are the ones that the dictionary can retrieve as possible answers to the user’s queries.
• A knowledge base that includes a variety of both normative and colloquial definitions.
• A set of keywords extracted from the definitions and associated with the terms.
• A stop list that contains a catalog of “empty words”, such as prepositions, articles and conjunctions.
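
The components listed above can be illustrated with a minimal Python sketch. It is not the engine of Sierra (1999) or Hernández (2011). The sample terms, definitions, stop list, the names `extract_keywords`, `build_keyword_index` and `candidate_terms`, and the ranking by number of shared keywords are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stop list: a tiny catalog of "empty words".
STOP_LIST = {"a", "an", "the", "of", "in", "on", "and", "or",
             "that", "is", "by", "to", "when"}

# Hypothetical knowledge base: each term of the specialized area maps to
# several definitions, some normative in style and some colloquial.
KNOWLEDGE_BASE = {
    "earthquake": [
        "sudden shaking of the ground caused by movement of tectonic plates",
        "when the ground shakes and buildings tremble",
    ],
    "flood": [
        "overflow of water that submerges land that is usually dry",
        "when the river rises and water covers the streets",
    ],
}


def extract_keywords(text):
    """Lowercase the text, keep runs of ASCII letters, drop stop-list words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {token for token in tokens if token not in STOP_LIST}


def build_keyword_index(knowledge_base):
    """Associate every keyword found in a definition with its term."""
    index = defaultdict(set)
    for term, definitions in knowledge_base.items():
        for definition in definitions:
            for keyword in extract_keywords(definition):
                index[keyword].add(term)
    return index


def candidate_terms(query, index):
    """Rank terms by how many query keywords they share.

    Assumed scoring, chosen only for illustration.
    """
    scores = defaultdict(int)
    for keyword in extract_keywords(query):
        for term in index.get(keyword, ()):
            scores[term] += 1
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))


KEYWORD_INDEX = build_keyword_index(KNOWLEDGE_BASE)
print(candidate_terms("the ground shakes", KEYWORD_INDEX))
```

In this sketch the colloquial definition is what lets the query "the ground shakes" reach the term earthquake: without stemming, "shakes" does not match the "shaking" of the normative definition, which illustrates why the knowledge base needs both kinds of definitions.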