ORIGINAL PAPER
ECO and Onto.PT: a flexible approach for creating a
Portuguese wordnet automatically
Hugo Gonçalo Oliveira · Paulo Gomes
Published online: 4 September 2013
© Springer Science+Business Media Dordrecht 2013
Abstract A wordnet is an important tool for developing natural language processing
applications for a language. However, most wordnets are handcrafted by
experts, which limits their growth. In this article, we propose ECO, an automatic approach
to creating wordnets by exploiting textual resources. After extracting
semantic relation instances, identified by discriminating textual patterns, ECO
discovers synonymy clusters, used as synsets, and attaches the remaining relations
to suitable synsets. Besides introducing each step of ECO, we report on how it was
implemented to create Onto.PT, a public lexical ontology for Portuguese. Onto.PT
is the result of the automatic exploitation of Portuguese dictionaries and thesauri,
and it aims to minimise the main limitations of existing Portuguese lexical
knowledge bases.
Keywords Information extraction · Lexical ontology · Wordnet ·
Clustering · Semantic relations
1 Introduction
A substantial amount of data produced every day is available in natural language
text. Understanding its meaning involves more than recognising words and their
interactions, and typically requires access to external sources of knowledge. This
fact led to the creation of broad-coverage knowledge bases, which can be exploited
H. Gonçalo Oliveira (✉) · P. Gomes
CISUC, Departamento de Engenharia Informática, Faculdade de Ciências e Tecnologia,
Universidade de Coimbra, Pólo II, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
e-mail: hroliv@dei.uc.pt
P. Gomes
e-mail: pgomes@dei.uc.pt
Lang Resources & Evaluation (2014) 48:373-393
DOI 10.1007/s10579-013-9249-9