© Springer-Verlag Berlin Heidelberg 2011
Portuguese Controlled Language: Coping with
Ambiguity
Palmira Marrafa, Raquel Amaro, Nuno Freire, Sara Mendes
CLG – Group for the Computation of Lexical and Grammatical Knowledge,
Centro de Linguística da Universidade de Lisboa,
Av. Prof. Gama Pinto, nº 2, 1649-003 Lisbon, Portugal
palmira.marrafa@netcabo.pt, ramaro@clul.ul.pt,
nfreire@gmail.com, sara.mendes@clul.ul.pt
Abstract. This paper focuses on strategies to avoid lexically related ambiguity,
induced by polysemy or by syntactic function effects, in the context of a system
to control Portuguese as a source language for machine translation. This system,
which is being developed as part of wider-scope ongoing research, involves two
main components: a controlled language for Portuguese and a tool to evaluate
the conformity of texts with the controlled language. In a subsidiary way, it also
makes use of the Portuguese WordNet (WordNet.PT).
Keywords: CNL, machine translation, ambiguity
1 Introduction
So-called controlled natural languages (CNLs) involve sets of restrictions on the
lexicon, syntax and/or semantics which reduce or eliminate the ambiguity and
complexity typical of natural language utterances. In this paper we present
strategies to eliminate ambiguity (mainly lexical ambiguity) defined within
ongoing large-coverage work aimed at obtaining better-quality results from
machine translation (MT) systems. Instead of listing lexical units to be avoided or to be
used, our approach makes use of WordNet.PT ([1]). Although it builds on previous
work ([2]), the new system involves a larger scope and a shift of perspective.
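As a rough illustration only, the following sketch shows one way a wordnet-style sense inventory could replace hand-made lists of allowed or forbidden words: a word is flagged when the lexicon records more than one sense for it. The mechanism actually used by the system is not specified at this point in the text, so the sense-count criterion, the function name flag_ambiguous and the toy data in TOY_SENSES are all assumptions made for the example and are not taken from WordNet.PT.

```python
# Hypothetical toy sense inventory standing in for a wordnet-style lexicon.
# The entries are illustrative and are NOT taken from WordNet.PT.
TOY_SENSES = {
    "banco": ["financial institution", "bench"],
    "cliente": ["customer"],
    "conta": ["bank account", "bill", "bead"],
}


def flag_ambiguous(tokens, senses=TOY_SENSES):
    """Return the words with more than one recorded sense, in text order.

    Assumption: ambiguity is approximated by sense count alone; no
    lemmatization or part-of-speech tagging is performed.
    """
    flagged = []
    for token in tokens:
        word = token.lower()
        # Words missing from the lexicon are treated as having no senses.
        if len(senses.get(word, [])) > 1 and word not in flagged:
            flagged.append(word)
    return flagged


sentence = "O cliente abriu uma conta no banco"
print(flag_ambiguous(sentence.split()))
```

In this sketch the decision is derived from the lexical resource itself, so extending coverage means extending the lexicon rather than maintaining separate word lists.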
CNLs are not defined in a univocal way in the literature. Although elaborating on
this matter is beyond the aims of this paper, it is worth clarifying that we use CNL
in the following sense: sets of linguistic restrictions to be applied to written texts in a
given language. The nature of those restrictions depends on the purposes to be
achieved.
As discussed in [3], there are two main approaches to the design of CNLs: “naturalist”
approaches, which view controlled languages as sets of restrictions on the existing
structures and lexicon of a given natural language, stating which structures and lexical
items are not to be used; and “formalist” approaches, which view controlled
languages as sets of vocabulary and rules to form utterances in a given natural language,