A. Ranta, B. Nordström (Eds.): GoTAL 2008, LNAI 5221, pp. 65–76, 2008.
© Springer-Verlag Berlin Heidelberg 2008
A Compact Arabic Lexical Semantics Language Resource
Based on the Theory of Semantic Fields
Mohamed Attia, Mohsen Rashwan, Ahmed Ragheb, Mohamed Al-Badrashiny,
Husein Al-Basoumy, and Sherif Abdou
The Engineering Company for the Development of Computer Systems; RDI,
171st Al-Haram Av., 12111, Giza, Egypt
{m_Atteya,Mohsen_Rashwan,Ragheb,Mohammed.Badrashiny,
Basoumy,sAbdou}@RDI-eg.com
Abstract. Applications of statistical Arabic NLP in general, and text mining in
particular, along with their underlying tools, perform much better when the statistical
processing operates on deeper language factorizations rather than on raw text. Lexical
semantic factorization is very important in this regard due to its feasibility, high
level of abstraction, and the language independence of its output.
At the core of such a factorization lies an Arabic lexical semantic DB. While
building this LR, we had to go beyond the conventional approach of collecting
words exclusively from dictionaries and thesauri, which alone cannot produce
satisfactory coverage of this highly inflectional and derivational language.
This paper is hence devoted to the design and implementation of an Arabic
lexical semantics LR that enables the retrieval of the possible senses of any
given Arabic word with high coverage.
Instead of tying full Arabic words to their possible senses, our LR flexibly
relates Arabic lexical compounds, constrained by morphology and PoS tags, to
a predefined limited set of semantic fields across which the standard semantic
relations are defined. With the aid of the same large-scale Arabic morphological
analyzer and PoS tagger at runtime, the possible senses of virtually any
given Arabic word are retrievable.
Keywords: Arabic, AWN, coverage, language factorization, language resource,
lexical compounds, lexical semantics, LR, morphology, morpho-PoS constraining,
PoS tagging, semantic fields, semantic mapping, semantic relations, text
mining, word net, word senses.
1 Introduction
This paper presents an Arabic lexical semantics LR that is composed of the following
four logical components:
1- A compact basis set of predefined semantic fields, i.e., word senses.
2- A lexical semantics relational database (RDB) in which the Arabic lexical compounds
from a given lexicon are mapped one-to-many to semantic fields in both the
forward and backward directions.