International Journal on Islamic Applications in Computer Science And Technology, Vol. 3, Issue 2, June 2015, 10-18

An LMF-based Normalization approach of Arabic Islamic dictionaries for Arabic Word Sense Disambiguation: application on hadith

Nadia Soudani 1,5,a, Ibrahim Bounhas 2,5,b, Bilel ElAyeb 3,6,c, Yahya Slimani 4,5,d

1 Faculty of Sciences of Tunis (FST), University of Tunis El Manar, Tunisia
2 Higher Institute of Documentation (ISD), University of Manouba, Tunisia
3 Emirates College of Technology, P.O. Box: 41009, Abu Dhabi, United Arab Emirates
4 Higher Institute of Multimedia Arts of Manouba (ISAMM), University of Manouba, Tunisia
5 LISI Laboratory of computer science for industrial systems, Carthage University, Tunisia
6 RIADI Laboratory, National School of Computer Science (ENSI), University of Manouba, Tunisia

a Nadia.soudani@gmail.com, b Bounhas.Ibrahim@gmail.com, c Bilel.Elayeb@riadi.rnu.tn, d Yahya.Slimani@fst.rnu.tn

ABSTRACT

In this paper, we propose an approach for normalizing Arabic dictionaries. This approach transforms unstructured Arabic dictionaries into normalized ones based on LMF (Lexical Markup Framework). We mainly exploit Arabic Islamic dictionaries of hadith. An ontology will then be constructed from these normalized dictionaries. This ontology will contain explicit and formal knowledge about information in hadith. It will later be used by an information retrieval system for Word Sense Disambiguation of Arabic terms of hadith, either in the user's formulated query or in the texts of hadith.

Keywords: Arabic language, Arabic Islamic Dictionary, hadith, LMF, Ontology, Word Sense Disambiguation, Information Retrieval System

1. Introduction

The corpus of hadith constitutes a rich set of knowledge that is still ineffectively exploited.
In addition, the Arabic language has particular linguistic features at the morphological, syntactic and semantic levels, which cause considerable ambiguity in Arabic terms and particularly in the terms of hadith (Bounhas et al., 2011a). These specificities make knowledge extraction from these collections challenging. Natural Language Processing (NLP) of the Arabic language suffers from a lack of linguistic resources such as corpora, dictionaries, ontologies and standard test collections (Bounhas et al., 2011a; Bounhas, 2012; Jarrar, 2011). Moreover, the existing resources, such as electronic dictionaries, are neither exhaustive nor standardized, so Information Retrieval (IR) tools cannot exploit them effectively. As a result, the performance of these tools in processing Arabic linguistic resources has declined in terms of the relevance of search results. Work on these issues essentially started with the advent of the semantic web (Beseiso et al., 2010). In an IR context, ambiguity is detected both in the text of the query and in the text of hadith itself. We