Abstract
In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation, operating on the membership degrees derived from the training data and the degree values of a test web page. Six measures are used and compared in this study: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy, and Bounded Difference. These measures are applied to and compared on 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. An analysis of these measures and concluding remarks are presented.

Keywords: Text classification, HTML documents, Web pages, Machine learning, Fuzzy logic, Arabic Web pages.

I. INTRODUCTION
With the rapid growth of the Internet, there is an increasing need to provide automated assistance to Web users for Web page classification. Such assistance is helpful in organizing the vast amount of information returned by search engines, or in constructing catalogues that organize Web documents into hierarchical collections [1]. Classification is expected to play an important role in future search services. For example, Chen et al. [2] showed that users prefer navigating through catalogues of pre-classified content. Meeting this need requires automated Web page classification techniques. Web page classification is harder than free-text classification because of the noisy information found in web pages, such as advertisements represented as images, media sounds, navigation bars, and page formatting. This data must therefore be summarized and filtered so that it becomes useful to end users who manage and plan their work based on a more accurate classification process. It is essential to focus on the main subjects and the significant content.
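The abstract describes the method only in outline. The Python sketch below shows one way the pieces could fit together, under loudly stated assumptions: the five operators use their standard textbook t-norm/t-conorm definitions ("Special case fuzzy" is left out because this section does not define it); the membership of a term in a category is taken as the share of the term's total training weight that falls in that category's pages; test-page frequencies are normalized by their maximum to obtain degree values; and similarity is the sum of t-norms divided by the sum of t-conorms over the terms, a formulation common in fuzzy similarity text classification but not confirmed by this section. `OPERATORS`, `category_memberships`, `normalize`, `fuzzy_similarity`, `classify` and the toy data are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Norm = Callable[[float, float], float]


def hamacher_t(a: float, b: float) -> float:
    # Hamacher product; the 0/0 case at a = b = 0 is defined as 0.
    return 0.0 if a == b == 0.0 else a * b / (a + b - a * b)


def hamacher_s(a: float, b: float) -> float:
    # Hamacher sum; the 0/0 case at a = b = 1 is defined as 1.
    return 1.0 if a == b == 1.0 else (a + b - 2 * a * b) / (1 - a * b)


# (t-norm, t-conorm) pairs in their standard textbook forms (an assumption,
# not the paper's formulas). "Special case fuzzy" is omitted: undefined here.
OPERATORS: Dict[str, Tuple[Norm, Norm]] = {
    "MinMax": (min, max),
    "Algebraic": (lambda a, b: a * b, lambda a, b: a + b - a * b),
    "Einstein": (lambda a, b: a * b / (2 - (a + b - a * b)),
                 lambda a, b: (a + b) / (1 + a * b)),
    "Hamacher": (hamacher_t, hamacher_s),
    "Bounded Difference": (lambda a, b: max(0.0, a + b - 1),
                           lambda a, b: min(1.0, a + b)),
}


def category_memberships(docs: List[Dict[str, float]],
                         labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Assumed fuzzy term-category relation: the membership of term t in
    category c is t's weight in c's training pages over its total weight."""
    total: Dict[str, float] = {}
    per_cat: Dict[str, Dict[str, float]] = {}
    for doc, label in zip(docs, labels):
        cat = per_cat.setdefault(label, {})
        for term, w in doc.items():
            total[term] = total.get(term, 0.0) + w
            cat[term] = cat.get(term, 0.0) + w
    return {c: {t: w / total[t] for t, w in terms.items() if total[t] > 0}
            for c, terms in per_cat.items()}


def normalize(freqs: Dict[str, float]) -> Dict[str, float]:
    """Map raw term frequencies of a test page to degrees in [0, 1]."""
    top = max(freqs.values(), default=0.0)
    return {t: f / top for t, f in freqs.items()} if top > 0 else {}


def fuzzy_similarity(page: Dict[str, float], proto: Dict[str, float],
                     op: str) -> float:
    """Sum of t-norms over sum of t-conorms across the union of terms."""
    t_norm, t_conorm = OPERATORS[op]
    terms = set(page) | set(proto)
    num = sum(t_norm(page.get(t, 0.0), proto.get(t, 0.0)) for t in terms)
    den = sum(t_conorm(page.get(t, 0.0), proto.get(t, 0.0)) for t in terms)
    return num / den if den > 0 else 0.0


def classify(page: Dict[str, float],
             prototypes: Dict[str, Dict[str, float]], op: str) -> str:
    """Assign the page to the category whose prototype is most similar."""
    return max(prototypes, key=lambda c: fuzzy_similarity(page, prototypes[c], op))


# Toy data (hypothetical): term weights of four training pages.
docs = [{"match": 3, "goal": 2, "result": 1}, {"goal": 1, "team": 2},
        {"market": 3, "stock": 2, "result": 1}, {"stock": 1, "bank": 2}]
labels = ["sports", "sports", "economy", "economy"]
protos = category_memberships(docs, labels)
page = normalize({"goal": 2, "result": 1})
for name in OPERATORS:
    print(f"{name}: {classify(page, protos, name)}")
```

Because only the operator pair changes between runs, the same category prototypes and test page can be scored under every measure, which is the kind of side-by-side comparison the study performs. The toy data says nothing about which operator works best on real Arabic pages.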
As a result, the critical task is to deal with ambiguous web pages and their embedded structure by studying their HTML markup to clean up the input, and then to apply a classification method such as machine learning or fuzzy set theory [3]. The language itself may also affect the whole process, because handling words and phrases is complex; this is especially true of Arabic, a language with a relatively small presence on the Web compared with other languages. The following factors give a clearer picture of this:
1. A word may behave differently depending on the context in which it occurs, so it may belong equally or nearly equally to different classes, which creates ambiguity. For example, رسول may mean Mohammed (God's praise and peace be upon him), messenger, emissary, plover, etc.
2. Some words may have more than one root in the native language; for example, سياج has two roots (ساج and سيج).
3. It is difficult to verify from the word structure alone whether a word actually starts with a present-tense prefix, since words such as تقوى and يمين begin with letters that look like one.
4. There is no indication of whether a word is a verb or a noun. For example, يسير may be interpreted as the present tense of the verb سار, or as the noun meaning السهولة: simple, facile, or uncomplicated.
5. Some idioms occur frequently but have no direct relevance to any of the categories, such as بغض النظر, بالإضافة إلى, لحسن الحظ, لسوء الحظ, etc.

Ahmad T. Al-Taani is the chairman of the Department of Computer Sciences, Yarmouk University, Irbid, Jordan (corresponding author; e-mail: ahmadta@yu.edu.jo). Noor Aldeen K. Al-Awad graduated from the Department of Computer Sciences, Yarmouk University, Irbid, Jordan, in 2005 (e-mail: noor_kamel@yahoo.com).

Recently, much work has been done on Web page classification [1] [4-16].
In these approaches, different methods are proposed, including Web summarization-based classification, fuzzy similarity, natural language parsing Web page classification and clustering to find reliable list answers, text classification using supervised neural networks, machine learning methods, kNN model-based classifiers, and fuzzy classifiers.

An Empirical Analysis of Arabic WebPages Classification using Fuzzy Operators
Ahmad T. Al-Taani and Noor Aldeen K. Al-Awad
International Journal of Computational Intelligence 5:1 2009

In this study, an analysis and comparison of six fuzzy similarity approaches applied to Arabic web page classification is presented. A cluster is built for each category from the training documents, and the similarity between a test document and a category is measured using a fuzzy relation. This relation is called the fuzzy term-category relation: the set of membership degrees of words in a particular category represents the cluster prototype of the learned model. Based