Journal of American Science 2010; 6(4) http://www.americanscience.org editor@americanscience.org

Similarity Identification and Measurement between Ontologies

Amjad Farooq and Abad Shah
Computer Science and Engineering Department
University of Engineering and Technology, Lahore – Pakistan
amjadfarooquet@gmail.com

Abstract: Retrieving relevant and precise information from the web has always remained a serious problem. To address this problem, the idea of an ontology-based web, the so-called semantic web, was proposed in 2001. However, the problem is not completely solved because of the semantic heterogeneity suffered by ontologies. In this paper we propose a semi-automatic technique to measure explicit semantic heterogeneity. The proposed technique identifies all candidate pairs of similar concepts without omitting any similar pair. The proposed criteria for similarity measurement are based on the theme of the semantic web. The proposed technique can be used in different types of operations on ontologies, such as merging, mapping and aligning. An analysis of its results shows a reasonable improvement in terms of completeness, correctness and overall quality of the results. [Journal of American Science 2010; 6(4):67-85]. (ISSN: 1545-1003).

Keywords: Semantic Web, Heterogeneity, Ontology Matching, Similarity Identification

1. Introduction

The World Wide Web (or the Web) is a global source of information on almost every topic that a person can think of. However, it is difficult to retrieve relevant, specific and precise information because of semantic heterogeneity and the lack of machine understandability of its contents. It has been estimated that only 37 to 52 percent of the retrieved results are relevant and the rest are irrelevant (Lewandowski, 2008). The semantic web, envisioned by Lee (Lee et al., 2001), offers a promising solution to the retrieval performance problem of the web.
According to the theme of the semantic web, web contents need to be structured, formalized, stored and retrieved through ontologies. When multiple ontologies are used simultaneously in integration operations such as merging, mapping and aligning, they may suffer from different types of heterogeneity, such as semantic heterogeneity and non-semantic (syntactic) heterogeneity (Shvaiko & Euzenat, 2008; Hauswirth & Maynard, 2007). Syntactic heterogeneity occurs due to the use of different languages. Semantic heterogeneity includes terminological, conceptual and contextual heterogeneity. Terminological heterogeneity arises when different terms are used to represent the same concept, or when the same term is used to represent different concepts. Conceptual heterogeneity between two concepts may occur because of their different levels of granularity, i.e., when one concept is a sub-concept or super-concept of the other, or when the two overlap. Similarly, two concepts are explicit-semantically heterogeneous if they are terminologically and taxonomically similar but have different roles or functionalities in their respective ontologies.

To handle the problem of ontological semantic heterogeneity, the similarity between ontologies must be identified. Different techniques for this purpose have been proposed and reported in the literature (Shvaiko & Euzenat, 2009; Maedche & Staab, 2002; Hariri et al., 2006; Aleksovski et al., 2006; Trojahn et al., 2008; Jeong et al., 2008; Noy & Musen, 2001; Melnik et al., 2002). However, some issues remain unsolved. Explicit semantic similarity needs to be measured in order to realize the vision of the semantic web (González, 2005; Uschold, 2003; Uschold, 2002). Measuring the degree of similarity (DoS) with the edit-distance formula is unreliable, because it measures the similarity between terms rather than the similarity between the concepts that those terms represent.
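The weakness of a purely edit-distance-based DoS can be illustrated with a minimal sketch. The paper does not give the formula at this point, so the normalization used here (one minus the Levenshtein distance divided by the length of the longer term) is an assumption, and the function names `edit_distance` and `degree_of_similarity` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two terms (dynamic programming, rolling row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def degree_of_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer term."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty terms are treated as identical
    return 1.0 - edit_distance(a, b) / longest


# Illustrative term pairs (not taken from the paper).
for left, right in [("car", "cat"), ("car", "automobile")]:
    print(left, right, round(degree_of_similarity(left, right), 2))
```

Under this assumed normalization, the terms "car" and "cat" score higher than "car" and "automobile", even though the second pair names the same concept and the first does not. The score reflects the spelling of the terms only, which is the limitation described above.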
The criteria reported in (Shvaiko & Euzenat, 2005; Erhard & Philip, 2001; Lambrix & Tan, 2006) for identifying taxonomic similarity between the concepts of two ontologies declare certain pairs of similar concepts to be dissimilar, because these criteria are biased against concepts whose sibling concepts, sub-concepts or direct super-concepts are not similar. Most of the existing similarity measurement techniques compute only the DoS between concepts of two ontologies (Buccella et al., 2005; Giunchiglia et al., 2007), which is inadequate for determining which concept is more generic or more specific than the other; this is considered an open research issue (Janowicz et al., 2008). Similarly, some existing techniques compute only the Semantic Relation (SR) between two