Semantic Matching to Achieve Web Service Discovery and Composition

Rama Akkiraju 1, Biplav Srivastava 2, Anca Ivan 1, Richard Goodwin 1, Tanveer Syeda-Mahmood 3

1 IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532, USA
2 IBM India Research Laboratory, Block 1, IIT Campus, Hauz Khaus, New Delhi, 11016, India
3 IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
{akkiraju@us, sbiplav@in, ancaivan@us, rgoodwin@us, stf@almaden}.ibm.com

Abstract

In this paper, we present a novel algorithm to discover and compose Web services in the presence of semantic ambiguity by combining semantic matching and AI planning algorithms. Specifically, we use cues from domain-independent and domain-specific ontologies to compute an overall semantic similarity score between ambiguous terms. This semantic similarity score is used by AI planning algorithms to guide the search process when composing services. In addition, we integrate semantic and ontological matching with an indexing method, which we call attribute hashing, to enable fast lookup of semantically related concepts.

1. Introduction

Web services are becoming an important technological component in implementing service-oriented architectures. With the growing popularity of Web services, their matching and composition have become topics of increasing interest in recent years. Two main directions have emerged. The first investigates the application of AI planning algorithms to compose services. The second explores the application of information retrieval techniques. However, to the best of our knowledge, these two techniques have not been combined to achieve compositional matching in the presence of inexact terms, and thus improve recall. In this paper, we present a novel approach to compose Web services in the presence of semantic ambiguity using a combination of semantic matching and AI planning algorithms.
Specifically, we use domain-independent and domain-specific ontologies to determine the semantic similarity between ambiguous concepts or terms. The domain-independent relationships are derived using an English thesaurus after tokenization and part-of-speech tagging. The domain-specific ontological similarity is derived by inference over the semantic annotations associated with Web service descriptions, using an ontology. Matches due to the two cues are combined to determine an overall similarity score. This semantic similarity score is used by AI planning algorithms in composing services. We show that combining semantic scores with planning algorithms achieves better results than using a planner or matching alone.

In the remainder of the paper, we start with a scenario that illustrates the need for Web services composition in open business domains and discuss how our approach helps resolve semantic ambiguities. We then give details of the SEMAPLAN system and discuss how the engine was customized for the IEEE WS Challenge.

1. A Motivating Scenario

In this section, we present a scenario from the knowledge management domain to illustrate the need for (semi-)automatic composition of Web services. For example, if a user would like to identify the names of authors in a given document, text annotators such as a Tokenizer, which identifies tokens, a LexicalAnalyzer, which identifies parts of speech, and a NamedEntityRecognizer, which identifies references to people, things, and so on, could be composed to meet the request. The following figure summarizes this composition flow.

Figure 1. Example of a composition of Web services

In this example, the term lexemeAttr may not match lemmaProp unless each term is split into its parts (lexeme and Attr, lemma and Prop) and the parts are matched separately. Using a linguistic domain ontology, one can infer that lemma could be considered a match for the term lexeme.
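The term-level matching in this example, together with the abbreviation expansion described next, can be illustrated with a minimal sketch. It is not the SEMAPLAN implementation: the lookup tables are hypothetical toy stand-ins for a domain ontology, a thesaurus, and an abbreviation rule set, the function names are invented for illustration, and combining the two cues with max and averaging over tokens are assumptions, since the combination formula is not given in this section.

```python
import re

# Hypothetical toy resources (assumptions, not the paper's actual data).
ABBREVIATIONS = {"attr": "attribute", "prop": "property"}
ONTOLOGY_MATCHES = {frozenset({"lexeme", "lemma"})}          # domain-specific cue
THESAURUS_SYNONYMS = {frozenset({"attribute", "property"})}  # domain-independent cue


def split_term(term):
    """Split a camelCase identifier into lowercase tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", term)]


def expand(token):
    """Apply the abbreviation expansion rule, if one exists for the token."""
    return ABBREVIATIONS.get(token, token)


def token_score(a, b):
    """Score two tokens using both cues (assumed combination: max)."""
    if a == b:
        return 1.0
    pair = frozenset({a, b})
    ontology = 1.0 if pair in ONTOLOGY_MATCHES else 0.0
    thesaurus = 1.0 if pair in THESAURUS_SYNONYMS else 0.0
    return max(ontology, thesaurus)


def term_score(term_a, term_b):
    """Average, over the tokens of term_a, of the best match in term_b."""
    tokens_a = [expand(t) for t in split_term(term_a)]
    tokens_b = [expand(t) for t in split_term(term_b)]
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(token_score(a, b) for b in tokens_b) for a in tokens_a]
    return sum(best) / max(len(tokens_a), len(tokens_b))


print(split_term("lexemeAttr"))               # ['lexeme', 'attr']
print("lexemeAttr" == "lemmaProp")            # False: exact matching fails
print(term_score("lexemeAttr", "lemmaProp"))  # 1.0 with the toy tables above
```

With these toy tables the two terms match fully once they are split and expanded, whereas a whole-string comparison finds no match. A real system would replace the binary table lookups with graded scores from the ontology and the thesaurus.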
An abbreviation expansion rule can be applied to the terms Attr and Prop to expand them to Attribute and Property. Then a consultation with a domain-independent thesaurus such as WordNet [6]

[Figure 1 labels. Panels: Request; Matched Services (Any Service or Service Combinations). Services: Tokenizer, Lexical Analyzer, Named Entity Recognizer. Terms: Doc, Text, Tokens, LexemeAttr, LemmaProp, CanStr, CanonicalString, CanonicalCategory, NamedEntity. Relations shown: subClassOf, ~=.]

Proceedings of the 8th IEEE International Conference on E-Commerce Technology and the 3rd IEEE International Conference on Enterprise Computing, E-Commerce, and E-Services (CEC/EEE'06) 0-7695-2511-3/06 $20.00 © 2006 IEEE