Mediatability: Estimating User Effort to Mediate Between Two XML Schemas

Karthik Gomadam¹, Ajith Ranabahu¹, Kunal Verma², Lakshmish Ramaswamy³ and Amit P. Sheth¹
kgomadam@gmail.com, {ranabahu.2, amit.sheth}@wright.edu, k.verma@accenture.com, laks@cs.uga.edu
¹ Knoesis Center, Dayton, OH, USA. ² Accenture Technology Labs, CA, USA. ³ University of Georgia, GA, USA.

Abstract

Mediating and integrating data is an important challenge because of the ever-increasing number of services on the Web and the heterogeneities in their data representations. Towards addressing this challenge, we introduce a new measure called mediatability. Mediatability is a quantifiable and computable metric that measures the degree and complexity of human involvement in XML schema mediation. We also present an efficient algorithm to compute mediatability. We provide an experimental study to analyze the impact of semantic annotations on the ease of mediation between two schemas. We validate our approach by comparing the mediatability scores generated by our system against user-perceived difficulty in mediation. We also evaluate the scalability of our system in both client and server contexts.

1 Introduction

The increased adoption of the REpresentational State Transfer (REST) paradigm [5] has made it easier to create and share services on the Web. RESTful services often take the form of RSS/Atom feeds and AJAX-based lightweight services. The XML-based messaging paradigm of RESTful services has made it possible to bring discrete data from services together and create more meaningful data sets. This is referred to as building a mashup. A mashup is the creation of a new Web application using two or more existing Web application interfaces.
Some of the problems that are viewed as impediments for developers creating mashups are: 1) the programming skill required to develop such applications (largely due to the complexity of languages such as JavaScript) and 2) the arduous task of mapping the output of one service to the input of another. Frameworks such as Google Mashup Editor¹ and IBM Sharable Code² have addressed the first problem with reasonable success by creating programming-level abstractions. However, little work has been done towards helping developers in the task of data mediation.

The importance of understanding and addressing the problem of data mediation in distributed systems is underscored by the volume of research in matching and mapping heterogeneous data. Matching is the task of finding correspondences between elements in schemas or instances. Once the corresponding elements are identified, mapping defines the rules to transform elements from one schema into another. Matching and mapping have been well studied by various researchers, including [7], [16] and [8], in different contexts. Considerable research effort has gone into creating frameworks that attempt automated and semi-automated matching and mapping of heterogeneous data. Many of these efforts, however, have yielded limited success, and developers are often left with the hard task of performing the mediation manually.

The end goal of traditional schema matching has been to establish semantic similarity between schema elements. However, semantic equivalence does not guarantee interoperation. Depending on the heterogeneities between the schemas, mediation can be hard or even impossible to automate [16]. Even when mediation is manual, it is hard to estimate how complex it will be for a developer to mediate between the two schemas. The goal of this paper is to go a step beyond matching and define mediatability as a measure of the degree and complexity of human involvement.
We believe that such a measure would help users in selecting services, especially in the REST services scenario, where one often has to choose from a plethora of services that offer the same features with very little to distinguish them. Our experience with IBM Sharable Code [9] largely

¹ http://editor.googlemashups.com/editor
² http://services.alphaworks.ibm.com/isccore