Muse: A System for Understanding and Designing Mappings

Bogdan Alexe, Laura Chiticariu
UC Santa Cruz
abogdan@cs.ucsc.edu, laura@cs.ucsc.edu

Renée J. Miller
U. of Toronto
miller@cs.toronto.edu

Daniel Pepper, Wang-Chiew Tan
UC Santa Cruz
dpepper@ucsc.edu, wctan@cs.ucsc.edu

ABSTRACT

Schema mappings are logical assertions that specify the relationships between a source and a target schema in a declarative way. The specification of such mappings is a fundamental problem in information integration. Mappings can be generated by existing mapping systems (semi-)automatically from a visual specification between two schemas. In general, the well-known 80-20 rule applies for mapping generation tools. They can automate 80% of the work, covering common cases and creating a mapping that is close to correct. However, ensuring complete correctness can still require intricate manual work to perfect portions of the mapping. Previous research on mapping understanding and refinement and anecdotal evidence from mapping designers suggest that the mapping design process can be perfected by using data examples to explain the mapping and alternative mappings. We demonstrate Muse, a data example driven mapping design tool currently implemented on top of the Clio schema mapping system. Muse leverages data examples that are familiar to a designer to illustrate nuances of how a small change to a mapping specification changes its semantics. We demonstrate how Muse can differentiate between alternative mapping specifications and infer the desired mapping semantics based on the designer’s actions on a short sequence of simple data examples.

Categories and Subject Descriptors: H.2.1 [Logical Design]: Schema and subschema, D.2.2 [Design Tools and Techniques], H.2.5 [Heterogeneous Databases]

General Terms: Design, Algorithms, Languages

Keywords: schema mappings, data exchange, data translation, data examples, design, refinement

Copyright is held by the author/owner(s). SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada. ACM 978-1-60558-102-6/08/06.

1. INTRODUCTION

Schema mappings, or mappings in short, are logical assertions that specify the relationships between a source and a target schema in a declarative way. The specification of such mappings is a fundamental problem in information integration. Existing mapping systems such as Clio [6], HePToX [2], and IBM’s Rational Data Architect [5], can (semi-)automatically generate mappings from a visual specification between two schemas. In general, the well-known 80-20 rule applies for mapping generation tools. They can automate 80% of the work, covering common cases and creating a mapping that is close to correct. However, ensuring complete correctness can still require intricate manual work to perfect portions of the mapping.

As described in [1], the mapping design process can be perfected by using data examples to explain the mapping and alternative mappings. Mapping designers usually understand their data better than they understand mapping specifications. Hence, familiar data examples could be leveraged to illustrate nuances of how a small change to a mapping specification changes its semantics.

Motivated by the observations above, we have built Muse, a data example driven mapping design tool currently implemented on top of Clio [6]. Muse can differentiate between alternative mapping specifications and infer the desired mapping semantics based on the designer’s actions on a short sequence of simple data examples.

Summary of demonstration features. We demonstrate Muse-G, the component of Muse which helps a designer derive the desired grouping semantics for a mapping specification using examples. For instance, through examples, Muse-G can infer whether the designer intends to group projects by a company’s name and location or only by a company’s name.
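To make the difference between the two grouping choices concrete, the following is a minimal Python sketch. It is not how Muse-G or Clio is implemented; the source tuples, the attribute names, and the helper `group_projects` are all hypothetical and serve only to show how one small data example separates the two semantics.

```python
from collections import defaultdict

# Hypothetical source tuples: (company name, location, project).
source = [
    ("IBM", "NY", "DB2"),
    ("IBM", "SF", "Clio"),
    ("SBC", "NY", "WiFi"),
]

# Position of each candidate grouping attribute in a source tuple.
ATTR_INDEX = {"name": 0, "location": 1}

def group_projects(rows, key_attrs):
    """Nest projects into sets identified by the chosen grouping attributes."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[ATTR_INDEX[a]] for a in key_attrs)
        groups[key].add(row[2])
    return dict(groups)

# Grouping by name only puts both IBM projects in one set.
by_name = group_projects(source, ["name"])

# Grouping by name and location keeps the two IBM projects apart.
by_name_and_location = group_projects(source, ["name", "location"])

print(by_name)
print(by_name_and_location)
```

On this instance the two grouping functions yield visibly different target data, so a designer who picks the output that looks right has also picked the grouping function.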
Grouping or combining related data together is an essential functionality of many integration systems. However, tools such as [2, 5, 6] define a default grouping function for every target nested set in a mapping, which can only be manually modified. This can prove to be a difficult task if the schemas or the number of possible arguments for a grouping function are large. Indeed, if there are n possible attributes to group by, then there are $2^n$ choices of grouping functions. Furthermore, it may not be obvious to a designer what the n possible grouping attributes are [4, 6]. To illustrate this point, our demo will take users through example mapping scenarios, and give them an opportunity to deduce the possible grouping attributes themselves. We will demonstrate that this is not always an easy cognitive task. We demonstrate how Muse-G infers the desired grouping semantics through the actions taken by the designer on a short sequence of simple data examples. We also demonstrate how Muse-G exploits source schema constraints (keys and functional dependencies in general), when available, to reduce the number of examples presented to a designer. The interactive nature of the demonstration will allow us to show how the examples illustrate design alternatives in a much more natural way than having the designer deduce the proper semantics and edit the mapping specification directly. In addition, we demonstrate how Muse-G supports the incremental design of a grouping function without restarting the process from scratch.

We also demonstrate Muse-D, the component of Muse which helps a designer choose among alternative interpretations of an ambiguous mapping. Intuitively, a mapping is ambiguous if it specifies, in more than one way, how an atomic target schema element is to be obtained. For example, a schema mapping could be ambiguous because it asserts that a project supervisor is a project manager or a project tech-lead at the same time.
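The supervisor example can be sketched in Python as follows. This is an illustration of the notion of ambiguity only, not Muse-D's algorithm; the dictionary encoding of the mapping, the element names, and the helpers `interpretations` and `apply_interpretation` are assumptions made for this sketch.

```python
from itertools import product

# Hypothetical encoding of an ambiguous mapping: each atomic target element
# lists every source element the mapping says it could be obtained from.
ambiguous_mapping = {
    "project.name": ["proj.pname"],
    "project.supervisor": ["proj.manager", "proj.tech_lead"],
}

def interpretations(mapping):
    """Yield every unambiguous mapping: one source choice per target element."""
    targets = list(mapping)
    for choice in product(*(mapping[t] for t in targets)):
        yield dict(zip(targets, choice))

def apply_interpretation(interpretation, source_row):
    """Build the target record that one interpretation produces."""
    return {t: source_row[s] for t, s in interpretation.items()}

# A made-up source record on which the two interpretations disagree.
source_row = {"proj.pname": "DB2", "proj.manager": "Anna", "proj.tech_lead": "Jon"}

alternatives = list(interpretations(ambiguous_mapping))
target_records = [apply_interpretation(i, source_row) for i in alternatives]

for record in target_records:
    print(record)
```

Because the manager and the tech-lead differ in the chosen source record, each interpretation produces a distinct target record, and selecting one record amounts to selecting one interpretation.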
We demonstrate how Muse-D can help a mapping designer understand and differentiate