Exploiting Schemas in Data Synchronization

J. Nathan Foster, Michael B. Greenwald, Christian Kirkegaard, Benjamin C. Pierce, and Alan Schmitt

Abstract. Increased reliance on optimistic data replication has led to burgeoning interest in tools and frameworks for synchronizing disconnected updates to replicated data. We have implemented a generic synchronization framework, called Harmony, that can be used to build state-based synchronizers for a wide variety of tree-structured data formats. A novel feature of this framework is that the synchronization process, and in particular the recognition of conflicts, is driven by the schema of the structures being synchronized. We formalize Harmony's synchronization algorithm, state a simple and intuitive specification, and illustrate how it can be used to synchronize trees representing a variety of specific forms of application data, including sets, records, and tuples.

1 Introduction

Optimistic replication strategies are attractive in a growing range of settings where weak consistency guarantees can be accepted in return for higher availability and the ability to update data while disconnected. These uncoordinated updates must later be synchronized (or reconciled) by automatically combining non-conflicting updates while detecting and reporting conflicting updates.

Our long-term aim is to develop a generic framework that can be used to build high-quality synchronizers for a wide variety of application data formats with minimal effort. As a step toward this goal, we have designed and built a prototype synchronization framework called Harmony, focusing on the important special cases of unordered and rigidly ordered data (including sets, relations, tuples, records, feature trees, etc.), with only limited support for list-structured data such as structured documents.
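To make the notion of state-based reconciliation concrete, the following is a minimal sketch in Python. It is not Harmony's algorithm: it handles only flat records rather than trees, it ignores schemas entirely, and it assumes that a last common state (`original`) is available alongside the two replicas. The function name `sync` and the choice to keep the original value on conflict are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical flat record type; Harmony itself works on trees.
Record = Dict[str, str]


def sync(original: Record, a: Record, b: Record) -> Tuple[Record, Set[str]]:
    """Combine non-conflicting updates from replicas a and b.

    Returns the merged record and the set of keys whose updates conflict.
    A missing key is treated as a deletion.
    """
    merged: Record = {}
    conflicts: Set[str] = set()
    for key in sorted(set(original) | set(a) | set(b)):
        o, x, y = original.get(key), a.get(key), b.get(key)
        if x == y:
            value = x  # replicas agree (possibly both deleted the key)
        elif x == o:
            value = y  # only b changed this key: take b's update
        elif y == o:
            value = x  # only a changed this key: take a's update
        else:
            conflicts.add(key)  # both changed it, in different ways
            value = o  # simplification: fall back to the original value
        if value is not None:
            merged[key] = value
    return merged, conflicts


if __name__ == "__main__":
    original = {"name": "Pat", "phone": "111", "city": "Paris"}
    a = {"name": "Pat", "phone": "222", "city": "Paris"}
    b = {"name": "Patricia", "phone": "333"}
    print(sync(original, a, b))
```

In this example, the rename made in `b` and the deletion of `city` in `b` propagate, while the two different edits to `phone` are reported as a conflict instead of being merged.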
An instance of Harmony that synchronizes multiple calendar formats (Palm Datebook, Unix ical, and iCalendar) has been deployed within our group; we are currently developing Harmony instances for bookmark data (handling the formats used by several common browsers, including Mozilla, Safari, and Internet Explorer), address books, application preference files, drawings, and bibliographic databases.

The Harmony system has two main components: (1) a domain-specific programming language for writing lenses (bi-directional transformations on trees), which we use to convert low-level (and possibly heterogeneous) concrete data formats into a high-level synchronization schema, and (2) a generic synchronization algorithm, whose behavior is controlled by the synchronization schema.

The synchronization schema actually guides Harmony's behavior in two ways. First, by choosing an appropriate synchronization schema (and the lenses that transform concrete structures into this form and back), users of Harmony can