Using Containment Information for View Evolution in Dynamic Distributed Environments

Anisoara Nica
Department of EECS
University of Michigan, Ann Arbor
Ann Arbor, MI 48109-2122, USA
anica@eecs.umich.edu

Elke A. Rundensteiner
Department of Computer Science
Worcester Polytechnic Institute
Worcester, MA 01609-2280, USA
rundenst@cs.wpi.edu

Abstract

The maintenance of materialized views in large-scale environments composed of numerous information sources (ISs), such as the WWW, is complicated by ISs continuously modifying not only their contents but also their capabilities (schemas and query interfaces). With current view technology, views become undefined when ISs change their capabilities. Our Evolvable View Environment (EVE) project addresses this new problem of evolving views under IS capability changes, which we call the view synchronization problem. Key principles of EVE include a user-specified preference model for view evolution (Evolvable-SQL (E-SQL)) and a Model for Information Source Descriptions (MISD). In this paper, we first present a formal characterization of the correctness of view synchronization using containment constraints defined in MISD. Then, we give a novel view synchronization algorithm for view rewriting that exploits general containment constraints between the to-be-replaced relation and its replacement.

1. Introduction

Advanced applications such as web-based information services, data warehousing, digital libraries, and data mining often specify materialized views on a large number of possibly heterogeneous information sources (ISs) for the purpose of constructing web resources, electronic catalogs, repositories of digital material, or data warehouses for analysis and mining [14]. However, views in such evolving environments introduce new challenges to the database community.
One important issue for these applications is that current view technology only supports static view definitions, meaning that views are assumed to be specified on top of a fixed environment. Once the underlying ISs change their capabilities, the views derived from them become undefined. It is this problem of view evolution caused by external schema environment changes (i.e., at the schema level rather than just at the data level) that we study in this paper. We call this the view synchronization problem.

This work was supported in part by the NSF NYI grant #IRI 94-57609. We would also like to thank our industrial sponsors, in particular, IBM and Informix.

The key contributions of our Evolvable View Environment (EVE) solution [12, 3, 9], developed to target the view synchronization problem, include the design of an extended view definition language (E-SQL) that incorporates user preferences for change semantics of the view (Section 3) and the design of the model for information source description (MISD) for capturing the capabilities of each IS as well as the interrelationships between ISs, such as containment constraints, join constraints, etc. These MISD descriptions can be exploited when searching for an appropriate substitution for the components of views that are affected by an IS capability change. E-SQL and the MISD represent the basis for our strategies for rewriting views based on changing ISs, which we call the view synchronization process.

In this paper, we now present the first view synchronization algorithm that requires only the availability of containment information among information sources in order to evolve views.
Unlike the strategies proposed for query rewriting using views in the database literature ([5, 13]), the techniques proposed in this paper address three new issues: (1) finding view rewritings that are not necessarily equivalent to the original view definition; (2) preserving only indispensable attributes from the SELECT clause if preserving all is not possible; and (3) using semantic containment information for replacing the deleted relation. Points (1) and (2) are possible because our E-SQL view definition language allows view users to explicitly define evolution preferences, such as which attributes are to be kept in the SELECT clause (Section 3). Point (3) is possible because the containment information, expressed as MISD constraints (Section 3), mod-