Assuring Retrievability from Unstructured Databases by Contexts

Amihai Motro
Department of Computer Science
University of Southern California
Los Angeles, CA 90089

Abstract

In an unstructured database the data is a collection of facts that does not adhere to any schema. Such a database does not require any initial design and can therefore evolve freely to accommodate new applications. It is particularly suitable for information which is diverse and idiosyncratic, such as when we want to store everything known on a particular topic. Unfortunately, this freedom also means that similar information may be entered in different forms. This may cause severe problems when retrieval is attempted, as some of the data may appear to have been "lost" in the database. In this paper we propose a method to solve this problem. Each database fact must be supported by a context in the database, in the form of several other facts. When an attempt is made to add a fact to the database, the existence of a suitable context is verified, or is extracted from the user in a simple dialogue. Thus, the database still retains the flexibility of unstructured databases, but problems of multiple representations are usually prevented.

1. Introduction

Most database management systems employ data models that are structured (or strictly-typed). The network, the hierarchical and the relational data models are all examples of the structured approach. Such models enforce a database design that is both restrictive and permanent. Restrictive, because the design relies heavily on broad categorizations that apply to large classes of instances. Permanent, because in general these models require a priori commitment to a particular design. Consequently, structured models are suitable mostly for traditional database applications in which the environment to be modelled lends itself to simple categorizations and is relatively stable.
For example, a typical data model will record employees and departments with a fixed number of attributes, such as EMPLOYEE-NO, EMPLOYEE-NAME and EMPLOYEE-ADDRESS, DEPARTMENT-NAME, DEPARTMENT-HEAD and DEPARTMENT-OFFICE. The relationship between employees and departments will also have to be determined and defined; for example, WORKS-FOR may associate each employee with at most one department. These few generic attributes, which are applicable to all employees and all departments, are limited in their ability to capture the differences between individual instances of employees or departments. In addition, if this design later proves to be unsatisfactory, modifications may require substantial effort.

While these limitations are not always objectionable, structured models are inadequate in situations where there is a need to model data which is more diverse and idiosyncratic. An example is a database in which one wishes to record all that one knows about a topic. Such databases are quite impossible to design, as the data does not easily fit into uniform structures, and the eventual scope of the database is initially unknown.

An attractive approach for such situations is a database that is unstructured (or loosely-typed). The database is merely a container that can hold diversified information, into which one can toss information casually. Such an architecture requires no commitment to a particular design and can therefore accommodate any evolution in the contents of the database. As there is no structure, it can accommodate data with all its complexities and idiosyncrasies. A flexibility of this sort is available in pile structures, which are aggregates of records that do not adhere to any uniform record type and are not organized in any meaningful way (a detailed discussion of the applicability and performance of piles can be found in [17]).
However, unstructured databases are not necessarily unorganized: to facilitate access they may adopt some internal organization, such as rings or indexes. Other efforts that can be classified as supporting an unstructured approach are mostly based on semantic networks or logic (a good review of the topic can be found in [15]).

CH2261-6/86/0000/0426$01.00 © 1986 IEEE
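The contrast drawn in the introduction can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the model proposed in this paper: the representation of a fact as a (subject, relationship, object) triple, the helper `subjects`, and the sample names (JOHN, MARY, SHIPPING, PLAYS, EMPLOYED-BY) are all hypothetical. The structured record fixes the attributes of every employee in advance, while the pile of facts accepts anything, including the same kind of information entered in two different forms.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


# Structured approach: every employee carries the same fixed attributes,
# and WORKS-FOR associates an employee with at most one department.
@dataclass
class Employee:
    employee_no: int
    employee_name: str
    employee_address: str
    works_for: Optional[str] = None


# Unstructured approach (assumed representation, for illustration only):
# the database is a pile of facts with no schema at all.
Fact = Tuple[str, str, str]  # (subject, relationship, object)

facts: Set[Fact] = {
    ("JOHN", "WORKS-FOR", "SHIPPING"),
    ("JOHN", "PLAYS", "VIOLIN"),          # idiosyncratic fact, no redesign needed
    ("MARY", "EMPLOYED-BY", "SHIPPING"),  # similar information, different form
}


def subjects(pile: Set[Fact], relationship: str, obj: str) -> list:
    """Return, sorted, the subjects that have `relationship` to `obj`."""
    return sorted(s for (s, r, o) in pile if r == relationship and o == obj)


# A retrieval phrased in one form misses the fact entered in the other form.
found = subjects(facts, "WORKS-FOR", "SHIPPING")
print(found)
```

In this sketch the query on WORKS-FOR finds JOHN but not MARY, whose employment was recorded under a different relationship name. This is the sense in which data may appear to have been "lost" in an unstructured database.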