Haystack: A Customizable General-Purpose Information Management Tool for End Users of Semistructured Data

David R. Karger*, Karun Bakshi, David Huynh, Dennis Quan, Vineet Sinha

MIT Computer Science and AI Lab, 32 Vassar St., Cambridge, MA 02139 USA
{karger,kbakshi,dfhuynh,vineet}@mit.edu, dennisq@us.ibm.com

* Research supported by the Packard Foundation, the MIT-NTT Alliance, MIT Project Oxygen, and the HP-MIT Alliance.

IBM T.J. Watson Research Center, 1 Rogers St., Cambridge, MA 02139

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 2003 CIDR Conference

Abstract

We posit that a semistructured data model offers the right balance of rich structure and flexible (or absent) schema, allowing naive end users to record information in whatever form makes it easy for them to manage. We describe our Haystack system, which exposes the richness and flexibility of the data model while offering the user natural, traditional interfaces that shield them from the specifics of schemas, tuples, and database queries. We outline research challenges that remain to be addressed.

1 Introduction

The Haystack project is driven by the idea that every individual works with information in his or her own way. All have different needs and preferences regarding which information objects need to be stored, viewed, and retrieved; what relationships or attributes are worth storing and recording to help find information later; how those relationships or attributes should be presented and manipulated when inspecting objects and navigating the information space; and how information should be gathered into coherent workspaces in order to complete a given task.

At present, it is usually left to developers to make such decisions and hard-code them into applications: choosing a particular class of objects that will be managed by the application, deciding what schemata those objects conform to, developing particular displays of those information objects, and gathering them together into a particular workspace. We posit that no developer can predict all the ways a user will want to record, view, annotate, and manipulate information, and that as a result the hard-coded information designs interfere with users’ ability to make the most effective use of their information.

Haystack aims to give end users significant control over all of the facets mentioned above. Haystack stores (by reference) arbitrary objects of interest to the user. It records arbitrary properties of and relationships between the stored information. Its user interface flexes to present, and to support manipulation of, whatever objects and properties are stored, in a meaningful fashion.

To give users flexibility in what they store and retrieve, Haystack coins a uniform resource identifier (URI) to name anything of interest: a digital document, a physical document, a person, a task, a command or menu operation, or an idea. Once named, the object can be annotated, related to other objects, viewed, and retrieved.

To support retrieval, Haystack lets a user record arbitrary (predefined or user-defined) properties to capture any attributes of or relationships between information that the user considers important. The properties serve as useful query arguments, as facets for metadata-based browsing, or as relational links to support the associative browsing typical of the World Wide Web.
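To make the naming and annotation model above concrete, here is a minimal Python sketch. It is not Haystack's implementation, which this section does not show. It only illustrates the three roles the text assigns to properties: query arguments, browsing facets, and links between named objects. The `Store` class, the `coin_uri` helper, the `urn:uuid:` naming scheme, and the property names are all assumptions made for illustration.

```python
import uuid
from collections import Counter


def coin_uri() -> str:
    """Coin a fresh name for an object (the urn:uuid scheme is an assumption)."""
    return f"urn:uuid:{uuid.uuid4()}"


class Store:
    """A toy store of (subject, property, value) statements (hypothetical)."""

    def __init__(self):
        self.statements = set()

    def add(self, subject, prop, value):
        """Record an arbitrary property; no schema is checked."""
        self.statements.add((subject, prop, value))

    def query(self, prop, value):
        """Property as query argument: subjects having this property value."""
        return {s for s, p, v in self.statements if p == prop and v == value}

    def facets(self, prop):
        """Property as browsing facet: number of statements per value."""
        return Counter(v for _, p, v in self.statements if p == prop)

    def links(self, subject):
        """Property as relational link: values that are themselves named objects."""
        named = {s for s, _, _ in self.statements}
        return {(p, v) for s, p, v in self.statements if s == subject and v in named}


store = Store()
paper, alice, task = coin_uri(), coin_uri(), coin_uri()

store.add(paper, "type", "document")
store.add(alice, "type", "person")
store.add(task, "type", "task")
store.add(paper, "author", alice)        # relationship between two named objects
store.add(task, "concerns", paper)
store.add(paper, "importance", "high")   # a user-defined property

documents = store.query("type", "document")
type_facets = store.facets("type")
paper_links = store.links(paper)
```

In this sketch `documents` contains only `paper`, `type_facets` counts one object of each type, and `paper_links` holds the single `author` link to `alice`. Nothing constrains which properties an object may carry, which is the flexibility the semistructured model is meant to provide.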
Haystack’s user interface is designed to flex with the information space: instead of using predefined, hard-coded layouts of information, Haystack interprets “view prescriptions” that describe how different types of information should be presented, for exam-