It’s All the Same to Me: Data Unification in Personal Information Management

David R. Karger*
MIT Computer Science and AI Laboratory

William Jones

February 20, 2006

1 Introduction

Information fragmentation is a pervasive problem that is felt at several stages of personal information management (PIM) [ref]. As the example in the introduction to this special issue on PIM illustrates, even a seemingly simple decision, such as whether to say “yes” to an invitation, often depends upon a number of different kinds of information: information from a calendar, from a paper flyer, from web sites, from a previous email conversation, etc. Such information can be fragmented by physical location: some information, for example, may be on a laptop computer we use at home, while other information may be on a desktop computer we use at work or on one or more PDAs or smart phones. But even on a single device, information is often fragmented by the very tools that have been designed to help us manage our information. Applications often store their data in their own particular locations and representations, inaccessible to other applications. Digital documents and other files are managed by a file manager, email messages by an email client, bookmarks to web sites by a web browser, and so on. New applications such as Microsoft OneNote [reference] introduce still more management tools with little or no integration with previous forms. This leads to numerous problems.

Broken linkages. Much of what is interesting about our information lies in its connections to other pieces of information. For example, we like to know that a given individual is the author of a given document, or that a given email message is relevant to a particular appointment. If these pieces of information are managed by different applications, we may find that none of them is able to properly record or present the linkage to the information they do not manage.
We may need to annotate the relationship manually, in a comment field not suited to recording it. Later, we may need to perform a difficult search in order to access another representation of information we are already looking at, and this presumes in the first place that we remember which applications manage the desired information!

Partitioned organizations. Each application tends to offer organizational tools for its own, and only its own, information. Such organizational fragmentation creates problems in everyday PIM actions such as keeping and finding. As a result, we may sometimes need to look in several places, physical and virtual, in order to gather together the information we need for a particular task. We may also be less certain where and how to keep newly encountered information [reference Marshall & Jones article in the issue]. Or do we “have it” already? If we keep the information again anyway (“just in case”), we may then face serious problems with consistency and updating later on. People can rightly complain that they have “too many hierarchies” [Boardman et al. 2003; Ravasio, 2004], and people sometimes go to great lengths to bring their information together into a single organization, whether based in files, paper or email messages [Jones 2002].

* M.I.T. Laboratory for Computer Science, 545 Technology Square, Cambridge MA 02139. E-mail: karger@theory.csail.mit.edu. URL: http://theory.csil.mit.edu/~karger