What is a File?

Main contacts: {r.harper, etheres, sianl}@microsoft.com
+ Carnegie Mellon University
* University of Texas at Austin
Work done while interning with Microsoft Research Ltd.

ABSTRACT
For over 40 years the notion of the file, as devised by pioneers in the field of computing, has proved robust and has remained unchallenged. Yet this concept is not a given, but serves as a boundary object between users and engineers. In the current landscape, this boundary is showing signs of slippage, and we propose that the boundary object be reconstituted. New abstractions of file are needed, which reflect what users seek to do with their digital data, and which allow engineers to solve the networking, storage and data management problems that ensue when files move from the PC onto the networked world of today. We suggest that one aspect of this adaptation is to encompass metadata within a file abstraction; another has to do with what such a shift would mean for enduring user actions such as ‘copy’ and ‘delete’ as they apply to the resulting file types. We finish by arguing that there is an especial need to support a notion of ‘ownership’ that adequately serves both users and engineers as they engage with the world of networked sociality.

Author Keywords
File, file systems, databases, cloud computing, grammar of action, metadata, generic object, ownership, possession, command, social networking, consumer device.
INTRODUCTION
It would seem perfectly reasonable for the man or woman on the street to assume that the term file, when used in connection with a PC, reflects quite closely what they mean when they use the same term to describe the things they have in their non-digital world. One might be digital and the other real, but they are the same sort of thing: a file. Unfortunately the relationship is not as simple as this. Indeed, to think that it might be, or that there is even any straightforward analogue between the use of this word in everyday life and its use in computing, is an egregious error. It turns out that the word covers many things: a number of things in ordinary life and a rather different set of things when used in reference to computers. It covers so many things, in fact, that the way it gets used to cross-refer between these domains (and indeed sometimes within them) often leads to solecisms, to absurd meanings and to mistaken understandings.

Our purpose in this paper is not to list all such misunderstandings, nor to offer correctives to each, even if that were possible. Our goal is to suggest that something more profound is at issue, and that once that is understood a way forward is possible. It seems to us that part of the problem has to do with the fact that both sides in this equation, the everyday ordinary user and the computer scientist, misunderstand each other by dint of sharing this much-used word, file. And, perhaps more surprisingly, the word opens up further misunderstandings for systems designers, those who are engineering systems to which this concept is central. The consequences of the misunderstandings that result are, we think, greater today than they have ever been.
New forms of computer devices and networked systems are emerging, and these are offering ever more options for ‘where’ and ‘how to’ file; what is being filed, and is being labelled as something to file, is altering too (with documents, music, images and sometimes even postings being treated as file types, for example). User practices are also altering. Key to this has been a shift from a single user negotiating with a file abstraction representative of data stored on a hard disk, to a situation where users are negotiating with other people over a file abstraction where the data itself may be stored in many places: on the local disk, on a cloud server, or on a social network site. What is of interest to the user in the latter case may not be so much the file itself as the social life the file lets them, the user, have. This is to put it simply. But if this holds true, then a new set of definitions for the term file is required: ones that will allow computer scientists to engineer what users require and provide a meaningful base for those users to act upon. This definition will consist of an abstraction that mediates between these two communities, the user and the engineer, allowing the user to treat content in a particular way, as a file, and allowing the engineer to structure the underlying protocols and data management issues as efficiently as possible. This new understanding will, it hardly needs saying, go hand in hand with developments and refinements

Richard Harper, Eno Thereska, Siân Lindley and Richard Banks, Phil Gosset, William Odom+, Gavin Smyth, Eryn Whitworth*
Microsoft Research Ltd.
Technical Report MSR-TR-2011-109