Supporting Collaboration in the Era of Internet-Scale Data

Cameron Marlow
Facebook, Inc.

1 Introduction

Reproducibility is an often-cited and valid concern about the research being performed by many corporate research programs, such as the one I work with at Facebook. We consistently run experiments on millions of active users using proprietary systems, gather results on data infrastructure at massive scale, and produce reports that distill this process into a few lines of context. It is no wonder that many papers from internet research labs are returned with comments on how the results are interesting but entirely irreproducible.

This effect is just one symptom of the growing gap in the instruments available to researchers studying social computing, human-computer interaction, recommender systems, and auction theory, among others. On one side of this divide are academics, depending on shared data sets and infrastructure to enable the collective advancement of science, cooperating with Institutional Review Boards, and beholden to funding agencies. On the other side are industrial researchers, utilizing proprietary data and infrastructure to drive science forward, maintaining privacy and Terms of Service (TOS), and beholden to the goals of the corporation. It is hard to say how wide this gap is, but it is clear that the computational power of the likes of Google and Facebook continues to grow.

When asked to produce a position paper on challenges in studying technology-mediated participation, I thought naturally to address the questions I most often get from academic researchers: Can I have some data? Can I crawl the users on my university network? Perhaps run a query on your databases? At the same time, papers are regularly published that violate Facebook’s TOS, expose users’ privacy, and proceed without any regulation by ethics review boards.
In this paper I hope to describe what Facebook has to offer, expose some of the challenges we currently face in engaging with academia, and propose some possible solutions which allow for direct collaboration while upholding all legal and ethical guidelines.

2 Anatomy of an Internet-Scale Social Research Tool

Before addressing the outstanding challenges to sharing research agendas with academia, I will first introduce some of the basic components of a modern internet-scale research