Repeatability and Workability Evaluation of SIGMOD 2011

Philippe Bonnet (1), Stefan Manegold (2), Matias Bjørling (1), Wei Cao (3), Javier Gonzalez (1), Joel Granados (1), Nancy Hall (4), Stratos Idreos (2), Milena Ivanova (2), Ryan Johnson (5), David Koop (6), Tim Kraska (7), René Müller (8), Dan Olteanu (9), Paolo Papotti (10), Christine Reilly (11), Dimitris Tsirogiannis (12), Cong Yu (13), Juliana Freire (6), and Dennis Shasha (14)

(1) ITU, Denmark; (2) CWI, Netherlands; (3) Renmin University, China; (4) University of Wisconsin, USA; (5) University of Toronto, Canada; (6) University of Utah, USA; (7) UC Berkeley, USA; (8) IBM Almaden, USA; (9) Oxford University, UK; (10) Università Roma Tre, Italy; (11) University of Texas Pan Am, USA; (12) Microsoft, USA; (13) Google, USA; (14) New York University, USA

ABSTRACT

Since 2008, SIGMOD has offered to verify the experiments published in the papers accepted at the conference. This year, we have been in charge of reproducing the experiments provided by the authors (repeatability) and of exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of participation, review process, and results. While participation is stable in terms of the number of submissions, this year we find a sharp contrast between the high participation of Asian authors and the low participation of American authors. We also find that most experiments are distributed as Linux packages accompanied by instructions on how to set up and run the experiments. We are still far from the vision of executable papers.

1. INTRODUCTION

The assessments of the repeatability process conducted in 2008 and 2009 pointed out several problems linked with reviewing experimental work [2, 3]. There are obvious barriers to sharing the data and software needed to repeat experiments (e.g., private data sets, IP/licensing issues, specific hardware). Setting up and running experiments requires a lot of time and work. Last but not least, repeating an experiment does not guarantee its correctness or relevance.

So, why bother? We think that the repeatability process is important because it is good scientific practice. To quote the guidelines for research integrity and good scientific practice adopted by ETH Zurich^1:

    All steps in the treatment of primary data must be documented in a form appropriate to the discipline in question in such a way as to ensure that the results obtained from the primary data can be reproduced completely.

The repeatability process is based on the idea that, in our discipline, the most appropriate way to document the treatment of primary data is to ensure that either (a) the computational processes that lead to the generation of primary data can be reproduced, and/or (b) the computational processes that execute on primary data can be repeated and possibly extended. Obviously, the primary data obtained from a long measurement campaign cannot be reproduced. But our take is that the best way to document the treatment of such primary data is to publish the computational processes that have been used to derive the relevant graphs. On the other hand, the primary data obtained when analyzing the performance of a self-contained software component should be reproducible. Ultimately, a reviewer or a reader should be able to re-execute and possibly modify the computational processes that led to a given graph. This vision of executable papers has been articulated in [1].
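To make this concrete, consider a minimal sketch of what such a published computational process might look like: a script, shipped alongside a paper's primary data, that re-derives a graph from that data and exposes an experiment parameter so that a reviewer can both repeat the analysis and vary it (workability). The file name measurements.csv, the column names, and the figure are hypothetical illustrations, not drawn from any actual SIGMOD submission.

    import argparse
    import csv
    import statistics

    import matplotlib.pyplot as plt

    def load_runs(path):
        """Read primary data: one (threads, throughput) measurement per row."""
        runs = {}
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                runs.setdefault(int(row["threads"]), []).append(float(row["throughput"]))
        return runs

    def main():
        parser = argparse.ArgumentParser(description="Re-derive the scalability graph from primary data")
        parser.add_argument("--data", default="measurements.csv")   # hypothetical primary data file
        parser.add_argument("--max-threads", type=int, default=16)  # workability: a reviewer may change this
        parser.add_argument("--out", default="figure.pdf")
        args = parser.parse_args()

        runs = load_runs(args.data)
        xs = sorted(t for t in runs if t <= args.max_threads)
        ys = [statistics.mean(runs[t]) for t in xs]  # documented treatment: average repeated runs

        plt.plot(xs, ys, marker="o")
        plt.xlabel("threads")
        plt.ylabel("throughput (ops/s)")
        plt.savefig(args.out)

    if __name__ == "__main__":
        main()

Under this scheme, re-running the script with the published data reproduces the graph exactly, and invoking it with, say, --max-threads 8 modifies the computational process without touching the primary data.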
This year, as a first step towards executable papers, we encouraged SIGMOD authors to adhere to the fol-

^1 http://www.vpf.ethz.ch/services/researchethics/Broschure