A Comparative Analysis of Institutional Repository Software

Siddharth Kumar Singh (singh84@purdue.edu), Department of Computer Science, Purdue University
Michael Witt (mwitt@purdue.edu), Purdue University Libraries
Dorothea Salo (dsalo@library.wisc.edu), University of Wisconsin-Madison Libraries

Introduction

This proposal outlines the design of a comparative analysis of the four institutional repository software packages that were represented at the 4th International Conference on Open Repositories, held in 2009 in Atlanta, Georgia: EPrints, DSpace, Fedora, and Zentity [1]. The study includes 23 qualitative and quantitative measures taken from default installations of the four repositories on a benchmark machine with a predefined base collection. The repositories are also being assessed on the execution of four common workflows: consume, submit, accept, and batch. A panel of external reviewers provided feedback on the design of the study and its evaluative criteria, and input is currently being solicited from the developer and user communities of each repository in order to refine the criteria, measures, data collection methods, and analyses. The aim is to produce a holistic evaluation that describes the state of the art in repository software packages in a comparative manner, similar in approach to Consumer Reports [2]. The output of this study will be highly useful for repository developers, repository managers, and especially those who are selecting a repository for the first time. As members of these respective communities and the organizations that support them are increasingly collaborating (e.g., DuraSpace), this study will help identify the relative strengths and weaknesses of each repository to inform the "best-of-breed" in future solutions that may be developed. The study's methods will be presented in a transparent manner, with documentation to support their reproducibility by a third party.
Related Work

Surveys of institutional repository deployment were conducted as early as 2005 by Joan Lippincott and Cliff Lynch [3] in the United States and by Gerard van Westrienen [4] in 12 other countries, and were followed up in 2006 by Charles W. Bailey, Jr., for the Association of Research Libraries [5]. These sought to characterize the state of institutional repository deployment and operation at the time. With support from the Joint Information Systems Committee (JISC) and other agencies, the United Kingdom has taken a leadership role in fostering the growth and development of institutional repositories. Resources such as the Repository Support Project [6] and the Institutional Repository Infrastructure wiki [7] provide a supporting context for this study and helped to formulate it. The scalability and performance of repositories have been explored using a community-based approach for Fedora [8] and in controlled experiments for DSpace by Misra et al. [9] and Lewis [10]. Lastly, an analysis performed by the Sheridan Libraries at Johns Hopkins University to connect user requirements to repository functionality [11] informed our selection of the four basic workflows (consume, submit, accept, and batch) to analyze and supported our decision to maintain a high-level, holistic focus in this study.

Benchmark

The study was performed on a Dell Optiplex 755 personal computer with an Intel Core 2 Duo 2.66 GHz processor, 4 GB of memory, a 145 GB SATA primary hard drive, and an on-board single gigabit network adapter. The hardware specification came from the investigators' extrapolation of what machine is likely to be considered "current and typical" based on the equipment survey in ARL SPEC Kit 292 [11] and