Designing an Experiment Management Framework for FutureGrid Cloud Services

Andrew J. Younge, Javier Diaz, Gregor von Laszewski, Geoffrey C. Fox
Pervasive Technology Institute, Indiana University
2729 E 10th St., Bloomington, IN 47408, U.S.A.
ajyounge@indiana.edu

Abstract: Cloud computing has emerged, and will continue to emerge, as a fundamental paradigm shift within Distributed Systems. While the features of Clouds are well known and widespread, the ability for users to specifically define and customize their environment is paramount to the advancement of Clouds. As such, there is an ever-growing need to leverage the ability of Clouds to provide a comprehensive experiment and workflow management framework to the scientific community. In this paper, such a novel framework is illustrated and discussed in detail, specifically targeting deployment with the FutureGrid project, a scientific Cloud testbed.

I. INTRODUCTION

Experimentation, one of the most prized and coveted methods within science, is carried out using the scientific method to answer a question or investigate a specific problem. As in many scientific disciplines, an experiment in this context contains one or more hypotheses that the experiment either supports or disproves. Experiments may also include an apparatus that is used to conduct the experiment. Proper recording of these activities allows not only the reproducibility of the experiment, but also the sharing of results within an interest group or community. Moreover, an experimental apparatus can itself be a subject of research, and its sheer availability can enable the creation of new experiments. This is a common model used in scientific discovery. This manuscript aims to provide analogous scientific instruments for scientists within the context of Cloud computing, specifically within the FutureGrid (FG) testbed environment.
The hope is that all activities within FutureGrid will be primarily experiment-based, fitting the model of a true testbed for scientific research. These activities will be driven by steps that together can be classified as an experiment. Experiments naturally vary in complexity. They may include basic experiments, such as utilizing a particular pre-installed service and allowing a researcher to debug an application interactively. They may also include more sophisticated experiments, such as instantiating a particular environment and running a pre-specified set of tasks on that environment. Furthermore, experiments may involve a non-trivial number of steps or execution paths to accurately test a given hypothesis.

The vision is that a direct outcome of such an experiment-centric approach will be a collection of software images and experimental data that provides a reusable resource for the application and computational sciences. FutureGrid can thus enable Grid and Cloud researchers to conveniently define, execute, and repeat application or middleware experiments within interacting software "stacks" that are under the control of the experimenter. It will also allow researchers to leverage the previous experience of other experimenters in setting up and configuring experiments, hence creating a community of users. FutureGrid will support these pre-configured experiment environments with explicit default settings, so that researchers can quickly select an appropriate pre-configured environment and use it in their specific scenario.

In order to support the scientific users of FutureGrid, a number of functional requirements must be met. These include, but are not limited to, the following functions, which roughly follow a basic execution plan:

- Organize projects and experiments.
- Provide a uniform structure across all experiments.
- Annotate experiments so they can be cataloged and shared.
- Annotate what the experiment is about.
- Annotate which resources are being used and how.
- Annotate which results are produced by the experiment.
- Provide information about the nature of the projects and experiments to the FG management.
- Provide a mechanism by which multiple users can easily collaborate as part of projects or even individual experiments.
- Provision resources to conduct the experiments.
- Execute an experiment.
- Monitor the execution of experiments.
- Record all required information for replication of the experiment.
- Reproduce the experiment in the same environment at a later time.

Meeting these requirements will be a feat in and of itself, and would perhaps be intractable without the use of Cloud computing environments. The end result ideally provides the basis for bridging the large gap between Infrastructure-as-a-Service and Platform-as-a-Service within the scientific domain.

II. RELATED RESEARCH

One of the goals of the FutureGrid project is to understand the behavior and utility of Cloud computing approaches. Recently, Cloud computing has become quite popular and a multitude of middleware has been developed. However, it