Workflow Evolution: Tracing Workflows Through Time

Eran Chinthaka 1,2,3, Roger Barga 1,4, Beth Plale 2,5, and Nelson Araujo 1,6

1 Microsoft Research, Redmond, Washington
2 School of Informatics and Computing, Indiana University, Bloomington, Indiana
{3 echintha, 5 plale}@cs.indiana.edu, {4 barga, 6 nelsona}@microsoft.com

Abstract

Scientists working in eScience environments use workflows to carry out their experiments. Because workflows evolve as the research itself evolves, they are a good tool for tracking the evolution of the research. Scientists can trace their research and associated results through time, or even go back in time to a previous stage and fork a new branch of research. In this paper we introduce the workflow evolution framework (EVF), which is implemented in the Trident workflow workbench [5]. The primary contributions of the EVF include i) management of knowledge associated with workflow evolution and ii) enabling reproducible research. Since we believe evolution can be used for workflow attribution, our framework will encourage researchers to share their workflows and get credit for their contributions.

1. Introduction

Computational science experiments often involve a sequence of activities to be carried out, with a set of configurable parameters and input data, producing outputs that will be analyzed and evaluated further. Depending on these outputs, scientists will tweak the input parameters, the input data, the activities, and even the flow of the experiment to improve its results. If the activities of the experiment, or parts of it, can be automated, scientists will create workflows to carry out their experiments repeatedly and efficiently. Scientific workflows are an especially good fit when the experiments depend on the analysis of massive data sets or demand large computational resources.
In the workflow scenario, rather than doing everything manually, scientists encode their algorithms and experimental procedures as workflows and use the flexibility, tools, and features of scientific workflows.

When a workflow framework is used over an extended duration, the research will likely evolve along different dimensions, affecting and evolving the associated workflow(s) as well. After a period of time, scientists may need to review what they have done for a variety of reasons, possibly going back weeks or months. Even in operational settings, where workflows are used to produce daily results such as data cleaning and loading, the workflows will periodically change. Through discussions with users of workflow systems, we have identified several reasons why researchers may want to follow the evolution of workflows:

• They might want to see the evolution of their research. For example, if they now have a better algorithm, they might want to know the path it took to reach its current state and what the previous versions were.
• They might want to go back to a previous stage. Maybe they now want to take their research in a different direction and see that the best way to do so is to take the research as it was six months ago and fork from that point.
• They might discover errors in their algorithms or experiments and want to trace back to the origin of an error, or to see the data products and results affected by it.
• They might want to visualize the data products their experiments produced over time and use them for various evaluation purposes.

In addition to tracing workflows over time, scientists may also be interested in reproducing workflows. In this research we introduce the Workflow Evolution Framework (EVF) to help scientists manage knowledge encoded in their