From the desktop to the grid: conversion of KNIME Workflows to gUSE

Luis de la Garza, Applied Bioinformatics Group, University of Tübingen, Germany (delagarza@informatik.uni-tuebingen.de)
Jens Krüger, Applied Bioinformatics Group, University of Tübingen, Germany
Charlotta Schärfe, Applied Bioinformatics Group, University of Tübingen, Germany
Marc Röttig, Applied Bioinformatics Group, University of Tübingen, Germany
Stephan Aiche, Department of Mathematics and Computer Science, Freie Universität Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
Knut Reinert, Algorithms in Bioinformatics, Freie Universität Berlin, Germany
Oliver Kohlbacher, Applied Bioinformatics Group, University of Tübingen, Germany (oliver.kohlbacher@uni-tuebingen.de)

Abstract—The Konstanz Information Miner (KNIME) is a user-friendly graphical workflow designer with a broad user base in industry and academia. Its wide range of embedded tools and its powerful data mining and visualization capabilities make it well suited for scientific workflows, and it is therefore increasingly used in a variety of applications. However, the free version typically runs on a desktop computer, which limits users who need access to larger computing resources. The grid and cloud User Support Environment (gUSE) is a free and open-source project created for parallel and distributed systems, but creating workflows with its components involves a steeper learning curve. In this work we propose an easy-to-implement solution that combines the ease of use of the Konstanz Information Miner with the computational power of distributed computing infrastructures. We present a solution that permits the conversion of workflows between the two platforms. This enables convenient development, debugging, and maintenance of scientific workflows on the desktop. These workflows can then be deployed on a cloud or grid, thus permitting large-scale computation.
To achieve our goals, we relied on the Common Tool Description (CTD) XML file format, which describes the execution of arbitrary programs in a structured, human-readable, and easily parseable way. To integrate external programs into KNIME, we employed the Generic KNIME Nodes extension.

I. INTRODUCTION

Workflow platforms such as Pipeline Pilot [1], KNIME [2], Taverna [3], [4], [5], and Galaxy [6], [7], [8] have become a crucial means of supporting scientists in their daily work. By helping to create and automate virtual processes such as molecular docking or molecular dynamics simulations, and by simplifying data analysis and data mining, these platforms allow scientists to focus on their primary goals [9]. Furthermore, the quality of simulation results improves, since following established protocols increases reproducibility in the sense of good laboratory practice.

The most obvious and direct advantage of applying workflows in a scientific setting is the ability to save the general sequence of steps, so that the settings of a simulation can be conveniently optimized, for example by sweeping through the values of individual parameters. Scientists also benefit from less obvious advantages of workflows, including but not limited to: analysis of results, including statistical analysis and data visualization; data mining on experimentally (wet or dry lab) obtained datasets; and report creation from previously obtained data without further user input. These tasks can also be accomplished with simple scripts or with separate program suites for the individual steps. Workflow technology, however, allows all steps to be combined by providing interfaces to external tools, without requiring any knowledge of programming or scripting languages.
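To illustrate what a structured, parseable description of an external tool makes possible, the following is a minimal Python sketch of how a Common Tool Description (CTD)-style XML document could be read and turned into a command line. It assumes a simplified, hypothetical layout: the element names (`tool`, `executableName`, `PARAMETERS`, `NODE`, `ITEM`), the `SequenceFilter`/`seqfilter` tool, and the `-name value` argument convention are illustrative assumptions, not the authoritative CTD schema or the actual behaviour of the Generic KNIME Nodes extension.

```python
import xml.etree.ElementTree as ET

# Simplified, HYPOTHETICAL CTD-like tool description. Element/attribute names
# are loosely modelled on CTD files but are assumptions, not the real schema.
CTD_EXAMPLE = """\
<tool name="SequenceFilter" version="1.0">
  <description>Filters sequences by minimum length</description>
  <executableName>seqfilter</executableName>
  <PARAMETERS>
    <NODE name="SequenceFilter">
      <ITEM name="in" type="input-file" value="reads.fasta"/>
      <ITEM name="out" type="output-file" value="filtered.fasta"/>
      <ITEM name="min_length" type="int" value="50"/>
    </NODE>
  </PARAMETERS>
</tool>
"""


def parse_tool(xml_text):
    """Return (executable, {parameter name: (type, value)}) from a CTD-like string."""
    root = ET.fromstring(xml_text)
    # executableName is assumed to be a direct child of the <tool> root.
    executable = root.findtext("executableName")
    params = {}
    # iter() visits every nested ITEM element, regardless of NODE depth.
    for item in root.iter("ITEM"):
        params[item.get("name")] = (item.get("type"), item.get("value"))
    return executable, params


def build_command(executable, params):
    """Build an argv list using a hypothetical '-name value' convention."""
    argv = [executable]
    for name, (_type, value) in params.items():
        argv.extend(["-" + name, value])
    return argv


if __name__ == "__main__":
    exe, params = parse_tool(CTD_EXAMPLE)
    print(build_command(exe, params))
```

The point of the design is that the description, rather than code, carries the tool's interface, so in principle the same file could be interpreted by different workflow engines. A real converter would additionally need to handle parameter types, defaults, and the staging of input and output files.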
Additionally, workflows established within one project can easily be applied to other projects, which promotes consistency in analysis and reporting across projects, reduces the risk of human error, and allows previous results to be reproduced. Furthermore, the ability to share workflows with collaborators or the wider scientific community enables team-based analysis of experimental results. Today, a plethora of workflow systems exists, which were initially designed for different use cases, such as desktop-based data mining or the automation of computations on a grid.