Combining RapidMiner operators with bioinformatics services – a powerful combination

Simon Jupp¹, James Eales¹, Simon Fischer², Sebastian Land², Rishi Ramgolam¹, Alan Williams¹, Robert Stevens¹

¹ School of Computer Science, University of Manchester, UK
² Rapid-I GmbH, Stockumer Str. 475, 44227 Dortmund, Germany

Abstract

Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves gathering and processing data from many sources, even before the analysis for the central biological question takes place. Taverna is a workflow workbench that allows bioinformaticians to create data pipelines involving distributed Web services and other forms of tool; these workflows gather and manage data in order to perform analyses that answer biological questions. RapidMiner brings a large suite of data processing, visualisation and data mining tools to bear upon tables of data, but there is a disconnect between these operators and the services available to users of Taverna. Through a RapidMiner extension to Taverna we have combined the ability to gather and process data from many molecular biological sources with RapidMiner’s data mining capabilities to provide a powerful tool for scientific analysis. In this article we describe this RapidMiner extension to Taverna and some preliminary analyses we have performed using RapidMiner on biological data.

1 Introduction

Data in molecular biology are characterised by their complexity, volatility and, more recently, by their large volume. Bioinformatics specialists gather and process these data to find patterns that may form hypotheses for new laboratory experiments [1]. This places data mining activity at the centre