Association of variations in I kappa B-epsilon with Graves’ disease using classical and myGrid methodologies

Peter Li (1), Keith Hayward (1), Claire Jennings (2), Kate Owen (2), Tom Oinn (3), Robert Stevens (4), Simon Pearce (2) and Anil Wipat (1)

(1) School of Computing Science, University of Newcastle upon Tyne, NE1 7RU
(2) Institute of Human Genetics, University of Newcastle upon Tyne, International Centre for Life, NE1 3BZ
(3) European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD
(4) Department of Computer Science, University of Manchester, M13 9PL

Abstract

Bioinformatics experiments can be modelled as workflows in which the computational resources are used in a pre-defined order. Workflows in the myGrid project are composed and enacted using the Taverna workflow system. We have compared the use of Taverna with classical approaches for performing bioinformatics experiments in the genetic analysis of Graves’ disease. Both the classical and the myGrid methodologies identified I kappa B-epsilon as a candidate gene involved in Graves’ disease, demonstrating that myGrid is capable of producing the same results as the classical bioinformatics approach.

Introduction

Bioinformatics analyses are in silico experiments that use local and remote resources to test a hypothesis, derive a summary or search for patterns (Stevens et al., 2003a). These resources may be information repositories such as the EMBL (Kulikova et al., 2004) and Swiss-Prot (Boeckmann et al., 2003) databases, or computational analysis tools such as BLAST (Altschul et al., 1990) and ClustalW (Higgins et al., 1994). The analysis performed in an in silico experiment frequently combines several of these resources, each of which performs one task. These tasks are linked in a specific order to form a workflow.
For example, a workflow to investigate the evolutionary relationships between proteins might begin by retrieving the amino acid sequences of a protein family from Swiss-Prot and then apply the ClustalW algorithm to align the sequences and identify patterns between them.

Organisations have begun to provide programmatic access to bioinformatics information repositories and analysis tools based on Web Services (Stein, 2002), a new distributed computing architecture that uses existing Internet communication and data exchange standards (Booth et al., 2003). Resources offering Web Service access publish a web-based application programming interface through which other applications can interact with them. Examples of bioinformatics Web Services include the XEMBL (Wang et al., 2002), openBQS (Senger, 2002) and Soaplab analysis services (Senger et al., 2003) hosted by the European Bioinformatics Institute (EBI), the services provided by XML Central of DDBJ (Miyazaki and Sugawara, 2000), the KEGG API (Kawashima et al., 2003) and a range of analysis services offered by the PathPort project (Eckart and Sobral, 2003).

The myGrid e-Science project aims to provide high-level, service-based middleware to support data-intensive in silico bioinformatics experiments using distributed resources (Goble et al., 2003; Stevens et al., 2003b). These bioinformatics analyses depend on a workflow system that can communicate with the interfaces of Web Services and mediate how data flows between resources. This need led to the inception of the Taverna project within myGrid, which has developed an open source workflow tool that enables scientists to orchestrate bioinformatics Web Services and existing bioinformatics applications in workflows.
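The ordered, task-by-task structure of a workflow, and the role of the workflow system in passing data from one resource to the next, can be sketched in a few lines of Python. This is a hypothetical illustration, not Taverna's API or data model: `Workflow`, `fetch_family_sequences`, `pad_to_equal_length` and `conserved_columns` are invented names, the sequences are toy strings rather than Swiss-Prot records, and the "align" step only pads sequences with gap characters instead of running ClustalW.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, callable) pair; the callable consumes the previous step's output.
Step = Tuple[str, Callable[[Any], Any]]


@dataclass
class Workflow:
    """Hypothetical minimal linear workflow: steps run in a fixed, pre-defined order."""
    steps: List[Step]
    trace: List[str] = field(default_factory=list)

    def run(self, initial: Any) -> Any:
        data = initial
        self.trace.clear()
        for name, task in self.steps:
            data = task(data)        # the engine, not the task, moves data between steps
            self.trace.append(name)  # record which step ran, in execution order
        return data


def fetch_family_sequences(family: str) -> Dict[str, str]:
    """Hypothetical stand-in for a Swiss-Prot query; returns toy sequences only."""
    toy = {
        "demo_family": {
            "seqA": "MKTAYIAK",
            "seqB": "MKTAHIA",
            "seqC": "MKSAYIAKQ",
        }
    }
    return toy[family]


def pad_to_equal_length(seqs: Dict[str, str]) -> Dict[str, str]:
    """Placeholder for ClustalW: right-pads with '-' gaps; performs no real alignment."""
    width = max(len(s) for s in seqs.values())
    return {name: s.ljust(width, "-") for name, s in seqs.items()}


def conserved_columns(aligned: Dict[str, str]) -> List[int]:
    """Column indices where every sequence carries the same non-gap residue."""
    rows = list(aligned.values())
    return [
        i for i, column in enumerate(zip(*rows))
        if column[0] != "-" and all(c == column[0] for c in column)
    ]


protein_family_workflow = Workflow(steps=[
    ("fetch_sequences", fetch_family_sequences),
    ("align", pad_to_equal_length),
    ("find_patterns", conserved_columns),
])

result = protein_family_workflow.run("demo_family")
print(protein_family_workflow.trace)  # ['fetch_sequences', 'align', 'find_patterns']
print(result)                         # [0, 1, 3, 5, 6]
```

Keeping the data transfer inside `Workflow.run` rather than inside each task reflects the division of labour described above: each resource only needs to expose an interface, while the workflow system fixes the order of execution and mediates how one step's output becomes the next step's input.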
We have used the Taverna workflow system to build and enact workflows that model the in silico analyses undertaken in the genetic analysis of Graves’ disease (GD) (Imrie et al., 2001), and have compared the performance of this new methodology with that of the classical bioinformatics approach.

Taverna workflow system

The emphasis of the Taverna project has been on providing working tools with which e-Scientists can perform their in silico experiments. The Taverna software is available as open source and can be downloaded at http://taverna.sourceforge.net/.