Research Focus

PlasmoPredict: a gene function prediction website for Plasmodium falciparum

Philip M.R. Tedder 1, James R. Bradford 2, Glenn A. McConkey 3, Andy J. Bulpitt 4 and David R. Westhead 1

1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK
2 Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester M20 4BX, UK
3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds LS2 9JT, UK
4 School of Computing, University of Leeds, Leeds LS2 9JT, UK

The genome sequence of the malaria parasite Plasmodium falciparum was published in 2002 and revealed that 60% of its genes could not be assigned a function. Eight years later the majority of P. falciparum proteins are still of unknown function. We therefore present PlasmoPredict, an easy-to-use online gene function prediction tool that integrates a wide range of functional genomics data for P. falciparum to aid in the annotation of these genes.

Plasmodium genome annotation: elucidating ‘the unknome’

Plasmodium falciparum, among the five malaria parasite species that infect humans, is responsible for the majority of the mortality caused by malaria [1]. The sequencing of the P. falciparum genome, completed in 2002, revealed that 60% of its genes did not show enough similarity to genes of known function for a function to be assigned [2]. Efforts have been made to increase the number of genes that can be assigned a function, yet more than 50% of the genes remain unannotated. For many of the annotated genes the annotation is often either not very specific (e.g. membrane protein) or incomplete (e.g. cellular location known but of unknown molecular function). Since the sequencing of the genome, several functional genomics datasets have been published for P. falciparum [3].
However, other databases that already include a substantial proportion of these data and can be used to predict the function of unannotated genes are limited in number and in scope. PlasmoDB [4] is the most extensive website resource for Plasmodium and includes large amounts of functional genomics data. Although it is an invaluable resource of genomic information, its display and analysis of the data are not focused on the prediction of gene function. Plasmodraft [5] is a database that produces predictions for the function of P. falciparum genes, but it uses only three non-homology-based data sources. Therefore we present PlasmoPredict (www.bioinformatics.leeds.ac.uk/bio5pmrt/PlasmoPredict/PlasmoPredict.html), a web-accessible database resource that allows detailed bioinformatic investigation of the wide selection of functional genomic data available for P. falciparum, in the context of predicting likely gene functions and in a form convenient to laboratory researchers in parasitology.

PlasmoPredict

PlasmoPredict takes as input a P. falciparum gene ID and outputs several lists of other P. falciparum genes likely to be functionally related to that gene. Each list is derived from one of nine different datasets divided into three types according to their source: (i) sequence and/or structure, (ii) genomic, and (iii) high-throughput. Every member of a list is linked to the query gene by a common property (e.g. a shared domain) or behaviour (e.g. similar expression pattern) determined by a method specific to each dataset (see later). The underlying principle of PlasmoPredict is ‘guilt by association’, whereby the functions of the genes within a list are assumed to indicate the function of the query gene.
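The ‘guilt by association’ idea can be illustrated with a minimal sketch. It assumes a list of genes related to the query and a table of existing annotations, and it simply counts how often each annotation term occurs among the related genes. The function name, gene identifiers and annotation terms below are hypothetical placeholders, and the plain tally is an illustrative simplification, not the scoring used by PlasmoPredict itself.

```python
from collections import Counter


def rank_candidate_terms(related_genes, annotations):
    """Tally annotation terms across the genes related to a query gene.

    related_genes: iterable of gene identifiers linked to the query gene.
    annotations:   dict mapping gene identifier -> set of annotation terms.
    Returns a list of (term, count) pairs, most frequent term first.
    """
    counts = Counter()
    for gene in related_genes:
        # Genes with no known annotation contribute nothing to the tally.
        counts.update(annotations.get(gene, set()))
    return counts.most_common()


# Toy data: identifiers and terms are placeholders, not real annotations.
annotations = {
    "gene_B": {"protein kinase activity"},
    "gene_C": {"protein kinase activity", "ATP binding"},
    "gene_D": {"ATP binding"},
    "gene_E": set(),  # an unannotated gene
}

# Hypothetical list of genes returned for an unannotated query gene.
related_to_query = ["gene_B", "gene_C", "gene_E"]

ranked = rank_candidate_terms(related_to_query, annotations)
for term, count in ranked:
    print(f"{term}: supported by {count} related gene(s)")
```

In this toy case the term shared by the most related genes comes first, which is the sense in which the annotations of a list suggest a function for the query gene.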
This guilt-by-association approach distinguishes PlasmoPredict from other Plasmodium databases (such as PlasmoDB), where the user is shown only the data for the queried gene and is therefore restricted to inferring its function solely from the properties of that gene.

To aid analysis, PlasmoPredict displays the Gene Ontology [6] (GO) annotations of all P. falciparum genes, which allows three different aspects of function to be predicted: (i) molecular function, (ii) biological process, and (iii) subcellular location. The results for all selected methods can also be viewed in an applet based on the network visualization program Medusa [7]. In this network the nodes are the genes and the edges are the methods, with each edge assigned a confidence value based on the probability that the linked genes share the same function for that GO class (i.e. molecular function, biological process or subcellular location). This confidence value is calculated using the existing GO annotations for P. falciparum and a Bayesian classifier method.

PlasmoPredict provides several features to aid its usage. Gene function prediction is not limited to one gene, and results can be produced for several genes at one time. The user can specify which methods are displayed and can select their parameters (for example, the number of correlated genes). The results for all the methods can be downloaded at the bottom of the page as a Cytoscape simple interaction format (sif) file. If the query gene ID is unknown to the user, a search utility will locate the ID

Corresponding author: Westhead, D.R. (D.R.Westhead@leeds.ac.uk).

Update Trends in Parasitology Vol.26 No.3 107