Identifying Novel Peroxisomal Proteins

John Hawkins (1,2), Donna Mahony (3), Stefan Maetschke (1,2), Mark Wakabayashi (1,2), Rohan D Teasdale (3), Mikael Bodén (2)

December 18, 2006

(1) ARC Centre for Complex Systems.
(2) School of Information Technology and Electrical Engineering.
(3) Institute for Molecular Bioscience and ARC Centre in Bioinformatics.
The University of Queensland, St Lucia, Queensland 4072, Australia.

Contact: John Hawkins
Email: jhawkins@itee.uq.edu.au
PH: +61 7 3365 1636
FAX: +61 7 3365 4999

Abstract

Peroxisomes are small subcellular compartments responsible for a range of essential metabolic processes. Efforts to predict peroxisomal protein import are hampered by species variation and by the scarcity of sequence data sets with experimentally confirmed localization. We present a predictor of peroxisomal import based on the presence of the dominant peroxisomal targeting signal one (PTS1), a seemingly well-conserved but highly unspecific motif. The signal appears to rely on subtle dependencies with the preceding residues. We evaluate prediction accuracies against two alternative predictor services, PeroxiP and the Pts1 Predictor. We test the integrity of prediction on a range of prokaryotic and eukaryotic proteomes lacking peroxisomes. Similarly, we test the accuracy on peroxisomal proteins known not to overlap with the training data. The model identified a number of proteins within the RIKEN IPS7 mouse protein dataset as potentially novel peroxisomal proteins. Three were confirmed in vitro using immunofluorescent detection of myc-epitope-tagged proteins in transiently transfected BHK-21 cells (Dhrs2, Serhl and Ehhadh). The final model has superior specificity to both alternatives, and an accuracy better than PeroxiP and on par with the Pts1 Predictor. Thus, the model we present should prove invaluable for labeling PTS1-targeted proteins with high confidence.
We use the predictor to screen several additional eukaryotic genomes to revise previously estimated numbers of peroxisomal proteins. The predictor is available at http://pprowler.itee.uq.edu.au.

1 Introduction

Peroxisomes are relatively small compartments that are particularly abundant in liver cells and neurons. In plants they occur in large numbers as metabolically specialized microbodies. Peroxisomes play an important role in lipid, ethanol and glyoxylate metabolism and in the detoxification of reactive oxygen species. They are believed to be essential for coping with oxidative stress. Peroxisomal disorders often involve abnormal accumulation of long fatty acids, which subsequently affects membrane structure [34]. The mechanisms of protein localization are one potential avenue to be explored for disease control [21]. There are numerous mechanisms of subcellular localization, depending on the organelle and the function of the protein. In general they all rely on the existence of some form of targeting signal,