RESEARCH ARTICLE

Automated reporting from gel-based proteomics experiments using the open source Proteios database application

Fredrik Levander 1, Morten Krogh 2, Kristofer Wårell 1, Per Gärdén 2, Peter James 1 and Jari Häkkinen 2

1 Protein Technology, Lund University, BMC D13, Lund, Sweden
2 Department of Theoretical Physics, Lund University, Lund, Sweden

Assembling data from the different parts of a proteomics workflow is often a major bottleneck. Furthermore, there is an increasing demand for the publication of details about protein identifications because of the problems with false-positive and false-negative identifications. In this report, we describe how the open-source Proteios software has been expanded to automate the assembly of the different parts of a gel-based proteomics workflow. In Proteios it is possible to generate protein identification reports that contain all the information currently required by proteomics journals. The user can also specify a maximum allowed false-positive ratio, and reports are then generated automatically with the corresponding score cut-offs calculated. When protein identification is conducted using multiple search engines, the score thresholds that correspond to the predetermined error rate are also explicitly calculated for proteins that appear on the result lists of more than one search engine.

Received: October 24, 2006
Revised: December 4, 2006
Accepted: December 5, 2006

Keywords: 2-D PAGE / Protein identification / Reporting

Proteomics 2007, 7, 668–674

1 Introduction

The diversity of experimental setups for proteomics is greater than ever, and vast amounts of data in different formats are generated, no matter which experimental workflow is used. The classic 2-DE-based workflow involves several steps, each generating data in a different format.
Gel analysis programs produce analysis data in one format and spot pick lists in another, spot processing equipment produces log files in its own format, mass spectrometers return raw data in vendor-specific formats, and the processed peak lists come in a variety of formats. Finally, protein identification search engines return reports in a variety of formats. Even though the experimental work is fast and at least partly automated, putting all the data together usually requires a lot of hands-on work. There is now an effort within HUPO to standardise the data formats (PSIDEV, http://psidev.sourceforge.net), but until standards emerge, proteomics researchers will have to deal with all the different data formats.

Proteomics experiments, and protein identification in particular, are complex processes, and the statistical relevance of results can be hard to assess. Therefore, a minimum amount of information about protein identification has been set as a requirement for publication in the major proteomics journals ([1], http://www.mcponline.org/misc/ParisReport_final.shtml). To report MS and MS/MS search results adequately, a lot of information must be assembled, which is very time-consuming and tedious when done manually. Furthermore, determining false-positive ratios for identifications requires additional work.

Correspondence: Dr. Fredrik Levander, Protein Technology, Lund University, BMC D13, 22184 Lund, Sweden
E-mail: Fredrik.Levander@elmat.lth.se
Fax: +46-46-222-1495

Abbreviation: LSIDs, life science identifiers

DOI 10.1002/pmic.200600814
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com