J.M. Corchado (Eds.): IWPACBB 2008, ASC 49, pp. 92–101, 2009. springerlink.com © Springer-Verlag Berlin Heidelberg 2009

Data Integration Issues in the Reconstruction of the Genome-Scale Metabolic Model of Zymomonas mobilis

José P. Pinto 1, Oscar Dias 2, Anália Lourenço 2, Sónia Carneiro 2, Eugénio C. Ferreira 2, Isabel Rocha 2, and Miguel Rocha 1

1 Department of Informatics / CCTC
2 IBB - Institute for Biotechnology and Bioengineering, Center of Biological Engineering
University of Minho, Campus de Gualtar, 4710-057 Braga - Portugal
{josepedr,mrocha}@di.uminho.pt, {odias,analia,soniacarneiro,ecferreira,irocha}@deb.uminho.pt

Abstract. Genome-scale model reconstruction is a major tool in the field of Metabolic Engineering. This paper reports on a study of data integration issues in the genome-scale reconstruction of the metabolic model of the bacterium Zymomonas mobilis, a promising organism for bioethanol production. Data are retrieved from the Entrez Gene, KEGG, BioCyc and Brenda databases. The processes involved in integrating data from these sources are described, together with the data quality issues encountered.

Keywords: Genome-scale model reconstruction, Zymomonas mobilis, data integration, data quality.

1 Introduction

Genome-scale reconstructed metabolic models are based on the well-known stoichiometry of biochemical reactions and can be used to simulate in silico the phenotypic behaviour of a microorganism under different environmental and genetic conditions, thus representing an important tool in metabolic engineering [1]. However, while the reconstruction of the metabolic network of an organism, starting from the fully sequenced and (partially) annotated genome sequence, is likely to become a widespread procedure, it is currently far from being a standardized methodology [2].
This is due in part to the lack of uniform computational tools for model reconstruction, but primarily to the difficulties associated with extracting information beyond what is available from the annotated genome.

In this paper, we address the reconstruction of the metabolic model of Zymomonas mobilis ZM4, among the most promising microorganisms for ethanol fuel production [3]. The genome-scale metabolic reconstruction is imperative for the feasibility of ongoing studies, since no genome-scale metabolic model is available for this organism. The number of reports in the current literature studying its in vivo physiology remains small, and experimental and computational metabolic engineering tools have seen limited use in understanding the interconnectivity of its metabolic pathways [4]. Therefore, genome-scale metabolic modeling stands out as one of the most promising approaches to obtain in silico predictions of cellular function based on the interaction of the cellular components [5,2].
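
To make the stoichiometric basis of such models concrete, the following is a minimal sketch, not the reconstruction described in this paper. It assumes a toy three-metabolite network built around the overall fermentation balance (one glucose yielding two ethanol and two CO2); the reaction names, the flux values, and the helper functions `stoichiometric_matrix`, `net_production` and `is_steady_state` are all hypothetical and chosen only for illustration. It shows how reactions are encoded as a stoichiometric matrix S and how a flux vector v can be checked against the steady-state condition S·v = 0.

```python
# Toy network (illustrative assumption, NOT the reconstructed Z. mobilis model).
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
METABOLITES = ["glucose", "ethanol", "co2"]
REACTIONS = {
    "uptake": {"glucose": 1},                                  # -> glucose
    "fermentation": {"glucose": -1, "ethanol": 2, "co2": 2},   # glucose -> 2 ethanol + 2 CO2
    "ethanol_export": {"ethanol": -1},                         # ethanol ->
    "co2_export": {"co2": -1},                                 # CO2 ->
}


def stoichiometric_matrix(metabolites, reactions):
    """Build S with one row per metabolite and one column per reaction."""
    return [[coeffs.get(m, 0) for coeffs in reactions.values()]
            for m in metabolites]


def net_production(matrix, fluxes):
    """Compute S.v: the net production rate of each metabolite."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in matrix]


def is_steady_state(matrix, fluxes, tol=1e-9):
    """A flux vector is at steady state when no metabolite accumulates."""
    return all(abs(rate) <= tol for rate in net_production(matrix, fluxes))


S = stoichiometric_matrix(METABOLITES, REACTIONS)

# Hypothetical flux vectors, ordered as the reactions above.
balanced = [10, 10, 20, 20]     # ethanol and CO2 are exported as fast as produced
unbalanced = [10, 10, 10, 20]   # ethanol export too slow, so ethanol accumulates

print(is_steady_state(S, balanced))
print(is_steady_state(S, unbalanced))
print(net_production(S, unbalanced))
```

In a genome-scale model the same matrix would have hundreds of rows and columns, which is why the quality of the reaction and stoichiometry data drawn from the source databases matters so much.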