Merlin: Metabolic Models Reconstruction
using Genome-Scale Information ⋆
Oscar Dias *, Miguel Rocha **, Eugénio C. Ferreira *, Isabel Rocha *
* IBB Institute for Biotechnology and Bioengineering, Centre of
Biological Engineering, University of Minho, Campus de Gualtar,
4710-057 Braga, Portugal (e-mail: [odias, ecferreira, irocha]@deb.uminho.pt).
** CCTC - Computer Science and Technology Centre, University of
Minho, Campus de Gualtar, 4710-057 Braga, Portugal (e-mail:
mrocha@di.uminho.pt).
Abstract: This article describes Merlin, a user-friendly program that performs functional
genomic annotations of lists of genes. Merlin retrieves information on each homologue and
automatically scores the results, allowing the user to change the score selection and to
dynamically (re-)annotate the genome. Merlin expedites the transition from genome-scale data
to SBML metabolic models, giving the user a preliminary view of the biochemical network.
Keywords: Systems Biology, Genome-Scale Reconstruction, BLAST, SBML, Metabolic
Engineering.
1. INTRODUCTION
Genome-scale reconstructed metabolic models are based
on the well-known stoichiometry of biochemical reactions
and can be used for simulating in silico the phenotypic
behaviour of a microorganism under different environmental
and genetic conditions, thus representing an important
tool in Metabolic Engineering [Rocha et al. (2008)].
The reconstruction of a metabolic network associates the
genome of a given organism with its physiology, through
the replication of the biochemical reactions and molecular
mechanisms taking place in that organism [Francke
et al. (2005)].
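To illustrate what "based on stoichiometry" means in practice, the sketch below encodes a toy network as a stoichiometric matrix S and checks whether a flux vector v satisfies the mass balance S.v = 0. The steady-state condition is a common formulation in constraint-based modelling and is an assumption here, not something stated in this paper; the three-reaction network and all names are hypothetical.

```python
# Toy network (hypothetical, for illustration only):
#   R1:   -> A     (uptake)
#   R2: A -> B
#   R3: B ->       (secretion)
# Rows are metabolites, columns are reactions. Entries are stoichiometric
# coefficients: negative = consumed, positive = produced.
METABOLITES = ["A", "B"]
REACTIONS = ["R1", "R2", "R3"]
S = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
]


def net_production(S, fluxes):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in S]


def is_steady_state(S, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted (S . v = 0)."""
    return all(abs(x) <= tol for x in net_production(S, fluxes))


balanced = [2.0, 2.0, 2.0]     # every reaction carries the same flux
unbalanced = [2.0, 1.0, 1.0]   # uptake exceeds conversion, so A accumulates

print(is_steady_state(S, balanced))    # True
print(net_production(S, unbalanced))   # [1.0, 0.0]
```

In a genome-scale model the matrix has hundreds or thousands of rows and columns, but the balance check has the same form.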
The genome-scale reconstruction of metabolic networks
encompasses several steps, such as genome annotation,
reaction identification and stoichiometry determination,
compartmentation, and determination of the biomass
composition, energy requirements and additional constraints.
The first step (genome annotation) is essential to this type
of reconstruction, because it provides the initial data from
which the model is built. Information such as gene or
open reading frame (ORF) names, assigned cellular functions,
sequence similarities and, for the enzyme-coding genes, the
Enzyme Commission (EC) number(s) should be retrieved to
accomplish the first stage of the mathematical model
development [Rocha et al. (2008)].
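The annotation fields listed above can be pictured as one record per gene. The following is a minimal sketch of such a record, not Merlin's actual data model; the class name, field names and the two example entries are hypothetical and serve only to show how enzyme-coding genes might be selected by the presence of an EC number.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneAnnotation:
    """One annotated gene or ORF (hypothetical structure)."""
    locus: str                          # gene or ORF name
    function: str = ""                  # assigned cellular function
    best_hit_identity: float = 0.0      # sequence similarity, as a fraction 0..1
    ec_numbers: List[str] = field(default_factory=list)

    def is_enzyme_coding(self) -> bool:
        # A gene is treated as enzyme-coding if it carries at least one EC number.
        return len(self.ec_numbers) > 0


def enzyme_coding(annotations):
    """Keep only the records that can contribute reactions to a model."""
    return [a for a in annotations if a.is_enzyme_coding()]


# Made-up example records; only the EC number 2.7.1.1 (hexokinase) is real.
records = [
    GeneAnnotation("orf0001", "hexokinase", 0.82, ["2.7.1.1"]),
    GeneAnnotation("orf0002", "hypothetical protein"),
]
print([a.locus for a in enzyme_coding(records)])  # ['orf0001']
```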
According to the Integrated Microbial Genomes (IMG)
system [Markowitz et al. (2006)], more than 4,000 genomes
(4,368 as of December 2009) have been fully sequenced,
and more than 700 genomes (747 as of December 2009) are
currently at the draft stage. Sequence similarities
between genes and genomes can be established using
well-known algorithms such as BLAST [Altschul et al. (1990)]
or FASTA [Lipman and Pearson (1985)].
⋆ This work is supported by a PhD grant from the Portuguese
Fundação para a Ciência e a Tecnologia: SFRH/BD/47307/2008.
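BLAST and FASTA are fast heuristics for local sequence alignment. As a rough illustration of the underlying idea, the sketch below computes a local alignment score with the exact Smith-Waterman dynamic programme, which is a different and much slower algorithm than either tool. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # 8
print(smith_waterman_score("AAAA", "TTTT"))  # 0
```

A higher score indicates a longer or better-conserved shared region. Real tools additionally report statistical significance, which this sketch omits.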
2. GENOME ANNOTATION
Genome annotation encompasses both "gene finding" on
the sequenced genome and the assignment of biological
functions to the newly found genes [Medigue and Moszer
(2007); Salzberg (2007)].
Gene finding in eukaryotic genomes differs from that in
prokaryotic ones: about 90% of a bacterial genome consists
of coding sequences, whereas higher eukaryotes have less
than 10% of coding sequences. Moreover, eukaryotes generally
have two or more overlapping open reading frames, and it is
difficult to identify the start of translation and to find
regulatory signals such as promoters and terminators
[Salzberg et al. (1998)].
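The notion of an open reading frame can be made concrete with a naive scanner that reports every stretch running from an ATG start codon to the next in-frame stop codon, on both strands. This is only a sketch under simplifying assumptions (standard stop codons, ATG as the only start, no introns); it is not how the gene finders cited in this paper work, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=2):
    """Yield (start, end) of ATG...stop ORFs on one strand (end exclusive)."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3         # include the stop codon
                start = None


def find_orfs(seq, min_codons=2):
    """Return (strand, start, end, sequence) for ORFs in all six frames.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    found = [("+", s, e, seq[s:e]) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    found += [("-", s, e, rc[s:e]) for s, e in orfs_in_strand(rc, min_codons)]
    return found


print(find_orfs("CCATGAAATGACC"))  # [('+', 2, 11, 'ATGAAATGA')]
```

Such a scan typically returns many overlapping candidates, which is why statistical models are needed to decide which ORFs are likely to be real genes.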
There are several software tools for gene finding. Almost
all use probabilistic methods, such as Hidden Markov
Models (HMM), to identify coding sequences within the
open reading frames. Examples of such applications are
GLIMMER [Salzberg et al. (1998)], GeneMark [Borodovsky
and McIninch (1993)] and EuGène [Foissac and Schiex (2005)].
Alternatively, some tools use methods other than HMM,
such as Gismo [Krause et al. (2007)]. A list of these
and other applications is available at
www.genefinding.org/software.html.
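The probabilistic idea behind such tools can be sketched with a much simpler model than an HMM: a first-order Markov chain trained separately on coding and non-coding examples, with a candidate sequence scored by the log-odds of the two models. This is a deliberately reduced illustration and not the algorithm of GLIMMER, GeneMark or EuGène; the training strings below are made up and far too small for real use.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current).

    Pseudocounts keep every probability non-zero. Input is assumed to
    contain only the letters A, C, G and T.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model


def log_odds(seq, coding, background):
    """Sum of log(P_coding / P_background) over consecutive base pairs.

    Positive values mean the sequence resembles the coding examples more.
    """
    return sum(
        math.log(coding[a][b] / background[a][b])
        for a, b in zip(seq, seq[1:])
    )


# Made-up training data, for illustration only.
coding_model = train_markov(["GCGGCGGCGGCG"])
background_model = train_markov(["ATATATATATAT"])

print(log_odds("GCGGCG", coding_model, background_model) > 0)  # True
print(log_odds("ATATAT", coding_model, background_model) < 0)  # True
```

Practical gene finders use higher-order or interpolated models and codon-position-specific statistics, but the scoring principle of comparing a coding model against a background model is similar.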
Some of the software applications listed above also attach
biological data (functional annotation) to the recognised
genes. Other tools that annotate the genome at the protein
level are GOAnno [Chalmel et al. (2005)] and GeneFAS
[Joshi et al. (2004)], which uses Bayesian probability
of function similarity between two connected genes and
11th International Symposium on
Computer Applications in Biotechnology
Leuven, Belgium, July 7-9, 2010
978-3-902661-70-8/10/$20.00 © 2010 IFAC
120 10.3182/20100707-3-BE-2012.0076