Using Motif-Based Methods in Multiple Genome Analyses: A Case Study
Comparing Orthologous Mesophilic and Thermophilic Proteins†
David La,‡ Melanie Silver,‡ Robert C. Edgar,§ and Dennis R. Livesay*,‡
Department of Chemistry, California State Polytechnic University at Pomona, 3801 West Temple Avenue,
Pomona, California 91768, and 195 Roque Moraes Drive, Mill Valley, California 94941
Received December 31, 2002; Revised Manuscript Received May 23, 2003
ABSTRACT: Protein motifs represent highly conserved regions within protein families and are generally
accepted to describe critical regions required for protein stability and/or function. In this comprehensive
analysis, we present a robust, unique approach to identify and compare corresponding mesophilic and
thermophilic sequence motifs between all orthologous proteins within 44 microbial genomes. Motif
similarity is determined through global sequence alignment of mesophilic and thermophilic motif pairs,
which are identified by a greedy algorithm. Our results reveal only modest correlation between motif and
overall sequence similarity, highlighting the rationale of motif-based approaches in comprehensive
multigenome comparisons. Conserved mutations reflect previously suggested physicochemical principles
for conferring thermostability. Additionally, comparisons between corresponding mesophilic and
thermophilic motif pairs provide key biochemical insights related to thermostability and can be used to test
the evolutionary robustness of individual structural comparisons. We demonstrate the ability of our unique
approach to provide key insights in two examples: the TATA-box binding protein and glutamate
dehydrogenase families. In the latter example, conserved mutations hint at novel origins leading to structural
stability differences within the hexamer structures. Additionally, we present amino acid composition data
and average protein length comparisons for all 44 microbial genomes.
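
As an illustration of the two steps named in the abstract (global alignment of motif pairs, and greedy identification of those pairs), the following minimal Python sketch may be helpful. It is not the authors' implementation. The abstract does not state the scoring scheme or the exact greedy criterion, so the match/mismatch/gap values, the rule of "take the highest-scoring unused pair first", and the function names `global_alignment_score` and `greedy_motif_pairs` are all assumptions made for this example.

```python
from typing import List, Tuple


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the paper's.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def greedy_motif_pairs(meso: List[str],
                       thermo: List[str]) -> List[Tuple[str, str, int]]:
    """Pair motifs greedily: repeatedly accept the best-scoring unused pair.

    This pairing rule is a hypothetical stand-in for the greedy algorithm
    mentioned in the abstract.
    """
    scored = sorted(
        ((global_alignment_score(m, t), i, j)
         for i, m in enumerate(meso)
         for j, t in enumerate(thermo)),
        key=lambda item: -item[0],
    )
    used_meso, used_thermo, pairs = set(), set(), []
    for score, i, j in scored:
        if i in used_meso or j in used_thermo:
            continue  # each motif takes part in at most one pair
        used_meso.add(i)
        used_thermo.add(j)
        pairs.append((meso[i], thermo[j], score))
    return pairs


if __name__ == "__main__":
    # Toy motifs, invented for illustration only.
    for pair in greedy_motif_pairs(["GKST", "DEAD"], ["DEAE", "GKSS"]):
        print(pair)
```

A greedy pairing of this kind is simple and fast but does not guarantee the assignment with the highest total score; a substitution matrix such as BLOSUM62 would normally replace the flat match/mismatch values when scoring protein motifs.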
Proteins that function under standard (mesophilic) conditions
tend to have similar structural stabilities, despite having
different sequences and structural folds (1, 2). Several
organisms, mostly archaea, thrive under extreme environmental
conditions, e.g., high pressure, high salt concentrations,
very high and low temperatures, and extreme pH.
Enzymes that function optimally in such adverse conditions
mediate the metabolic and biological functions of these
organisms. Proteins from thermophilic organisms (those that
live at extremely high ambient temperatures) generally exhibit
substantially higher intrinsic thermal stabilities than their
mesophilic counterparts while retaining the basic fold
characteristics of the whole family (3).
Although the molecular underpinnings of protein thermal
stabilization have been the focus of many experimental and
theoretical research efforts (for a review see Vielle et al.),
the subject is only partially understood (3, 4). In general, it
is thought that thermostability is achieved by an increase in
the type and numbers of noncovalent interactions (5).
Analyses of all noncovalent interactions within thermophilic
and mesophilic structural pairs reveal that thermophilic
proteins generally have increased numbers of van der Waals
interactions, hydrogen bonds, salt bridges, dipole-dipole
interactions, disulfide bridges, and hydrophobic interactions
(5-18). Other differences include shortening of loop regions,
fewer and smaller destabilizing voids within the protein,
increased structural water content, and increased incidence
of ion binding (16, 19-21). Increased conformational rigidity
of the protein structure and optimization of the surface
electrostatics also appear to parallel thermostability (22-28).
The secondary structure propensity of each amino acid
within α-helices and β-sheets has also been demonstrated
to be linked to added stability (29, 30). Despite key
differences between mesophilic and thermophilic structural
pairs, the overall fold and the active site of the protein
generally remain unchanged (31).
To overcome the lack of abundant structural data for
orthologous mesophilic and thermophilic protein pairs,
Chakaravarty et al. have created high quality homology
models taken from 30 complete bacterial genomes (nine of
which are thermophilic) (32). This study identifies several
statistically significant, specific amino acid substitutions,
significantly more salt bridges in thermophiles, a slight
decrease in loop length, and an increase in previously
overlooked cation-π interactions. Additionally, statistically
significant hydrophobic amino acid substitutions are reported
to be consistent with decreased side chain conformational
entropy.
Several studies have concentrated on sequence analysis
to investigate the origins of thermostability. Much of this
work has focused on differences in amino acid composition
between mesophilic and thermophilic genomes. It has been
observed that arginine and tyrosine are significantly more
† This work was supported by an American Chemical Society
Petroleum Research Fund type G grant (36848-GB4), an NIH SCORE
grant (S06 GM53933), and a supercomputer allocation from the
National Center for Supercomputing Applications to D.R.L.
* Corresponding author. Telephone: 909-869-4409. E-mail:
drlivesay@csupomona.edu.
‡ California State Polytechnic University at Pomona.
§ Roque Moraes Drive, Mill Valley, CA.
8988 Biochemistry 2003, 42, 8988-8998
10.1021/bi027435e CCC: $25.00 © 2003 American Chemical Society
Published on Web 07/09/2003