Proteomic Signatures: Amino Acid and Oligopeptide
Compositions Differentiate Among Phyla
Itsik Pe’er,¹¶ Clifford E. Felder,² Orna Man,¹,² Israel Silman,³§ Joel L. Sussman,²†* and Jacques S. Beckmann¹‡*
¹Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
²Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
³Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
ABSTRACT Availability of complete genome
sequences allows in-depth comparison of single-residue
and oligopeptide compositions of the corresponding
proteomes. We have used principal component
analysis (PCA) to study the landscape of
compositional motifs across more than 70 genera
from all three superkingdoms. Unexpectedly, the
first two principal components clearly differentiate
archaea, eubacteria, and eukaryota from each other.
In particular, we contrast compositional patterns
typical of the three superkingdoms and characterize
differences between species and phyla, as well as
among patterns shared by all compositional
proteomic signatures. These species-specific patterns
may even extend to subsets of the entire proteome,
such as proteins pertaining to individual yeast
chromosomes. We identify factors that affect
compositional signatures, such as living habitat, and detect
strong eukaryotic preference for homopeptides and
palindromic tripeptides. We further detect
oligopeptides that are either universally over- or
underabundant across the whole proteomic landscape, as well
as oligopeptides whose over- or underabundance is
phylum- or species-specific. Finally, we report that
species composition signatures preserve evolutionary
memory, providing a new method to compare
phylogenetic relationships among species that
avoids problems of sequence alignment and
ortholog detection. Proteins 2004;54:20–40.
© 2003 Wiley-Liss, Inc.
Key words: phylogenetics; principal component
analysis; proteome composition
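As a rough illustration of the analysis summarized in the abstract, the sketch below computes single-residue composition vectors for a few proteomes and projects them onto their first two principal components. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy sequences and the species names are invented, the helper names `composition` and `pca` are hypothetical, and PCA is done by plain mean-centering followed by SVD, since the preprocessing actually used is not described in this section.

```python
from collections import Counter

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(proteins):
    """Return the 20-dimensional residue frequency vector of one proteome."""
    counts = Counter()
    for seq in proteins:
        # Ignore ambiguous or non-standard symbols such as X, B, Z.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("proteome contains no standard residues")
    return np.array([counts[a] / total for a in AMINO_ACIDS])


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Assumption: columns are mean-centered but not rescaled.
    """
    centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Invented toy "proteomes"; real input would be complete proteome FASTA files.
toy_proteomes = {
    "species_A": ["MKKLLKKAEE", "MKKIEELLKK"],
    "species_B": ["MGGSSGGPPA", "MSSGGAPPGS"],
    "species_C": ["MKKLGGSSEE", "MAELKGSPKK"],
}
names = list(toy_proteomes)
X = np.vstack([composition(toy_proteomes[n]) for n in names])
scores, explained = pca(X)

for name, (pc1, pc2) in zip(names, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

Each row of `scores` gives the coordinates of one proteome in the plane of the first two components, which is the kind of plot in which the abstract reports separation of the three superkingdoms.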
INTRODUCTION
Over 100 genomes have been fully sequenced to date,
providing an opportunity for comprehensive comparison
and analysis of their organization, similarity, uniqueness,
and variability at the sequence level. Comparative analysis
of the proteomes derived from these genomes has
already proven powerful in gene identification and in
prediction of structure, function, and active sites of proteins,
as well as in phylogenetic analysis. These analyses
are usually based on a per-sequence comparison (e.g., see
Gribaldo and Philippe¹). However, such studies suffer
from a major difficulty imposed by the requirement for
orthologs of the analyzed proteins from all compared
species. Even when orthologs are present, their detection
is often prone to error. Furthermore, the resemblance of a
specific (or putative) protein may not be representative of
species relatedness because of ancestral gene duplication,
pseudogenization, or lateral gene transfer (LGT). Finally,
the success of such comparative analyses greatly depends
on the quality of the sequence alignment, which is hard to
control automatically for large data sets.
With available complete-genome data sets, one can
pursue complementary per-proteome approaches and address
general, global properties. In contrast to gene-based
approaches, whole-proteome analyses can be performed in
the absence of any ortholog knowledge of the encoded
products. Indeed, such approaches can be powerful, as
illustrated by recent studies.²⁻⁵
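To make concrete why whole-proteome statistics need neither ortholog assignments nor alignments, the following sketch counts overlapping oligopeptides directly from raw sequences and measures the share of homopeptides and palindromic tripeptides, two classes singled out in the abstract. This is an illustrative sketch only: the function names and the two toy sequences are hypothetical, and the counting convention (overlapping windows, homopeptides also counted as palindromes) is an assumption rather than the definition used in the paper.

```python
from collections import Counter


def oligopeptide_counts(proteins, k=3):
    """Count all overlapping k-residue windows across a set of sequences."""
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def is_homopeptide(peptide):
    """True if every residue is identical, e.g. QQQ."""
    return len(set(peptide)) == 1


def is_palindrome(peptide):
    """True if the peptide reads the same in both directions, e.g. AKA.

    Note: under this definition every homopeptide is also a palindrome.
    """
    return peptide == peptide[::-1]


def fraction(counts, predicate):
    """Share of all counted occurrences whose peptide satisfies predicate."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for pep, n in counts.items() if predicate(pep)) / total


# Invented toy sequences; no orthologs or alignment are involved.
counts = oligopeptide_counts(["MQQQA", "AKAKA"], k=3)
print("homopeptide fraction:", fraction(counts, is_homopeptide))
print("palindrome fraction:", fraction(counts, is_palindrome))
```

Because the input is just a bag of sequences, the same functions apply unchanged to any proteome or to a subset of one, such as the proteins encoded on a single chromosome.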
Prominent approaches of this type consider high-level
organization, such as chromosomal gene order and compo-
Grant sponsor: Israel Ministry of Science and Technology Grant for
Interdisciplinary Studies.
Grant sponsor: The Israel Ministry of Science and Technology grant
for the Israel Structural Proteomics Center.
Grant sponsor: European Commission Fifth Framework “Quality of
Life and Management of Living Resources” program under contract
number: QLK3-2000-00650.
Grant sponsor: European Commission Fifth Framework “Quality of
Life and Management of Living Resources” ‘SPINE’ Project; Grant
number: QLG2-CT-2002-00988.
Grant sponsor: Helen & Milton A. Kimmelman Center for Biomolecular
Structure and Assembly (Rehovot, Israel).
Grant sponsor: Benoziyo Center for Neurosciences (Rehovot, Israel).
Grant sponsor: Kalman and Ida Wolens Foundation (Rehovot,
Israel).
Grant sponsor: Jean and Jula Goldwurm Memorial Foundation
(Rehovot, Israel).
Grant sponsor: Divadol Foundation (Rehovot, Israel).
¶Itsik Pe’er is a recipient of the ESHKOL fellowship by the Israeli
Ministry of Science and Technology.
§Israel Silman is the Bernstein–Mason Professor of Neurochemistry.
†Joel L. Sussman is the Morton and Gladys Pickman Professor of
Structural Biology.
‡Jacques S. Beckmann is the Hermann Mayer Professor of Molecular
Genetics.
*Correspondence to: Joel L. Sussman, Department of Structural
Biology, Weizmann Institute of Science, Rehovot, 76100 Israel.
E-mail: joel.sussman@weizmann.ac.il, or Jacques S. Beckmann, Department
of Molecular Genetics, Weizmann Institute of Science, Rehovot,
76100 Israel. E-mail: jacqui.beckmann@weizmann.ac.il
Received 8 April 2003; Accepted 17 June 2003
PROTEINS: Structure, Function, and Bioinformatics 54:20 – 40 (2004)