J Comput Sci Syst Biol Volume 2(6): 298-299 (2009) - 298
ISSN:0974-7230 JCSB, an open access journal
Research Article OPEN ACCESS Freely available online doi:10.4172/jcsb.1000045
Rajneesh Kumar Gaur
Bioinformatics Infrastructure Facility, Jamia Hamdard (Hamdard University), Hamdard Nagar, New Delhi, India – 110062
Abstract
Proteins are important constituents of the cellular machinery. A comparative analysis of non-membrane proteins (nMPs) from prokaryotes and eukaryotes was carried out to determine bias in their amino acid distribution. The comparison revealed that ‘Ala’ is the dominant amino acid in prokaryotic nMPs, while ‘Lys’, ‘Ser’ and ‘Cys’ are the dominant amino acids in eukaryotic nMPs.
Journal of Computer Science & Systems Biology - Open Access
JCSB/Vol.2 November-December 2009
*Corresponding author: Rajneesh Kumar Gaur, Bioinformatics
Infrastructure Facility, Jamia Hamdard (Hamdard University), Hamdard
Nagar, New Delhi, India – 110062, Tel: +91 9990290384; E-mail:
meetgaur@gmail.com
Received September 30, 2009; Accepted December 27, 2009; Published December 27, 2009
Citation: Gaur RK (2009) Prokaryotic and Eukaryotic Non-membrane
Proteins have Biased Amino Acid Distribution. J Comput Sci Syst Biol 2:
298-299. doi:10.4172/jcsb.1000045
Copyright: © 2009 Gaur RK. This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
Keywords: Non-membrane proteins; Amino acid composition;
Prokaryotes; Eukaryotes
Abbreviations: MPs: Membrane Proteins; nMPs: non-membrane proteins
Introduction
Proteins constitute about 50% of the dry weight of most cells
and are the most structurally complex macromolecules known.
Proteins can be classified in different ways, but for the purpose of this study we classified them as membrane proteins (MPs; part of either a cellular or an organelle membrane) and non-membrane proteins (nMPs; located outside the membrane). Amino acids are the building blocks of proteins, and their composition determines the overall properties and stability of a protein. Many previous studies have shown that amino acid composition can be successfully applied to protein sequence analysis, including prediction of structural class (Zhang et al., 1992), discrimination of intracellular and extracellular proteins (Nakashima et al., 1994) and prediction of subcellular location (Cedano et al., 1997). It has been suggested that compositional differences are a consequence of different requirements for protein folding, stability and transport.
The recent increase in the number of whole genome sequences
has made the analysis of the corresponding proteomes possible.
So far, the amino acid composition of prokaryotic and eukaryotic proteomic databases has been explored separately for different purposes, such as determination of sequence length (Gerstein, 1998a), identification of conserved sequences (Sobolevsky et al., 2005) and elucidation of simple sequences (Subramanyam et al., 2006). However, a comparative analysis of their non-membrane proteins (nMPs) to determine overall differences in amino acid composition has not yet been carried out. This computational study aims to establish the amino acid distribution of proteins as a tool for identifying proteins that frequently undergo mutations and are largely responsible for the pathogenicity of the organism.
Methodology
The dataset was curated manually from sequences extracted from the PSORT (Rey et al., 2005), eSLDB (Pierleoni et al., 2007) and RefSeq (Pruitt et al., 2005) databases. Only experimentally annotated entries were extracted from the PSORT database. From the RefSeq database, we used the microbial (microbial1.protein.faa.gz; 05/11/2009) and eukaryotic (vertebrate_mammalian1.protein.faa.gz; 05/11/2009 and vertebrate_other1.protein.faa.gz; 05/10/2009) sequence release files to construct the experimental dataset. Protein sequences flagged as putative, hypothetical, potential, uncharacterized, similar to the predicted protein, membrane, porin or receptor were deleted from the initially downloaded RefSeq sequence release files during preparation of the experimental dataset.
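The exclusion step above can be sketched in Python as a filter on FASTA definition lines. This is a minimal sketch under stated assumptions, not the procedure actually used in this study: it assumes the flags appear in the FASTA header, treats each flag as a case-insensitive substring match, and splits "similar to the predicted protein" into the two keywords "similar to" and "predicted". The names `read_fasta`, `is_excluded`, `filter_records`, `load_filtered` and `EXCLUDE_KEYWORDS` are hypothetical.

```python
import gzip
import io
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)

# Hypothetical keyword list derived from the flags named in the text.
EXCLUDE_KEYWORDS = (
    "putative", "hypothetical", "potential", "uncharacterized",
    "similar to", "predicted", "membrane", "porin", "receptor",
)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def is_excluded(header: str) -> bool:
    """True if any exclusion keyword occurs in the header (case-insensitive)."""
    lowered = header.lower()
    return any(keyword in lowered for keyword in EXCLUDE_KEYWORDS)


def filter_records(records: Iterable[Record]) -> List[Record]:
    """Keep only records whose header carries none of the exclusion flags."""
    return [(h, s) for h, s in records if not is_excluded(h)]


def load_filtered(path: str) -> List[Record]:
    """Read a gzip-compressed FASTA release file (e.g. *.protein.faa.gz) and filter it."""
    with gzip.open(path, "rt") as handle:
        return filter_records(read_fasta(handle))


# Toy input (invented records, not real RefSeq entries).
demo = io.StringIO(
    ">seq1 alanine racemase\nMKAAL\nAVG\n"
    ">seq2 hypothetical protein\nMKK\n"
    ">seq3 outer membrane porin\nMSS\n"
)
print(filter_records(read_fasta(demo)))  # [('seq1 alanine racemase', 'MKAALAVG')]
```

A plain substring test is deliberately coarse: "membrane" also matches headers such as "transmembrane", which suits the aim of removing membrane proteins but may over-filter with other keyword lists.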
The prokaryotic sequence dataset was created by merging the sequence entries from the PSORT database and the RefSeq dataset after the appropriate deletions. Similarly, the eukaryotic dataset was prepared by deleting and merging the sequence entries from the eSLDB and RefSeq datasets.
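Merging the filtered entries can be sketched as a concatenation of record lists. The text does not state whether entries shared between sources were removed, so duplicate removal appears here only as an optional, hypothetical `deduplicate` flag that is off by default; `merge_datasets` is likewise a hypothetical name, and this is a sketch rather than the study's actual procedure.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (FASTA header, sequence)


def merge_datasets(*sources: Iterable[Record], deduplicate: bool = False) -> List[Record]:
    """Concatenate record lists in the order given.

    With deduplicate=True (an assumption, not stated in the text), a record
    whose sequence was already seen in an earlier source is skipped.
    """
    merged: List[Record] = []
    seen: Set[str] = set()
    for source in sources:
        for header, seq in source:
            if deduplicate and seq in seen:
                continue
            seen.add(seq)
            merged.append((header, seq))
    return merged


# Toy example with invented records.
psort = [("p1", "MAAK"), ("p2", "MSTV")]
refseq = [("r1", "MAAK"), ("r2", "MLLA")]
print(len(merge_datasets(psort, refseq)))                    # 4
print(len(merge_datasets(psort, refseq, deduplicate=True)))  # 3
```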
The entire dataset used to compute the composition of the 20 amino acid residues comprised 63,644 prokaryotic and 88,400 eukaryotic nMP sequences. The amino acid composition of each prepared dataset was computed from the number of amino acids of each type and the total number of residues:

$$\text{Residue composition}(r)\,(\%) = \frac{n_r}{N} \times 100 \qquad (1)$$

where $r$ stands for any one of the 20 amino acid residues, $n_r$ is the total number of residues of type $r$, and $N$ is the total number of residues in the dataset.
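Eq. (1) can be sketched as follows. The sketch pools all residues in a dataset rather than averaging per-sequence compositions, which matches the definition of $n_r$ and $N$ as dataset-wide totals. Counting only the 20 standard one-letter codes toward $N$ (so ambiguous symbols such as `X` are ignored) is an assumption, and `residue_composition` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Eq. (1): composition(r) = n_r / N * 100, pooled over all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # counts every character in the sequence
    # Assumption: N counts only the 20 standard residues; other symbols are ignored.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acid residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Toy example: 'MAAK' + 'AKS' gives N = 7 residues, of which 3 are Ala.
comp = residue_composition(["MAAK", "AKS"])
print(round(comp["A"], 2))  # 42.86
```

Applied to each curated dataset, this yields one 20-value composition vector per dataset, which is the quantity compared between prokaryotic and eukaryotic nMPs.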
Results and Discussion
The amino acid compositional distribution of prokaryotic and eukaryotic nMPs was computed using eq. (1). Prokaryotic nMPs show a dominant occurrence of the non-polar amino acid ‘Ala’ ( = 0.45), whereas eukaryotic nMPs predominantly contain the polar amino acids ‘Lys’ ( = 0.66), ‘Ser’ ( = 0.60) and ‘Cys’ ( = 0.29) (Figure 1). In prokaryotic nMPs, the high frequency of the short side-chain, non-polar aliphatic amino acid ‘Ala’ may have several explanations, such as its over-representation in highly expressed proteins (Tats et al., 2006), its role in determining the cleavage of N-terminal formylmethionine (Solbiati et al., 1999), its role in assisting the entry of the nascent peptide chain into the ribosomal tunnel (Tenson et al., 2002) and its role in helix–helix packing (Eyre et al.,