J Comput Sci Syst Biol Volume 2(6): 298-299 (2009)
Journal of Computer Science & Systems Biology (JCSB), an open access journal, Vol. 2, November-December 2009
ISSN: 0974-7230
Research Article
doi:10.4172/jcsb.1000045

Prokaryotic and Eukaryotic Non-membrane Proteins have Biased Amino Acid Distribution

Rajneesh Kumar Gaur
Bioinformatics Infrastructure Facility, Jamia Hamdard (Hamdard University), Hamdard Nagar, New Delhi, India – 110062

Corresponding author: Rajneesh Kumar Gaur, Tel: +91 9990290384; E-mail: meetgaur@gmail.com

Received September 30, 2009; Accepted December 27, 2009; Published December 27, 2009

Citation: Gaur RK (2009) Prokaryotic and Eukaryotic Non-membrane Proteins have Biased Amino Acid Distribution. J Comput Sci Syst Biol 2: 298-299. doi:10.4172/jcsb.1000045

Copyright: © 2009 Gaur RK. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract
Proteins are an important constituent of the cellular machinery. A comparative analysis of non-membrane proteins (nMPs) from prokaryotes and eukaryotes was carried out to determine the bias in amino acid distribution. The comparison revealed that ‘Ala’ is the dominant amino acid in prokaryotic nMPs, while ‘Lys’, ‘Ser’ and ‘Cys’ are the dominant amino acids in eukaryotic nMPs.

Keywords: Non-membrane proteins; Amino acid composition; Prokaryotes; Eukaryotes

Abbreviations: MPs: membrane proteins; nMPs: non-membrane proteins

Introduction
Proteins constitute about 50% of the dry weight of most cells and are the most structurally complex macromolecules known.
Proteins can be classified in different ways, but for the purpose of this study we classified them as membrane proteins (MPs; part of either the cellular or an organelle membrane) and non-membrane proteins (nMPs; located outside the membrane). Amino acids are the building blocks of a protein, and their composition determines the overall properties and stability of a protein. Many previous studies have shown how amino acid composition can be successfully applied to protein sequence analysis, including prediction of structural class (Zhang et al., 1992), discrimination of intra- and extracellular proteins (Nakashima et al., 1994) and prediction of sub-cellular location (Cedano et al., 1997). It was suggested that compositional differences are a consequence of different requirements for protein folding, stability and transportation. The recent increase in the number of whole genome sequences has made the analysis of the corresponding proteomes possible. So far, the amino acid composition of the prokaryotic and eukaryotic proteomic databases has been explored separately for different purposes, such as determination of sequence length (Gerstein, 1998a), identification of conserved sequences (Sobolevsky et al., 2005) and elucidation of simple sequences (Subramanyam et al., 2006). However, a comparative analysis of their non-membrane proteins (nMPs) has not yet been carried out to determine the overall amino acid compositional differences. This computational study was performed to develop the amino acid distribution of proteins as a tool to identify proteins that frequently undergo mutation and are largely responsible for the pathogenicity of an organism.

Methodology
The dataset was curated manually from sequences extracted from the PSORT (Rey et al., 2005), eSLDB (Pierleoni et al., 2007) and RefSeq (Pruitt et al., 2005) databases. Only the experimentally annotated entries were extracted from the PSORT database.
From the RefSeq database, we used the microbial (microbial1.protein.faa.gz; 05/11/2009) and eukaryotic (vertebrate_mammalian1.protein.faa.gz; 05/11/2009 and vertebrate_other1.protein.faa.gz; 05/10/2009) sequence release files to construct the experimental dataset. Protein sequences flagged as putative, hypothetical, potential, uncharacterized, similar to the predicted protein, membrane, porin or receptor were deleted from the downloaded RefSeq sequence release files during preparation of the experimental dataset. The prokaryotic sequence dataset was created by merging the sequence entries from the PSORT database and the RefSeq dataset after appropriate deletions. Similarly, the eukaryotic dataset was prepared by merging the sequence entries from eSLDB and the RefSeq dataset after the corresponding deletions.

The entire dataset used for computing the composition of the 20 amino acid residues comprised 63644 prokaryotic and 88400 eukaryotic nMP sequences. The amino acid composition of the prepared datasets was computed from the number of amino acids of each type and the total number of residues. It is defined as

$$\text{Residue composition}\,(\%)\,(r) = \frac{n_r}{N} \times 100 \qquad (1)$$

where $r$ stands for any one of the 20 amino acid residues, $n_r$ is the total number of residues of type $r$ and $N$ is the total number of residues in the dataset.

Results and Discussion
The amino acid compositional distribution of prokaryotic and eukaryotic nMPs was computed using eq. (1). The prokaryotic nMPs show the dominant occurrence of the non-polar amino acid ‘Ala’ (= 0.45), while the eukaryotic nMPs predominantly possess the polar amino acids ‘Lys’ (= 0.66), ‘Ser’ (= 0.60) and ‘Cys’ (= 0.29) (Figure 1).
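The keyword-based exclusion described in the Methodology and the composition of eq. (1) can be illustrated with a minimal Python sketch. This is not the author's code. The function names, the case-insensitive substring matching on FASTA headers, and the reading of "similar to the predicted protein" as the two keywords "similar to" and "predicted" are all assumptions made for illustration. The per-residue difference at the end is likewise only one plausible way to compare two datasets, since the text does not state how the values shown with Figure 1 were derived.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Header keywords taken from the Methodology; matching them as
# case-insensitive substrings is an assumption of this sketch.
EXCLUDE_KEYWORDS = (
    "putative", "hypothetical", "potential", "uncharacterized",
    "similar to", "predicted", "membrane", "porin", "receptor",
)


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def keep_entry(header):
    """Return True if the header contains none of the excluded keywords."""
    lowered = header.lower()
    return not any(keyword in lowered for keyword in EXCLUDE_KEYWORDS)


def composition(sequences):
    """Eq. (1): percentage of each residue type, n_r / N * 100.

    N counts only the 20 standard residues; other symbols are ignored
    (an assumption, the paper does not discuss ambiguous residues).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_difference(comp_a, comp_b):
    """Per-residue difference (percentage points) between two compositions."""
    return {aa: comp_a[aa] - comp_b[aa] for aa in AMINO_ACIDS}


# Toy input, not real RefSeq data.
toy = """>entry1 alanine racemase
AAAK
>entry2 hypothetical protein
CCCC
>entry3 outer membrane porin
SSSS
"""
kept = [seq for hdr, seq in parse_fasta(toy) if keep_entry(hdr)]
comp = composition(kept)
print(kept)                # ['AAAK']
print(comp["A"], comp["K"])  # 75.0 25.0
```

In practice the release files named above are gzip-compressed, so they would need to be decompressed (for example with Python's `gzip` module) before being passed to `parse_fasta`.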
In prokaryotic nMPs, the high frequency of the short side-chained, non-polar aliphatic amino acid ‘Ala’ may have several explanations, such as its over-representation in highly expressed proteins (Tats et al., 2006), its role in determining the cleavage of N-terminal formyl methionine (Solbiati et al., 1999), its role in assisting the entrance of the nascent peptide chain into the ribosomal tunnel (Tenson et al., 2002) and in helix–helix packing (Eyre et al.,