International Journal of Scientific & Engineering Research, Volume 4, Issue 12, December-2013
ISSN 2229-5518
IJSER © 2013
http://www.ijser.org
Analysis of Staphylococcus Using Comparative Genomics
Sunil S. Thorat and Prashant V. Thakare
Abstract— Comparative genomics and genomic tools have been used to identify virulence factors and genes involved in the environmental persistence of pathogens. However, a major stumbling block in the genomics revolution has been the large number of genes of unknown function identified in every organism sequenced to date. Over 1740 bacterial genome sequences are currently available in public databases and over 5230 are in progress, representing hundreds of species as well as multiple strains of the same species. The study of these genomes by both computational and experimental approaches has significantly advanced our understanding of the physiology and pathogenicity of many microbes and has provided insights into the mechanisms and history of genome evolution. Several “postgenomic” methods have been used to identify genes that are essential for bacterial growth or pathogenesis. Here we demonstrate the utility of several DNA and protein sequence comparison tools for interpreting the information obtained from several genome projects. Comparisons are presented between closely related strains of Staphylococcus aureus and S. epidermidis. This comparative genome analysis is expected to generate a wealth of data for comparing pathogenic strains with varying levels of pathogenicity, which in turn may reveal mechanisms by which the pathogen adapts to a particular host.
Index Terms— Comparative Genomics, GenePlot, TaxPlot.
1 INTRODUCTION
Staphylococci are Gram-positive bacteria that play an important role in infectious disease [1]. Staphylococcus aureus is one of the major causes of community-acquired and hospital-acquired infections. It produces numerous toxins, including superantigens that cause unique disease entities such as toxic-shock syndrome and staphylococcal scarlet fever, and has acquired resistance to practically all antibiotics [2]. Staphylococcus aureus and Staphylococcus epidermidis are both significant in their interactions with humans. S. aureus mainly colonizes the nasal passages, but it is regularly found at most other anatomical sites. S. epidermidis inhabits the skin [3]. S. aureus and S. epidermidis are major causes of infections associated with biofilms formed on indwelling medical devices. Such infections are common causes of morbidity and mortality and are difficult to treat because biofilms are resistant to antibiotics [4].
Advances in automated DNA sequencing techniques and the whole-genome shotgun strategy have greatly increased the amount of available genome data. These data provide valuable subjects for experimental studies and functional analysis [5]. Comparative genomics has become increasingly attractive, especially for comparisons between two closely related species [6]. Comparing genome sequences can lend insight into the evolution of drug resistance and lead to the identification of genes that can be targeted by a new generation of antibiotics [1].
Comparing the encoded proteins against databases of existing annotated sequences is a useful approach for interpreting information at the genome level. Once a genome sequence is available, a primary goal is to identify its functional regions, including genes and regulatory sequences. Much of this identification will require new experimental work, but some information can be obtained purely computationally [7]. Comparative sequence analysis has become a powerful tool for a variety of problems, ranging from gene finding to the identification of regulatory elements [8-11]. Comparative genomics is the analysis and comparison of genomes from different species. Its purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genomes. Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.
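Several of the genome-level features listed above can be tabulated directly from gene coordinates. The following is a minimal Python sketch, not the method used in this study: it assumes CDS coordinates are given as 1-based, inclusive (start, end) pairs, ignores features that span the origin of a circular chromosome, and uses illustrative names (`GenomeSummary`, `merge_intervals`, `summarize_genome`) chosen only for this example. It counts genes, coding and noncoding bases, and mean gene length, merging overlapping genes so that shared bases are counted once.

```python
from dataclasses import dataclass


@dataclass
class GenomeSummary:
    genome_length: int
    gene_count: int
    coding_bp: int
    noncoding_bp: int
    mean_gene_length: float


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summarize_genome(genome_length, cds):
    """Summarize coding content from CDS coordinates (strand is irrelevant here).

    Assumption: no CDS wraps around the origin of a circular chromosome.
    """
    lengths = [end - start + 1 for start, end in cds]
    # Merge first so bases shared by overlapping genes are counted only once.
    coding = sum(end - start + 1 for start, end in merge_intervals(cds))
    return GenomeSummary(
        genome_length=genome_length,
        gene_count=len(cds),
        coding_bp=coding,
        noncoding_bp=genome_length - coding,
        mean_gene_length=sum(lengths) / len(lengths) if lengths else 0.0,
    )


# Invented toy genome of 1,000 bp: two overlapping genes and one separate gene.
summary = summarize_genome(1000, [(1, 300), (251, 600), (801, 900)])
print(summary.coding_bp / summary.genome_length)  # 0.7
```

Because most bacterial genes lack introns, each CDS in this sketch is treated as a single coding region; comparing such summaries across genomes gives a first, coarse view of differences in gene number and noncoding DNA.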
Genomic comparisons performed to find genes in closely related pathogens that differ in their host ranges have yielded contrasting data [12]. Although whole-genome analysis of Staphylococcus aureus and genome-based analysis of virulence genes in Staphylococcus epidermidis have been reported, a comparative genome analysis of all available Staphylococcus species may be a necessary step toward developing future countermeasures against these pathogens.
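Gene-content comparisons of this kind reduce to set operations once proteins have been grouped into gene families. The Python sketch below assumes that grouping (for example, by an orthology clustering step not described here) has already been done; the strain names and family identifiers are hypothetical. It reports the core families shared by all strains, the accessory families, and the families present in every strain of a chosen group but absent from all other strains, which would be candidate genes when the groups differ in host range.

```python
def gene_content(strains):
    """Return (core, pan) gene-family sets for a dict of strain -> set of family IDs."""
    families = list(strains.values())
    core = set.intersection(*families) if families else set()
    pan = set().union(*families)
    return core, pan


def group_specific(strains, group):
    """Families present in every strain of `group` (non-empty) and absent from all others."""
    inside = [strains[name] for name in group]
    outside = [fams for name, fams in strains.items() if name not in group]
    return set.intersection(*inside) - set().union(*outside)


# Hypothetical strains and gene-family identifiers, for illustration only.
strains = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam6"},
}
core, pan = gene_content(strains)
print(sorted(core))        # ['fam1', 'fam2']
print(sorted(pan - core))  # ['fam3', 'fam4', 'fam5', 'fam6'] (accessory families)
print(sorted(group_specific(strains, ["strain_A", "strain_B"])))  # ['fam3']
```

Plain presence/absence ignores sequence divergence within a family, so a family reported as group-specific here would still need sequence-level comparison before being treated as a host-range candidate.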
2 METHODOLOGY
2.1 Extraction of genomic data
Genomic data were obtained from the NCBI Reference Sequence (RefSeq) collection [13], available on the NCBI FTP server (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) as individual files in GenBank format. The data were gathered and compiled for all selected genomes so that the retrieved information would be homogeneous and consistent. The complete genome entry files were stored in a local database in a format suitable for retrieving the various attributes of genome features needed for genomic studies. This information was stored as structured data, together with the complete record of each genome, for manual examination and data curation. Once the initial batch of data had been loaded into the local database, it was enriched with additional information on the functional annotation of the proteins encoded by each genome, retrieved from public repositories.
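The storage and enrichment steps described above can be sketched with Python's standard library. This is an assumed, simplified workflow rather than the authors' pipeline: the hand-written parser reads only CDS features with simple or `complement(...)` locations from a single GenBank record (skipping `join(...)` and other compound locations), and the table layout, the helper names `parse_cds_features`, `load_genome` and `enrich_annotations`, and the toy record are all hypothetical. A production pipeline would more likely rely on a dedicated GenBank parsing library.

```python
import re
import sqlite3

# Matches simple ("10..300") or complement ("complement(400..900)") locations;
# partial-end markers < and > are tolerated. join(...) locations are skipped.
LOC_RE = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")


def parse_cds_features(lines):
    """Yield one dict per CDS feature in a single GenBank record (simplified)."""
    in_features, current, last_key = False, None, None
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            break  # sequence data (or end of record) follows the feature table
        if not in_features:
            continue
        if line.startswith("     ") and line[5:6].strip():
            # A feature key in column 6 starts a new feature; emit the previous CDS.
            if current is not None:
                yield current
            key, _, location = line.strip().partition(" ")
            match = LOC_RE.match(location.strip())
            current, last_key = None, None
            if key == "CDS" and match:
                current = {"start": int(match.group(2)), "end": int(match.group(3)),
                           "strand": -1 if match.group(1) else 1}
        elif current is not None:
            text = line.strip()
            if text.startswith("/"):  # new qualifier, e.g. /product="..."
                name, _, value = text[1:].partition("=")
                current[name] = value.strip('"')
                last_key = name
            elif last_key:  # continuation line of a multi-line qualifier value
                sep = "" if last_key == "translation" else " "
                current[last_key] += sep + text.strip('"')
    if current is not None:
        yield current


def load_genome(conn, accession, lines):
    """Store the CDS features of one genome in a local SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cds (accession TEXT, locus_tag TEXT, start_pos INTEGER,"
        " end_pos INTEGER, strand INTEGER, product TEXT, protein_id TEXT, annotation TEXT)"
    )
    rows = [(accession, f.get("locus_tag"), f["start"], f["end"], f["strand"],
             f.get("product"), f.get("protein_id")) for f in parse_cds_features(lines)]
    conn.executemany(
        "INSERT INTO cds (accession, locus_tag, start_pos, end_pos, strand, product, protein_id)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def enrich_annotations(conn, annotations):
    """Add functional annotations (locus_tag -> text) retrieved from an external source."""
    conn.executemany("UPDATE cds SET annotation = ? WHERE locus_tag = ?",
                     [(text, tag) for tag, text in annotations.items()])
    conn.commit()


# A toy record, invented for illustration; real RefSeq files are far larger.
TOY_RECORD = """\
LOCUS       TOY0001     1200 bp    DNA     circular
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Toy organism"
     CDS             10..300
                     /locus_tag="TOY_0001"
                     /product="hypothetical
                     protein"
                     /protein_id="TOY_P1.1"
     CDS             complement(400..900)
                     /locus_tag="TOY_0002"
                     /product="toy transporter"
ORIGIN
//
""".splitlines()

conn = sqlite3.connect(":memory:")
n_cds = load_genome(conn, "TOY0001", TOY_RECORD)
# Placeholder annotation standing in for data from a public repository.
enrich_annotations(conn, {"TOY_0002": "example functional annotation"})
for row in conn.execute("SELECT locus_tag, start_pos, end_pos, strand, product, annotation"
                        " FROM cds ORDER BY start_pos"):
    print(row)
```

Keeping one row per CDS keyed by `locus_tag` makes the later enrichment step a simple update, and the same table can be queried for the coordinate and gene-content comparisons used elsewhere in the analysis.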