International Journal of Scientific & Engineering Research, Volume 4, Issue 12, December-2013
ISSN 2229-5518
IJSER © 2013
http://www.ijser.org

Analysis of Staphylococcus using comparative genomics
Sunil S. Thorat and Prashant V. Thakare

Abstract— Comparative genomics and genomic tools have been used to identify virulence factors and genes involved in the environmental persistence of pathogens. However, a major stumbling block in the genomics revolution has been the large number of genes of unknown function identified in every organism sequenced to date. Over 1740 bacterial genome sequences are currently available in public databases and over 5230 are in progress, representing hundreds of species as well as multiple strains of the same species. The study of these genomes by both computational and experimental approaches has significantly advanced our understanding of the physiology and pathogenicity of many microbes and has provided insights into the mechanisms and history of genome evolution. Several “postgenomic” methods have been used to identify genes that are essential for bacterial growth or pathogenesis. Here we demonstrate the utility of several DNA and protein sequence comparison tools for interpreting the information obtained from several genome projects. Comparisons are presented between closely related strains of Staphylococcus aureus and S. epidermidis. Comparative genome analysis will generate a wealth of data for comparing pathogenic strains with varying levels of pathogenicity, which in turn may reveal mechanisms by which the pathogen adapts to a particular host.

Index Terms— Comparative Genomics, GenePlot, TaxPlot.

1 INTRODUCTION
Staphylococci are Gram-positive bacteria that play an important role in infectious disease [1]. Staphylococcus is one of the major causes of community-acquired and hospital-acquired infections.
It produces numerous toxins, including superantigens that cause unique disease entities such as toxic shock syndrome and staphylococcal scarlet fever, and it has acquired resistance to practically all antibiotics [2]. Staphylococcus aureus and Staphylococcus epidermidis are significant in their interactions with humans. S. aureus mainly colonizes the nasal passages, but it is regularly found at most other anatomical sites. S. epidermidis is an inhabitant of the skin [3]. S. aureus and S. epidermidis are major causes of infections related to biofilms formed on indwelling medical devices. Such infections are common causes of morbidity and mortality and are difficult to treat because biofilms are resistant to antibiotics [4].

Advances in automated DNA sequencing techniques and the whole-genome shotgun strategy have resulted in a tremendous increase in the amount of available genome data. These data provide good subjects for experimental studies and functional analysis [5]. Comparative genomics has become increasingly attractive, especially between two closely related species [6]. Comparing genome sequences will lend insight into the evolution of drug resistance and lead to the identification of genes that can be targeted by a new generation of antibiotics [1].

Comparing the proteins encoded by a genome against databases of existing annotated sequences is a useful approach to understanding the information at the genome level. Once a genome sequence is available, a primary goal is to identify functional regions in the sequence, including genes and regulatory sequences. Much of this identification will require new experimental work, but some information can be obtained purely computationally [7]. Comparative sequence analysis has become a powerful tool for a variety of problems, ranging from gene finding to the identification of regulatory elements [8-11].
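As a small illustration of the purely computational route to gene finding mentioned above, the Python sketch below scans both strands of a DNA sequence for open reading frames (ORFs). It is not the method used in this study. It assumes ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and a hypothetical minimum length (`min_codons`); practical bacterial gene finders also consider alternative start codons such as GTG and TTG and use statistical models. The names `find_orfs` and `reverse_complement` are illustrative only.

```python
# Naive open-reading-frame (ORF) scan: an illustrative sketch, not the
# study's method. Assumptions: ATG is the only start codon, TAA/TAG/TGA are
# the stop codons, and ORFs shorter than min_codons codons are dropped.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Yield (strand, start, end) for each ATG...stop ORF on both strands.

    Coordinates are 0-based and half-open on the strand that was scanned
    (for '-', positions refer to the reverse complement). The stop codon is
    included, and only the first ATG after the previous stop is used, so
    each reported ORF is the longest one ending at its stop codon.
    ORFs that run off the end of the sequence without a stop are skipped.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3
                    start = None


if __name__ == "__main__":
    # Short synthetic sequence, not real Staphylococcus data.
    print(list(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2)))
    # → [('+', 2, 17)]
```

Keeping only the first ATG after each stop is a common simplification in naive ORF scans; it reports one ORF per stop codon rather than every nested start.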
Comparative genomics is the analysis and comparison of genomes from different species. Its purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of genomes. Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans. Genomic comparisons performed to find genes among closely related pathogens that differ in their host ranges have yielded contrasting data [12]. Although whole-genome analysis of Staphylococcus aureus and genome-based analysis of virulence genes in Staphylococcus epidermidis have been reported, comparative genome analysis of all the available Staphylococcus species may be a necessary step toward the future development of countermeasures against this organism.

2 METHODOLOGY
2.1 Extraction of genomic data
The genomic data were taken from the NCBI Reference Sequence (RefSeq) collection [13], available on the NCBI FTP server (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) as individual files in GenBank format. The data were gathered and compiled for all the selected genomes so that the retrieved information would be homogeneous and consistent. The complete genome entry files were stored in a local database in a suitable format, allowing various attributes of genome features to be retrieved for genomic studies. The information was stored as structured data, together with the complete record of each genome, for manual examination and data curation. Once the initial batch of data was loaded into the local database, it was enriched with additional information retrieved from public repositories for the functional annotation of the proteins encoded by each genome.
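The extraction step of Section 2.1 can be sketched roughly as follows, assuming GenBank flat files like those on the NCBI FTP server and SQLite as the local database (the paper does not name its database system). This hypothetical Python sketch reads only the LOCUS line and the FEATURES table, keeps CDS features, and stores their locus tag, location, product and protein ID in a single table. The function names `parse_genbank_features` and `load_cds`, the `cds` table layout and the synthetic `SAMPLE` record are illustrative assumptions; in practice a complete parser such as Biopython's `SeqIO` would normally be preferred.

```python
import re
import sqlite3

# Hypothetical minimal GenBank reader (illustration only): it handles the
# LOCUS line and the FEATURES table, and ignores everything else.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+(\S.*)$")   # e.g. "     CDS   1..30"
QUALIFIER = re.compile(r"^\s+/([^=]+)(?:=(.*))?$")  # e.g. '   /product="..."'


def parse_genbank_features(lines):
    """Return (locus_name, features) parsed from GenBank-format lines.

    Each feature is a dict with 'key', 'location' and 'qualifiers'.
    Location and qualifier values that wrap onto several lines are joined.
    """
    locus, features = None, []
    in_features, current, last_q = False, None, None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            locus = line.split()[1]
        elif line.startswith("FEATURES"):
            in_features = True
        elif in_features and not line.startswith(" "):
            in_features = False  # ORIGIN, CONTIG, etc. close the table
        elif in_features:
            key_m, qual_m = FEATURE_KEY.match(line), QUALIFIER.match(line)
            if key_m:
                current = {"key": key_m.group(1), "location": key_m.group(2),
                           "qualifiers": {}}
                features.append(current)
                last_q = None
            elif qual_m and current is not None:
                last_q = qual_m.group(1)
                current["qualifiers"][last_q] = qual_m.group(2) or ""
            elif current is not None:
                text = line.strip()
                if last_q is None:      # continuation of the location
                    current["location"] += text
                else:                   # continuation of a qualifier value
                    sep = "" if last_q == "translation" else " "
                    current["qualifiers"][last_q] += sep + text
    for feature in features:
        quals = feature["qualifiers"]
        for name, value in quals.items():
            quals[name] = value.strip('"')  # drop surrounding quotes
    return locus, features


def load_cds(db, locus, features):
    """Insert the CDS features of one genome into a local SQLite table."""
    db.execute("CREATE TABLE IF NOT EXISTS cds (genome TEXT, locus_tag TEXT, "
               "location TEXT, product TEXT, protein_id TEXT)")
    rows = [(locus, f["qualifiers"].get("locus_tag"), f["location"],
             f["qualifiers"].get("product"), f["qualifiers"].get("protein_id"))
            for f in features if f["key"] == "CDS"]
    db.executemany("INSERT INTO cds VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


# Synthetic record for demonstration only (not a real genome entry).
SAMPLE = """\
LOCUS       TOY_GENOME                30 bp    DNA     linear   BCT 01-JAN-2013
FEATURES             Location/Qualifiers
     source          1..30
                     /organism="synthetic example"
     CDS             1..30
                     /locus_tag="TOY_0001"
                     /product="hypothetical
                     protein"
                     /protein_id="TOY_P0001"
ORIGIN
        1 atgaaatttg ggaaatttgg gaaatgataa
//
"""

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    locus, feats = parse_genbank_features(SAMPLE.splitlines())
    load_cds(db, locus, feats)
    print(db.execute("SELECT genome, locus_tag, product FROM cds").fetchall())
    # → [('TOY_GENOME', 'TOY_0001', 'hypothetical protein')]
```

Storing one row per CDS keyed by genome keeps later steps, such as attaching functional annotation retrieved from public repositories, to simple table joins on `locus_tag` or `protein_id`.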