Research Focus

Identifying sigma factors in Mycobacterium smegmatis by comparative genomic analysis

Andra Waagmeester 1,2,3, Julie Thompson 4 and Jean-Marc Reyrat 1

1 Inserm-UMR 570, Groupe Avenir, Université Paris V-Descartes, Faculté de Médecine, Site Necker, Paris Cedex 15, F-75730, France
2 Institut Pasteur, Unité de Génétique Mycobactérienne, Paris Cedex 15, F-75724, France
3 BiGCaT Bioinformatics, Universiteit Maastricht, NL-6200 MD, Maastricht, The Netherlands
4 Institut de Génétique et de Biologie Moléculaire et Cellulaire, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France

Mycobacterium smegmatis is a saprophytic species that has been used for 15 years as a model to perform heterologous regulation and virulence studies of Mycobacterium tuberculosis. Members of the extracytoplasmic sigma factor family, which are required for adaptive responses to various environmental stresses, are responsible for some of the virulence traits of M. tuberculosis. A bioinformatic search of the M. smegmatis genome has predicted the existence of 26 sigma factors, which is twice the number present in M. tuberculosis. A phylogenetic analysis has shown that, despite this high number of sigma factors, the orthologs of the M. tuberculosis genes sigC, sigI and sigK are absent from the M. smegmatis genome. Several sigma factors are specific to M. smegmatis, with a particular enrichment in the sigH subfamily and, to a lesser extent, in the sigJ and sigL subfamilies, pinpointing the potential variability of the repertoire of adaptive responses in this saprophytic species.

Introduction

The mycobacterial genus contains some major human pathogens, such as Mycobacterium tuberculosis, Mycobacterium leprae and Mycobacterium ulcerans. Recently, the genomes of several mycobacterial species have been sequenced. This is the case for M. tuberculosis, M.
leprae, Mycobacterium bovis and Mycobacterium avium paratuberculosis, and the release of the M. bovis BCG, M. ulcerans, Mycobacterium microti and Mycobacterium marinum genomes is expected soon. The complete genomic sequence of M. smegmatis has been available to the public since October 2004 (http://www.tigr.org). M. smegmatis is a saprophytic, rapid-growing species, although there are some case reports of infection in humans and in animals [1]. This organism has been used to lay the foundations of mycobacterial genetics [2] and as a surrogate host to study the virulence and regulatory pathways of M. tuberculosis [3,4]. Together, this work led to the conclusion that some pathways are, in part, conserved between M. tuberculosis and M. smegmatis [5]. This report describes the sigma factor content of M. smegmatis.

Sigma factors are components of the RNA polymerase complex that are responsible for binding to the core RNA polymerase, promoter recognition and separation of the DNA strands [6]. The number of sigma factors varies widely among bacteria, ranging from three in Helicobacter pylori to 63 in Streptomyces coelicolor. Sigma factors of the extracytoplasmic function (ECF) family are distant members of the sigma 70 family that are involved in transcriptional regulation in response to changes in the environmental milieu [7,8]. M. tuberculosis possesses 13 sigma factors, 10 of which belong to the ECF subfamily, enabling this bacterium to cope with various environmental conditions [8,9]. Some ECF sigma factors, such as SigD, SigE, SigC and SigH, are involved in the virulence of the tubercle bacillus [10–12]. Similarly, a single mutation in the gene coding for the principal sigma factor (SigA) is, in part, responsible for the attenuation of a strain of M. tuberculosis [13], probably because of a lack of expression of virulence genes. So far, seven sigma factors have been characterized experimentally in M. smegmatis (Table 1).
We performed a comparative genomic analysis on the genome of M. smegmatis and identified its sigma factor content.

Comparative genomic analysis

In March 2005, SWISSPROT from the Swiss Institute of Bioinformatics contained 188 protein sequences of sigma factors (release 46.2, 1 March 2005). In the genome of M. tuberculosis, 13 putative sigma factors were identified. Both lists of protein sequences were stored in a database called SigDB. The 6901 putative proteins of the genome of M. smegmatis were obtained (http://www.tigr.org; 9 December 2004) and stored in a database called SmegDB. Alignments between each protein from SigDB and each protein from SmegDB were sought using BLAST [14]. We developed software to perform and analyze the 1.2 × 10^6 BLAST searches. A hit was considered significant when the E-value was less than 10^-5. This software was validated by applying it to several genomes with a known number of sigma factors. We found 26 potential sigma factors in M. smegmatis (Table 1). These sigma factors have a polypeptide size in the range of 150–450 amino acids, which is comparable with the size of the sigma factors present in M. tuberculosis (Table 2). Open reading frames MSMEG0395, MSMEG1341 and MSMEG5346 were the sole candidates that matched only with sigma factors from

Corresponding author: Reyrat, J.-M. (jmreyrat@necker.fr).
Available online 2 September 2005
Update, TRENDS in Microbiology Vol.13 No.11 November 2005
www.sciencedirect.com 0966-842X/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tim.2005.08.009
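
The hit-filtering step described under "Comparative genomic analysis" (keep a BLAST hit only when its E-value is below 10^-5, then collect the M. smegmatis open reading frames that received at least one such hit) could be sketched roughly as follows. This is not the authors' software, which is not described in detail here. The sketch assumes BLAST results in the standard 12-column tab-separated tabular layout, with the SigDB protein as the query and the SmegDB protein as the subject; the function name, the identifiers and the example rows are hypothetical and serve only as illustration.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # significance threshold stated in the text


def significant_hits(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Map each subject ORF to the set of query sigma factors that hit it.

    Assumes 12-column tab-separated BLAST tabular output:
    query id, subject id, ..., E-value (column 11), bit score (column 12).
    Only hits with an E-value strictly below the cutoff are kept.
    """
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        e_value = float(fields[10])
        if e_value < cutoff:
            hits[subject_id].add(query_id)
    return dict(hits)


# Hypothetical rows for illustration only; these are not real results.
example = [
    "SigA_Mtb\tORF_A\t85.0\t400\t60\t0\t1\t400\t1\t400\t1e-120\t450",
    "SigE_Mtb\tORF_B\t40.0\t180\t108\t2\t5\t184\t3\t182\t2e-20\t95.1",
    "RpoD_Eco\tORF_A\t60.0\t390\t156\t3\t1\t390\t5\t394\t3e-90\t330",
    "SigH_Mtb\tORF_C\t25.0\t80\t60\t1\t10\t89\t20\t99\t0.5\t28.0",
]

candidates = significant_hits(example)
for orf in sorted(candidates):
    print(orf, sorted(candidates[orf]))
```

Keeping the set of matching queries for each open reading frame, rather than a simple count, makes it possible to see which candidates match sigma factors from only one source, which is the kind of distinction the text draws for MSMEG0395, MSMEG1341 and MSMEG5346.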