RESEARCH ARTICLE

Analysis of the nucleotide content of Escherichia coli promoter sequences related to the alternative sigma factors

Gabriel Dall'Alba¹ | Pedro Lenz Casa¹ | Daniel Luis Notari² | Andre Gustavo Adami² | Sergio Echeverrigaray¹ | Scheila de Avila e Silva²

¹ Department of Life Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil
² Department of Exact Sciences, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul, Brazil

Correspondence
Scheila de Avila e Silva, Department of Exact Sciences, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas, 1130, Petrópolis, Caxias do Sul, Rio Grande do Sul, CEP 95070560, Brazil. Email: sasilva6@ucs.br

Funding information
Universidade de Caxias do Sul (UCS)

Abstract
Promoters are DNA sequences located upstream of the transcription start site of genes. In bacteria, the RNA polymerase enzyme requires additional subunits, called sigma factors (σ), to begin specific gene transcription in distinct environmental conditions. Currently, promoter prediction still poses many challenges due to the characteristics of these sequences. In this paper, the nucleotide content of Escherichia coli promoter sequences, related to five alternative σ factors, was analyzed by a machine learning technique in order to provide profiles according to the σ factor that recognizes them. For this, a clustering technique was applied, since it is a viable method for finding hidden patterns in a data set. As a result, 20 groups of sequences were formed, and, aided by the WebLogo tool, it was possible to determine sequence profiles. These patterns should be considered when implementing computational prediction tools. In addition, evidence was found of an overlap between the functions of the genes regulated by different σ factors, suggesting that DNA structural properties are also essential parameters for further studies.
KEYWORDS
bacterial transcription, bioinformatics, clustering technique, promoters, sigma factor

1 | INTRODUCTION

In prokaryotes, the specificity of gene expression is regulated by protein subunits known as sigma factors (σ). They are responsible for guiding the catalytic core of RNA polymerase (RNAP) to specific promoter sequences located upstream of the transcription start site (TSS) of a gene. The constant exchange of the σ factors bound to the RNAP results in the transcription of different groups of genes, each with a different expression pattern. Bacteria such as Escherichia coli and related Gammaproteobacteria maintain constant expression of genes recognized by the major σ factor, also known as σ70 or the housekeeping factor. Additionally, there are alternative σ factors associated with specific and programmed responses. The promoter sequences recognized by each σ factor present distinct conserved motifs that distinguish them from one another.¹ In total, six distinct alternative σ factors are known in E. coli, namely σ19, σ24, σ28, σ32, σ38, and σ54. Briefly, σ24 and σ32 (encoded by rpoE and rpoH, respectively) are known for regulating heat shock response genes. A sudden heat shock, if not rapidly countered, results in protein unfolding, which in turn may lead to cell death.² Therefore, a rapid mobilization of the σ factors that direct the production of heat shock proteins is required in order to cope with such rapid environmental changes. The σ28 factor (product of the fliA gene) regulates mostly genes related to flagellar synthesis and cell motility. Additionally, pathogenicity and virulence can also be linked to this σ factor.³,⁴ The σ38 factor is a product of the rpoS gene and regulates genes that participate in the general stress response of a bacterium.
Received: 8 August 2018 | Revised: 23 October 2018 | Accepted: 24 October 2018 | DOI: 10.1002/jmr.2770
J Mol Recognit. 2018;e2770. https://doi.org/10.1002/jmr.2770 © 2018 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/jmr

Landini et al (2014)⁵ point out that σ38 is not particularly essential for growth (in either its presence or absence) but is responsible for very sensible