Exploration of the presence and abundance of multidrug resistance efflux genes in oil and gas environments

Damon C. Brown, Naomi Aggarwal and Raymond J. Turner*

RESEARCH ARTICLE
Brown et al., Microbiology 2022;168:001248
DOI 10.1099/mic.0.001248

Received 30 April 2022; Accepted 19 August 2022; Published 03 October 2022
Author affiliations: 1 University of Calgary, Calgary, Alberta, Canada.
*Correspondence: Raymond J. Turner, turnerr@ucalgary.ca
Keywords: metagenomic; multidrug resistance efflux pumps; PCR; MDREP; MDR.
Abbreviations: DWH, Deepwater Horizon; MDREP, multidrug resistance efflux pump; OMP, outer membrane protein; ORF, open reading frame; PAH, polyaromatic hydrocarbon; PCR, polymerase chain reaction.
One supplementary table is available with the online version of this article.
001248 © 2022 The Authors

Abstract

As sequencing technology improves and the cost of metagenome sequencing decreases, the number of sequenced environments increases. These metagenomes provide a wealth of data in the form of annotated and unannotated genes. The role of multidrug resistance efflux pumps (MDREPs) is the removal of antibiotics, biocides and toxic metabolites created during aromatic hydrocarbon metabolism. Due to their naturally occurring role in hydrocarbon metabolism and their role in biocide tolerance, MDREP genes are of particular importance for the protection of pipeline assets. However, the heterogeneity of MDREP genes creates a challenge during annotation and detection. Here we use a selection of primers designed to target MDREPs in six pure species and apply them to publicly available metagenomes associated with oil and gas environments.
Using in silico PCR with relaxed primer binding conditions, we probed the metagenomes of a shale reservoir, a heavy oil tailings pond, a civil wastewater treatment, two marine sediments exposed to hydrocarbons following the Deepwater Horizon oil spill and a non-exposed marine sediment to assess the presence and abundance of MDREP genes. The percentage of nucleotide sequences identified by the MDREP primers was partially augmented by exposure to hydrocarbons in marine sediment and in the shale reservoir compared to hydrocarbon-free marine sediments, while tailings ponds and wastewater had the highest percentages. We believe this approach lays the groundwork for a supervised method of identifying poorly conserved genes within metagenomes.

INTRODUCTION

Genes can be generally thought of as belonging to three main categories: core genes, characteristic genes and accessory genes. Core genes are those whose presence is (nearly) ubiquitous in life and that are required for the basic functions of cellular existence, including translation, transcription, replication and repair [1]. These genes, being central to existence, have very low mutation rates and thus are more readily annotated [2]. Characteristic genes are those that define the traits of a given genus or species, such as their metabolic potential, pathogenicity, environmental classification (temperature, salinity, oxygen tolerances, etc.) or other defining characteristics. These genes are typically less well conserved, as convergent evolution may result in the same mechanism through unique paths. Additionally, codon usage can impact the genetic similarities of characteristic genes, as redundant codons produce the same amino acid sequences through unique nucleotide sequences.
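The effect of redundant codons on sequence similarity can be illustrated with a small sketch. The two sequences below (`gene_a`, `gene_b`) are invented toy examples, not taken from any organism or from this study, and the helper names `translate` and `identity` are hypothetical. The codon table is a subset of the standard genetic code covering only the codons used here.

```python
# Subset of the standard genetic code, limited to the codons in this toy example.
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "AGC": "S", "TCT": "S",
    "CGT": "R", "AGA": "R",
    "GGC": "G", "GGT": "G",
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical coding sequences that use different synonymous codons.
gene_a = "ATGCTGAGCCGTGGC"
gene_b = "ATGTTATCTAGAGGT"

protein_a = translate(gene_a)
protein_b = translate(gene_b)

# Both sequences encode the same peptide, so amino acid identity is 100 %,
# yet only 7 of the 15 nucleotide positions match.
amino_acid_identity = identity(protein_a, protein_b)
nucleotide_identity = identity(gene_a, gene_b)

print(protein_a, protein_b)
print(f"amino acid identity: {amino_acid_identity:.2f}")
print(f"nucleotide identity: {nucleotide_identity:.2f}")
```

In this toy case, a search based on nucleotide identity alone would score the two genes as poorly related even though their products are identical, which is the kind of divergence the paragraph above describes.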
Though characteristic genes are not exempt from lateral movement in a population (through horizontal transfer), such movement is rare, as it typically requires entire operon(s) to represent a complete biochemical pathway [3].

Mutation rates are typically a genome-wide factor, influenced by changing environmental conditions [4]. As accessory genes such as resistance genes are found on mobile genetic elements [5], they are subject to changing mutation rates depending on the state of the current host. These genes include those related to antibiotic resistance, detoxification of pollutants, virulence attributes, fermentation genes, extracellular signalling and quorum sensing [6]. These are genes that are not beneficial to the host cell in all growth states and are thus susceptible to decay [7]. The fluidity of genomic sequences in this class of genes is a strong candidate for contributing significantly to poor automated annotation, which relies on either amino acid or nucleotide sequence identity. As this fluidity exists along the entire length of accessory genes, the odds of maintaining homology across the entire