The potential of the WRKY gene family for phylogenetic reconstruction: An example from the Malvaceae

James W. Borrone a, Alan W. Meerow a,*, David N. Kuhn a, Barbara A. Whitlock b, Raymond J. Schnell a

a USDA-ARS, Subtropical Horticultural Research Station, National Germplasm Repository, 13601 Old Cutler Road, Miami, FL 33158, USA
b 29 Cox Science Center, Department of Biology, University of Miami, 13601 Memorial Drive, Coral Gables, FL 33124, USA

Received 11 October 2006; revised 15 June 2007; accepted 19 June 2007
Available online 30 June 2007

Abstract

The WRKY gene family of transcription factors is involved in several diverse pathways and includes components of plant-specific, ancient regulatory networks. WRKY genes contain one or two highly conserved DNA-binding domains interrupted by an intron. We used partial sequences of five independent WRKY loci to assess their potential for phylogeny reconstruction. Loci were originally isolated from Theobroma cacao L. by PCR with a single pair of degenerate primers; locus-specific primers were subsequently designed. We tested those loci across the sister genera Herrania Goudot and Theobroma L., with Guazuma ulmifolia Lam. as the outgroup. Overall, the combined WRKY matrices performed as well as or better than other genes in resolving the intrageneric phylogeny of Herrania and Theobroma. The ease of isolating numerous, independent WRKY loci from diverse plant species with a single pair of degenerate primers designed to the highly conserved WRKY domain renders them extremely useful tools for generating multiple, single or low copy nuclear loci for molecular phylogenetic studies at lower taxonomic levels. This is the first demonstration of the potential of members of the WRKY gene family for phylogenetic reconstruction.

Published by Elsevier Inc.

Keywords: WRKY gene family; Phylogeny; Theobroma; Herrania; Low copy nuclear genes; Byttnerioideae

1.
Introduction

Single or low copy nuclear genes represent a source of multiple, unlinked and independently evolving loci, the ideal data set for molecular phylogenetic inference (Cronn et al., 2002, 2003; Rokas et al., 2003), owing to their high rate of synonymous substitution compared to chloroplast or mitochondrial genes (Wolfe et al., 1987; Gaut, 1998) and their biparental inheritance (Hughes et al., 2006). The relative dearth of single or low copy nuclear genes successfully used for phylogenetic analysis is due, in part, to the methodological problems associated with their isolation and amplification (Small et al., 2004). In general, two approaches have been used to generate phylogenetically useful low copy nuclear genes, each having its own advantages and disadvantages (reviewed in Hughes et al., 2006). The sequence characterized amplified region (SCAR)-based approach obtains sequence information from randomly amplified genomic regions through the use of AFLP or RAPD primers. The comparative anchor tagged sequence (CATS)-based approach compares expressed sequence tags (ESTs) and/or complete genomic sequences to identify "candidate" genes, assuming that the sequence conservation observed across evolutionarily distant taxa implies orthology.

* Corresponding author. Fax: +1 305 969 6410.
E-mail address: ameerow@saa.ars.usda.gov (A.W. Meerow).

Molecular Phylogenetics and Evolution 44 (2007) 1141–1154
www.elsevier.com/locate/ympev
1055-7903/$ - see front matter. Published by Elsevier Inc.
doi:10.1016/j.ympev.2007.06.012

Several gene families, including the WRKY gene family of transcription factors (named for the highly conserved amino acid motif) (Eulgem et al., 2000), show evolutionary expansion in plants (Riechmann et al., 2000; Lespinet et al., 2002; Shiu et al., 2005) and are components of plant-specific, ancient regulatory networks (Doebley and Lukens,