COMPUTER PROGRAM NOTE

CorrSieve: software for summarizing and evaluating Structure output

M.G. CAMPANA,* H.V. HUNT,* H. JONES† and J. WHITE†

*McDonald Institute for Archaeological Research, University of Cambridge, Downing Street, Cambridge CB2 3ER, UK, †National Institute for Agricultural Botany, Huntingdon Road, Cambridge CB3 0LE, UK

Correspondence: Michael G. Campana, Fax: 01223 339285; E-mail: mgc32@cam.ac.uk

© 2010 Blackwell Publishing Ltd. Molecular Ecology Resources (2010) doi: 10.1111/j.1755-0998.2010.02917.x

Abstract

The clustering software Structure has been used extensively to infer population structure in natural populations from multilocus genotype data. Determining meaningful values of K, the assumed number of subpopulations, is one of the primary challenges of making biological inferences from Structure data. The package CorrSieve summarizes Structure output and performs a number of tests, including both previously reported methods and novel ones, to help determine meaningful values of K.

Keywords: cluster analysis, population genetics, population structure, Structure

Received 15 June 2010; revision received 2 August 2010; accepted 18 August 2010

Introduction

The clustering software Structure (Pritchard et al. 2000) has been used extensively (4593 citations as of July 2010) to infer population structure in natural populations from multilocus genotype data. Nevertheless, determining the most meaningful value of K, the number of inferred subpopulations, has proven difficult. In fact, there are likely to be multiple biologically meaningful K values for any one data set, depending on the questions asked of the data (Tishkoff et al. 2009).

CorrSieve is a Ruby (Matsumoto 2007) and R (R Development Core Team 2009) package that helps determine meaningful values of K by summarizing and evaluating Structure output. The software automates several repetitive and slow statistical tests, which currently must be performed by hand.

The most common methods for determining appropriate values of K are examining the Ln P(D) output from Structure as recommended by Pritchard et al. (2000) and calculating the ΔK statistic (948 citations as of July 2010) developed by Evanno et al. (2005). CorrSieve can both summarize the Ln P(D) output from Structure and calculate the ΔK statistic. CorrSieve also performs fractional subpopulation membership matrix (Q matrix) correlation tests as described in Cockram et al. (2008). Finally, the software can examine changes in F_ST values over the values of K and calculate a novel statistic, ΔF_ST, a measure based on Evanno et al.’s ΔK but utilizing Structure’s F_ST estimates rather than Ln P(D) values.

CorrSieve is open source and freely available under the GNU General Public License (v3). The stand-alone Ruby version is available on the web from http://www.mcdonald.cam.ac.uk/projects/genetics/index.htm, whilst the R version is available from the Comprehensive R Archive Network (http://cran.r-project.org/). As Ruby and R are free, cross-platform and efficient in their use of computing resources, CorrSieve can be executed in most Windows, Macintosh and UNIX (or UNIX-like) environments. The software requires either a Ruby interpreter, available on the Ruby website (http://www.ruby-lang.org), or the R statistical environment, available from the Comprehensive R Archive Network.

Novel algorithms

In addition to summarizing Ln P(D) and calculating ΔK, CorrSieve implements several novel methods.

Q matrix correlations

Calculating the correlation coefficients of the fractional subpopulation membership matrices between duplicate Structure runs can help determine whether individual K solutions are stable (Cockram et al. 2008). CorrSieve can identify stable K solutions through several novel correlation algorithms.
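To make the idea of correlating Q matrices between duplicate runs concrete, the following Ruby sketch compares two runs performed at the same K. It is not CorrSieve's implementation. The method names (pearson, best_column_correlations, stable?), the use of Pearson correlation, the best-match treatment of cluster labels that may be ordered differently between runs, and the 0.99 threshold are all assumptions made for illustration.

```ruby
# Illustrative sketch only; names, threshold and matching rule are assumptions,
# not CorrSieve's actual code.

# Pearson correlation coefficient between two equal-length numeric arrays.
# Returns 0.0 if either array has zero variance (correlation is undefined).
def pearson(x, y)
  n = x.length.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  cov = x.zip(y).sum { |a, b| (a - mean_x) * (b - mean_y) }
  var_x = x.sum { |a| (a - mean_x)**2 }
  var_y = y.sum { |b| (b - mean_y)**2 }
  return 0.0 if var_x.zero? || var_y.zero?
  cov / Math.sqrt(var_x * var_y)
end

# q_a and q_b are Q matrices from two duplicate runs at the same K, stored as
# arrays of rows (one row per individual, one column per cluster).
# Cluster labels are arbitrary within a run, so each column of q_a is paired
# with whichever column of q_b it correlates with most strongly.
def best_column_correlations(q_a, q_b)
  cols_a = q_a.transpose
  cols_b = q_b.transpose
  cols_a.map { |col_a| cols_b.map { |col_b| pearson(col_a, col_b) }.max }
end

# Treat the K solution as stable when every cluster in run A has a close
# counterpart in run B. The default threshold is a hypothetical choice.
def stable?(q_a, q_b, threshold = 0.99)
  best_column_correlations(q_a, q_b).min >= threshold
end

# Made-up example data: four individuals, K = 2.
# run_b is run_a with the two cluster labels swapped.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
run_b = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
puts stable?(run_a, run_b)
```

Taking the best match per column is a simple way to tolerate label switching, but it does not enforce a one-to-one pairing of clusters; a stricter comparison would first align the columns of the two runs.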