Ecology and Evolution 2017; 1–19 | www.ecolevol.org

Received: 27 April 2016 | Revised: 23 November 2016 | Accepted: 22 December 2016
DOI: 10.1002/ece3.2760

ORIGINAL RESEARCH

Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure

Joshua P. Kilborn | David L. Jones | Ernst B. Peebles | David F. Naar

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2017 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd.

College of Marine Science, University of South Florida, Saint Petersburg, FL, USA

Correspondence
Joshua P. Kilborn, College of Marine Science, University of South Florida, Saint Petersburg, FL, USA.
Email: jpk@mail.usf.edu

Funding information
National Oceanic and Atmospheric Administration, Grant/Award Number: NA10NMF4550468.

Abstract
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm’s assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion.
We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.

KEYWORDS
constrained clustering, data simulation, Monte Carlo, permutation testing, PRIMER-E, SIMPROF

1 | INTRODUCTION

In data-rich scientific studies, it is often necessary to apply a clustering algorithm to detect groups of homogenous objects with respect to a set of descriptors (i.e., measured variables). Detection of groups is useful in ecology, economics, genetics, and other disciplines that analyze large, multidimensional datasets. Clustering techniques for multivariate datasets are diverse and can be drawn from methods derived from