Group analyses of connectivity-based cortical parcellation using repeated k-means clustering

Luca Nanetti a,b, Leonardo Cerliani a,b, Valeria Gazzola a,b, Remco Renken a,c, Christian Keysers a,b,⁎

a BCN NeuroImaging Center, University of Groningen, The Netherlands
b Department of Neuroscience, Section Social Brain Lab, University Medical Center Groningen, The Netherlands
c Department of Neuroscience, University Medical Center Groningen, The Netherlands

Article info
Article history: Received 20 August 2008; Revised 7 May 2009; Accepted 3 June 2009; Available online xxxx
Keywords: Anatomical connectivity; K-means clustering; K-means; SMA–preSMA; Insula

Abstract
K-means clustering has become a popular tool for connectivity-based cortical segmentation using Diffusion Weighted Imaging (DWI) data. A sometimes ignored issue is, however, that the output of the algorithm depends on the initial placement of starting points, and that different sets of starting points therefore could lead to different solutions. In this study we explore this issue. We apply k-means clustering a thousand times to the same DWI dataset collected in 10 individuals to segment two brain regions: the SMA–preSMA on the medial wall, and the insula. At the level of single subjects, we found that in both brain regions, repeatedly applying k-means indeed often leads to a variety of rather different cortical based parcellations. By assessing the similarity and frequency of these different solutions, we show that ∼256 k-means repetitions are needed to accurately estimate the distribution of possible solutions. Using nonparametric group statistics, we then propose a method to employ the variability of clustering solutions to assess the reliability with which certain voxels can be attributed to a particular cluster.
In addition, we show that the proportion of voxels that can be attributed significantly to either cluster in the SMA and preSMA is relatively higher than in the insula and discuss how this difference may relate to differences in the anatomy of these regions.
© 2009 Elsevier Inc. All rights reserved.

Introduction

DW-MRI (Diffusion Weighted-Magnetic Resonance Imaging) infers information about white matter structure in the brain from the differential attenuation of the spin echo signal, as modulated by the local spatial microstructure of the surrounding medium, and by the strength and direction of the applied magnetic diffusion gradient (Basser et al., 1994; Pierpaoli et al., 1996). Using this method in conjunction with probabilistic tractography, one can estimate, for each individual voxel of the brain (seed), whether or not it is connected with all other voxels of the brain (targets). This information is called the binarized tractogram (also known as binarized connectivity profile) of that voxel (Behrens et al., 2003; Hosey et al., 2005).

Johansen-Berg et al. (2004) first illustrated how this information can be used to divide the medial motor wall into two subregions, which on the basis of their location and functional properties were likely to represent the supplementary and presupplementary motor areas (SMA and preSMA). They used probabilistic tractography for all voxels within the SMA–preSMA complex to define the corresponding tractograms, took the correlation between the tractograms of each pair of voxels in the SMA–preSMA as a measure of their similarity, and calculated the full cross-correlation matrix (cc-matrix) of all tractograms. They then reordered the cc-matrix using a spectral reordering algorithm (Higham et al., 2007). Visual inspection then revealed a sudden discontinuity in the reordered matrix. Remapping the location of the voxels on either side of this discontinuity, they found that they fell within putative SMA and preSMA, respectively.
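As a minimal sketch of the cross-correlation step described above, the following Python snippet builds a cc-matrix from a handful of toy binarized tractograms. The tractogram values, the function names `pearson` and `cc_matrix`, and the choice to assign zero correlation to a constant tractogram are illustrative assumptions, not part of the original analysis pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant tractogram carries no information,
        # so its correlation with anything is set to zero.
        return 0.0
    return cov / sqrt(var_x * var_y)


def cc_matrix(tractograms):
    """Correlate every seed voxel's tractogram with every other one."""
    return [[pearson(a, b) for b in tractograms] for a in tractograms]


# Toy data (hypothetical): 4 seed voxels x 6 target voxels.
# A 1 means "seed is connected to this target", a 0 means "not connected".
tractograms = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0],
]

cc = cc_matrix(tractograms)

for row in cc:
    print(" ".join(f"{value:+.2f}" for value in row))
```

In this toy example the first two seed voxels correlate positively with each other and negatively with the last two, which is the kind of block structure that reordering or clustering of the cc-matrix aims to expose.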
This method attracted much attention, because it is plausible that if two subregions have different connectivities, they may also have different functions. Requiring an experimenter to decide where to place the border between the regions is, however, undesirable. To circumvent this limitation, researchers turned towards k-means clustering (Hartigan, 1975; Hartigan and Wong, 1979) to decide where to place the border between clusters.

K-means clustering is an iterative algorithm which, for the case of connectivity profiles, ultimately divides the voxels of a seed region into k non-overlapping clusters of voxels (the experimenter needs to decide the value of k based on functional and anatomical considerations). This is done in a hyperspace with as many dimensions as there are voxels in the seed region. Each seed voxel is represented as a point whose coordinates are the correlations of its tractogram with all the other voxels' tractograms.

⁎ Corresponding author. Antonius Deusinglaan 2, 9713 AW Groningen, The Netherlands. Fax: +315036387500. E-mail address: c.m.keysers@rug.nl (C. Keysers).
NeuroImage xxx (2009) xxx–xxx. doi:10.1016/j.neuroimage.2009.06.014

Different strategies exist to choose the initial putative centroids for the algorithm. Hartigan and Wong (1979) propose to randomly choose k points (i.e. voxels in our case) from the initial set, the coordinates of which become the centers of the k clusters. A frequently used