Comparing latent class and dissimilarity based clustering for mixed type variables with application to social stratification

Christian Hennig and Tim F. Liao
Department of Statistical Science, UCL; Department of Sociology, University of Illinois

August 3, 2010

Abstract

Data with mixed type (metric/ordinal/nominal) variables can be clustered by a latent class mixture model approach, which assumes local independence. Such data are typical in social stratification, which is the application that motivates the current paper. We explore whether the latent class approach groups similar observations together and compare it to dissimilarity based clustering (k-medoids). The design of an appropriate dissimilarity measure and the estimation of the number of clusters are discussed as well, comparing the BIC, average silhouette width and the Calinski and Harabasz index.

The comparison is based on a philosophy of cluster analysis that connects the problem of a choice of a suitable clustering method closely to the application by considering direct interpretations of the implications of the methodology. According to this philosophy, model assumptions serve to understand such implications but are not taken to be true. It is emphasised that researchers implicitly define the “true” clustering and number of clusters by the choice of a particular methodology. It is illustrated that even if there is a true model, a clustering that doesn’t attempt to estimate this truth may be preferable. The researcher has to take the responsibility to specify the criteria on which such a comparison can be made. The application of this philosophy to data from the 2007 US Survey of Consumer Finances implies some techniques to obtain an interpretable clustering in an ambiguous situation.

Keywords: mixture model, k-medoids clustering, dissimilarity design, number of clusters, interpretation of clustering

* Research Report No. 308, Department of Statistical Science, University College London.
Date: August 2010.