Journal of Classification
https://doi.org/10.1007/s00357-019-09335-3
Note: t for Two (Clusters)
Stanley L. Sclove¹
© The Classification Society 2019
Abstract
The computation for cluster analysis is done by iterative algorithms. But here, a
straightforward, non-iterative procedure is presented for clustering in the special case of
one variable and two groups. The method is univariate but may reasonably be applied to
multivariate datasets when the first principal component or a single factor explains much of
the variation in the data. The t method is motivated by the fact that minimizing the
within-groups sum of squares is equivalent to maximizing the between-groups sum of squares,
and that Student’s t statistic measures the between-groups difference in means relative to
within-groups variation. That is, the t statistic is the ratio of the difference in sample
means to the standard error of this difference. So, maximizing the t statistic is developed
as a method for clustering univariate data into two clusters. In this situation, the t method
gives the same results as the K-means algorithm. K-means tacitly assumes equality of
variances; here, however, with t, equality of variances need not be assumed because separate
variances may be used in computing t. The t method is applied to some datasets; the results
are compared with those obtained by fitting mixtures of distributions.
Keywords Cluster analysis · Student’s t · Unequal variances
1 Introduction and Background
1.1 Introduction: Use of t for Clustering
This paper suggests use of Student’s t as an objective function to be maximized in clustering
univariate observations into two groups.
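As an illustration of the idea, the following is a minimal Python sketch, not the paper's own code. It assumes that candidate clusterings are the contiguous splits of the sorted observations, that t is computed with separate (unpooled) variances, and that each cluster holds at least two observations so that a sample variance exists. The function name `t_two_clusters` and the parameter `min_size` are hypothetical, introduced only for this example.

```python
import math
from statistics import mean, variance


def t_two_clusters(data, min_size=2):
    """Split univariate data into two clusters by maximizing a two-sample t.

    Assumptions of this sketch (not taken from the paper): only contiguous
    splits of the sorted data are tried, and t uses separate variances.
    """
    if min_size < 2:
        raise ValueError("min_size must be at least 2 to estimate a variance")
    xs = sorted(data)
    n = len(xs)
    if n < 2 * min_size:
        raise ValueError("need at least 2 * min_size observations")

    best_t, best_k = -math.inf, None
    for k in range(min_size, n - min_size + 1):
        low, high = xs[:k], xs[k:]
        diff = mean(high) - mean(low)
        # Squared standard error of the difference, separate variances.
        se2 = variance(low) / len(low) + variance(high) / len(high)
        if se2 == 0:
            # Both clusters are constant: t is unbounded if the means differ.
            t = math.inf if diff > 0 else 0.0
        else:
            t = diff / math.sqrt(se2)
        if t > best_t:
            best_t, best_k = t, k
    return xs[:best_k], xs[best_k:], best_t


if __name__ == "__main__":
    sample = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]  # made-up illustrative values
    low, high, t = t_two_clusters(sample)
    print(low, high, round(t, 2))
```

The search is non-iterative in the sense that it evaluates each candidate split once and keeps the one with the largest t; no starting values or convergence checks are involved.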
First, recall some background from analysis of variance (ANOVA). Given observations
$\{x_{gi},\ g = 1, 2, \ldots, G,\ i = 1, 2, \ldots, n_g\}$ in $G$ groups, denote the group means by $\bar{x}_g$ =
“Tea for Two” is a well-known song from the 1925 musical “No, No, Nanette”, music by Vincent
Youmans, lyrics by Irving Caesar.
Stanley L. Sclove
slsclove@uic.edu
¹ Department of Information & Decision Sciences, University of Illinois at Chicago,
Chicago, IL, USA