Behavior Research Methods, Instruments, & Computers
1986, 18 (5), 472-474

A BASIC program package for the analysis of sorting data

THOMAS ECKES
Universität des Saarlandes, Saarbrücken, Federal Republic of Germany

The sorting method has proved to be a valuable data-gathering procedure in many research areas (see, e.g., Rosenberg, 1982). Subjects are typically asked to sort a given set of objects into as many groups or classes as they wish so that objects within the same class are more similar to each other than they are to objects in other classes. If the classes are mutually exclusive and collectively exhaustive, the subjects' sortings are called partitions (see Boorman & Arabie, 1972). Compared with related program abstracts (Grant, 1983; Greenberg & McIsaac, 1984; Oud & Sattler, 1984; Takane, 1981, 1982; Weldon & Buchter, 1984), the present program package is distinctive in that it performs (1) a two-way analysis of sorting data (i.e., computation of proximities between objects as well as computation of proximities between partitions), (2) a significance test of the proximities between objects, and (3) an assessment of correspondence between partitions by means of two measures differing in the kind of weighting scheme used.

Proximities. After the user has entered the partitions, using the input program SORTFILE (described below), program PROSORT can be used to perform a computation of proximities between objects. As suggested by Miller (1969), the similarity between two objects a and b is defined as the number of subjects that placed a and b in the same class; subtracting this measure from the total number of subjects yields the corresponding dissimilarity measure. More sophisticated indices were proposed by Rosenberg, Nelson, and Vivekananthan (1968) and by Burton (1972). Empirical tests by Burton (1975) and Drasgow and Jones (1979), however, did not yield compelling evidence in favor of these indices.
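Miller's co-occurrence measure can be sketched in a few lines. The following Python fragment is an illustration only, not the BASIC code of PROSORT; the function names and the representation of each partition as a list of class labels (one label per object) are assumptions made for this example.

```python
from itertools import combinations


def cooccurrence_similarity(partitions, n_objects):
    """For every pair of objects, count the subjects who sorted them together."""
    sim = [[0] * n_objects for _ in range(n_objects)]
    for labels in partitions:  # one list of class labels per subject
        for a, b in combinations(range(n_objects), 2):
            if labels[a] == labels[b]:
                sim[a][b] += 1
                sim[b][a] += 1
    return sim


def dissimilarity(sim, n_subjects):
    """Subtract each similarity from the number of subjects (diagonal set to 0)."""
    n = len(sim)
    return [[0 if a == b else n_subjects - sim[a][b] for b in range(n)]
            for a in range(n)]


# Hypothetical data: three subjects sorting four objects.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 2],
]
sim = cooccurrence_similarity(partitions, 4)
dis = dissimilarity(sim, len(partitions))
```

In this made-up example, objects 0 and 1 are sorted together by two of the three subjects, so their similarity is 2 and their dissimilarity is 1. Restricting the `partitions` list to a subset of subjects would give the subgroup-specific proximities.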
Furthermore, the present measure lends itself readily to a significance testing procedure, outlined below. As a special feature of PROSORT, proximities may also be computed separately for subgroups of subjects.

The author's mailing address is: Fachrichtung Psychologie, Universität des Saarlandes, Postfach, D-6600 Saarbrücken, Federal Republic of Germany.

Significance of Proximities. With a given number of subjects, some similarities (dissimilarities) will assume large (small) values merely by chance. Therefore, it is desirable to have a statistical procedure for testing the significance of proximity values. The procedure described here was developed by Oldenbürger (1981). It is based on representing each subject's sorting by a Bernoulli distribution. Specifically, a computation is made of the probability ($p_i$) that two objects randomly drawn from the set of objects under consideration were placed in the same class by Subject $i$. Then, for the sample of $n$ subjects, the distribution of a random variable $X$ is constructed that is the sum of $n$ independent Bernoulli distributed random variables. With $n = 2$ subjects, for example, the probability distribution can be constructed as follows: Let $p_1$ be the probability that two objects randomly drawn from the object set were sorted together by Subject 1, and let $p_2$ be the respective probability for Subject 2. Assuming statistical independence, the probabilities associated with the $X$-values $\{0, 1, 2\}$ are:

$$p(X=0) = (1-p_1)(1-p_2),$$
$$p(X=1) = p_1(1-p_2) + (1-p_1)p_2,$$
$$p(X=2) = p_1 p_2.$$

Generally, there are $n$ random variables $Y_i$ where $p(Y_i = 1) = p_i$ and $p(Y_i = 0) = q_i$, with $p_i + q_i = 1$. Then (for $i = 1, \ldots, n$), $X_n = \sum_i Y_i$, with expectation $E(X_n) = \sum_i E(Y_i) = \sum_i p_i$ and variance $V(X_n) = \sum_i V(Y_i) = \sum_i p_i q_i$. The associated probabilities can be computed according to the following scheme:

$$p(X_n = 0) = \prod_i q_i,$$
$$p(X_n = k) = p\{[(X_{n-1} = k-1)(Y_n = 1)] + [(X_{n-1} = k)(Y_n = 0)]\}, \quad \text{with } 0 < k < n,$$
$$p(X_n = n) = \prod_i p_i.$$

Thus, the construction of the probability distribution makes use of a successive technique that has the advantage of saving much computing time compared with a simultaneous consideration of all $2^n$ event combinations. The successive construction method rests on the idea that the distribution of $X_n$ can be decomposed into the distributions of $X_{n-1}$ and $Y_n$. The program performing such an analysis is called PROSIG. Analogous to PROSORT, this program can also be used with a specified subset of input data (see the input section below).

Correspondence Between Partitions. This kind of analysis is concerned with the problem of measuring the correspondence between object set partitions. Handling sorting data this way can be of interest not only in assessing nominal scale response agreement between two observers or raters, but also in identifying homogeneous subgroups within a sample of subjects in order to differentially analyze and represent proximity data. One of the most often used indices of correspondence is the so-called Rand index (Rand, 1971), which has been rediscovered and/or modified by other researchers (e.g., Brennan & Light, 1974; Brook & Stirling, 1984; Fowlkes & Mallows, 1983; Hubert, 1977; Hubert & Arabie, 1986). Rand's measure is defined as the ratio of the sum of the number of pairs of objects sorted together in the two partitions being compared (Type A agreements) and the number of pairs of objects sorted in different classes in both partitions (Type B agreements) to the total number of object pairs. A correction for chance agreement, which takes the general form (index value - expected index value)/(1 - expected index value), is suggested by Hubert and Arabie (1986).
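Returning to the successive construction described under Significance of Proximities, the scheme can be sketched as follows. This is a Python illustration under stated assumptions, not the BASIC code of PROSIG: the function names are hypothetical, each sorting is assumed to be a list of class labels, and the pair-drawing probability for a subject is assumed to be the number of within-class object pairs divided by the number of all object pairs (drawing two distinct objects). By independence, each step splits the probability mass at a value k between k (the new variable is 0) and k + 1 (the new variable is 1).

```python
from math import comb


def same_class_probability(labels):
    """Assumed form of p_i: within-class pairs divided by all object pairs."""
    class_sizes = {}
    for label in labels:
        class_sizes[label] = class_sizes.get(label, 0) + 1
    together = sum(comb(size, 2) for size in class_sizes.values())
    return together / comb(len(labels), 2)


def sum_distribution(probs):
    """Distribution of X_n = Y_1 + ... + Y_n, built one subject at a time."""
    dist = [1.0]  # before any subject is added, X = 0 with certainty
    for p in probs:
        q = 1.0 - p
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * q      # new variable is 0: the sum stays at k
            new[k + 1] += mass * p  # new variable is 1: the sum moves to k + 1
        dist = new
    return dist  # dist[k] is p(X_n = k) for k = 0, ..., n


# Hypothetical data: three subjects sorting four objects.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 2],
]
probs = [same_class_probability(labels) for labels in partitions]
dist = sum_distribution(probs)
```

Each pass over `dist` costs time proportional to its current length, so the whole construction needs on the order of n squared steps rather than an enumeration of all event combinations.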
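The Rand index and the general form of the chance correction can be illustrated with a short Python sketch. The function names and the label-list representation of partitions are assumptions for this example. The expected index value is passed in as a parameter here, since its formula is not reproduced in the text above.

```python
from itertools import combinations


def rand_index(part_a, part_b):
    """Share of object pairs treated alike by both partitions."""
    type_a = 0  # pairs sorted together in both partitions
    type_b = 0  # pairs sorted into different classes in both partitions
    total = 0
    for i, j in combinations(range(len(part_a)), 2):
        same_a = part_a[i] == part_a[j]
        same_b = part_b[i] == part_b[j]
        if same_a and same_b:
            type_a += 1
        elif not same_a and not same_b:
            type_b += 1
        total += 1
    return (type_a + type_b) / total


def chance_corrected(index, expected):
    """General form (index - expected) / (1 - expected); expected is supplied by the caller."""
    return (index - expected) / (1.0 - expected)


# Hypothetical data: two partitions of four objects.
r = rand_index([0, 0, 1, 1], [0, 0, 0, 1])
```

For these two made-up partitions there is one Type A agreement and two Type B agreements among six pairs, so `r` is 0.5.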
Copyright 1986 Psychonomic Society, Inc.

This corrected Rand index is bounded above by 1 (for maximum similarity of both partitions) and takes on the value 0 when the index equals its expected value. Both ver-