PSYCHOMETRIKA—VOL. 77, NO. 4, 724–740
OCTOBER 2012
DOI: 10.1007/S11336-012-9275-3
THE SIMCLAS MODEL: SIMULTANEOUS ANALYSIS OF COUPLED BINARY DATA
MATRICES WITH NOISE HETEROGENEITY BETWEEN AND WITHIN DATA BLOCKS
TOM F. WILDERJANS
RESEARCH GROUP OF QUANTITATIVE PSYCHOLOGY AND INDIVIDUAL DIFFERENCES,
DEPARTMENT OF PSYCHOLOGY, KU LEUVEN
E. CEULEMANS
DEPARTMENT OF EDUCATIONAL SCIENCES, KU LEUVEN
I. VAN MECHELEN
RESEARCH GROUP OF QUANTITATIVE PSYCHOLOGY AND INDIVIDUAL DIFFERENCES,
DEPARTMENT OF PSYCHOLOGY, KU LEUVEN
In many research domains different pieces of information are collected regarding the same set of
objects. Each piece of information constitutes a data block, and all these (coupled) blocks have the object
mode in common. When analyzing such data, an important aim is to obtain an overall picture of the
structure underlying the whole set of coupled data blocks. A further challenge consists of accounting for
the differences in information value that exist between and within (i.e., between the objects of a single
block) data blocks. To tackle these issues, analysis techniques may be useful in which all available pieces
of information are integrated and in which at the same time noise heterogeneity is taken into account. For
binary coupled data, however, the existing methods analyze all data blocks simultaneously but do not
account for noise heterogeneity. Therefore, in this paper, the SIMCLAS model,
being a Hierarchical Classes model for the simultaneous analysis of coupled binary two-way matrices,
is presented. In this model, noise heterogeneity between and within the data blocks is accounted for by
downweighting entries from noisy blocks/objects within a block. In a simulation study it is shown that (1)
the SIMCLAS technique recovers the underlying structure of coupled data to a very large extent, and (2) the
SIMCLAS technique outperforms a Hierarchical Classes technique in which all entries contribute equally
to the analysis (i.e., noise homogeneity within and between blocks). The latter is also demonstrated in an
application of both techniques to empirical data on categorization of semantic concepts.
Key words: data fusion, coupled data, multi-set data, noise heterogeneity, simultaneous clusterings,
Hierarchical Classes Analysis, overlapping clustering, hierarchical relations, multivariate binary data.
1. Introduction
In many research domains different pieces of information are collected regarding the same
set of objects. Each piece of information constitutes a data block, and all these (coupled) blocks
have the object mode in common. For example, in psychiatric diagnosis research, such coupled
data are encountered when for a set of patients information is available regarding, on the one
hand, their diagnosis (i.e., patient-by-diagnosis data block) and, on the other hand, the symptoms
they exhibit (i.e., patient-by-symptom data block).
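To make the notion of coupled binary data blocks concrete, the psychiatric example can be sketched as two binary matrices whose rows refer to the same patients. This is only an illustration of the data structure, not of the SIMCLAS analysis itself; the patient, diagnosis, and symptom labels, the matrix entries, and the helper `couple_blocks` are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: four patients form the shared object mode.
patients = ["p1", "p2", "p3", "p4"]
diagnoses = ["depression", "anxiety"]
symptoms = ["insomnia", "fatigue", "worry"]

# Patient-by-diagnosis block: 1 = the patient received the diagnosis.
diagnosis_block = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]

# Patient-by-symptom block: 1 = the patient exhibits the symptom.
symptom_block = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]


def couple_blocks(*blocks):
    """Concatenate blocks column-wise after checking that all blocks
    have the same number of rows (i.e., share the object mode)."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all blocks must describe the same set of objects")
    # Row i of the result is row i of every block, joined left to right.
    return [sum((block[i] for block in blocks), []) for i in range(n_rows)]


coupled = couple_blocks(diagnosis_block, symptom_block)
for patient, row in zip(patients, coupled):
    print(patient, row)
```

Each row of `coupled` holds all available information on one patient, first the diagnosis entries and then the symptom entries, which is what having the object mode in common amounts to.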
The first author is a Research Assistant of the Fund for Scientific Research (FWO)—Flanders (Belgium). The
research reported in this paper was partially supported by the Research Council of K.U. Leuven (GOA/2005/04 and
EF/2005/07, ‘SymBioSys’) and by IWT-Flanders (SBO 60045, ‘Bioframe’). We would like to thank Gert Storms and his
collaborators for providing us with an interesting data set.
Requests for reprints should be sent to Tom F. Wilderjans, Research Group of Quantitative Psychology and
Individual Differences, Department of Psychology, KU Leuven, Tiensestraat 102, Box 3713, 3000 Leuven, Belgium. E-mail:
tom.wilderjans@ppw.kuleuven.be
© 2012 The Psychometric Society