Deeper into the Trees
A new hybrid CHAID application analyzes multiple dependent variables.
By Ken Deal
Software review

Tree-splitting algorithms were initially developed by John Sonquist and James Morgan (1964) and rendered into THAID by Morgan and R.C. Messenger (1973) of the Institute for Social Research at the University of Michigan. G.V. Kass transformed this work into CHAID (Chi-square Automatic Interaction Detection) in 1980. Leo Breiman and Jerome Friedman independently worked on the innovation that became CART (Classification and Regression Trees) in the early 1980s. Since then, those algorithms and others have been developed into viable tree-splitting software applications by SPSS, Statistical Innovations (SI), Salford Systems, SAS, Insightful Corp., and others.

SI-CHAID 4.0 is much more than just another entrant in this field. It takes a creative approach to the problem of effectively segmenting databases when there are multiple dependent variables, possibly of mixed types, and a large number of candidate predictor variables. In addition, SI-CHAID 4.0 (SIC4) and its sister product Latent GOLD Choice 4.0 (LGC4) work in concert to perform this task on choice-based conjoint data sets.

To the best of my knowledge, all other tree-splitting algorithms focus on finding segments that help to predict a single criterion variable. However, in many situations it's possible to identify multiple candidate criterion variables. In those cases, conventional tree-splitting applications must generate a separate segmentation for each criterion variable. The resulting segmentations are, of course, unlikely to be congruent, which raises the problem of choosing among them. SI-CHAID 4.0 has been developed to provide exploratory segmentation trees that are predictive of multiple correlated dependent variables.
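Conventional CHAID, for example, selects each split by cross-tabulating every candidate predictor against the single dependent variable. It then keeps the predictor with the most significant chi-square test. The Python sketch below illustrates only that selection step. It is a simplification, not any vendor's implementation: it omits CHAID's merging of similar predictor categories and its Bonferroni adjustment. The helper names (`chi2_sf`, `crosstab`, `pearson_chi2`, `best_split`) and the toy data are hypothetical.

```python
# Hypothetical helpers: a simplified sketch of CHAID-style predictor selection
# for ONE categorical dependent variable (no category merging, no Bonferroni).
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from the df = 1 tail, erfc(sqrt(y)), and add
    # y**(i - 1/2) * exp(-y) / Gamma(i + 1/2) for i = 1 .. (df - 1) / 2.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def crosstab(xs, ys):
    """Count table: one row per predictor category, one column per outcome."""
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in sorted(set(ys))] for r in sorted(set(xs))]


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for row, rt in zip(table, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_tot) - 1)


def best_split(predictors, target):
    """Return (predictor name, p-value) for the most significant predictor."""
    best = None
    for name, values in predictors.items():
        stat, df = pearson_chi2(crosstab(values, target))
        if df == 0:  # only one row or one column: no association can be tested
            continue
        p = chi2_sf(stat, df)
        if best is None or p < best[1]:
            best = (name, p)
    return best


# Toy data (illustrative only).
target = ["buy", "buy", "buy", "no", "no", "no", "buy", "no"]
predictors = {
    "region": ["N", "N", "N", "S", "S", "S", "N", "S"],
    "age": ["young", "old", "young", "old", "young", "old", "old", "young"],
}
print(best_split(predictors, target))
```

In this toy data `region` lines up perfectly with the outcome and `age` not at all, so `region` is chosen. P-values rather than raw chi-square statistics are compared because predictors with more categories have more degrees of freedom. A full tree would repeat the test within each resulting node.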
The multiple criterion variables are probabilities of class membership obtained from LGC4, with the latent classes from LGC4 serving as proxies for the several dependent variables used in the analysis. The algorithm can also be used to segment data sets containing only one dependent variable, and it is in that use that SIC4 becomes directly comparable to existing tree-splitting applications.

LGC4, a latent class analysis application from Statistical Innovations (SI), is integrated with SPSS and SI-CHAID 4.0. Together they provide a nearly seamless way to identify latent classes based on multiple criterion variables and then to identify segments based on variables that can be used to score the related customer database. Of special value is LGC4's ability to work directly with Sawtooth Software CBC (choice-based conjoint) files in *.cho format. LGC4 generates the latent classes, which are passed to SIC4 and used to produce the segments. (Please note that LGC4 contains several substantial features not available in the earlier version reviewed in the Winter 2003 edition of Marketing Research.) Exhibit 1 illustrates the analysis process for each type of input data.

SIC4 comprises two programs: SIC4 Define, which configures the analysis, and SIC4 Explore, which generates the tree and allows interactive exploration of it.

Data input. SI has succeeded in making data input very easy, and output very instructive and attractive, in all of its applications, and SI-CHAID is no exception. LGC4 and SIC4 both accept SPSS files as direct input, and ASCII files can also be used. When Latent GOLD Choice is used before SIC4 to identify latent classes, the resulting latent class file can be opened immediately by the SIC4 Define program. If LGC4 is used to estimate latent classes from a Sawtooth Software choice-based conjoint *.chd file, a little more work is needed before the file can be opened in the SIC4 Define program.
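Once the latent class file reaches SIC4, each case carries a probability of belonging to each latent class rather than a single observed outcome. The review does not document how SIC4 uses those probabilities internally. One plausible approach, assumed here purely for illustration, is to let each case contribute its membership probabilities as fractional counts to a predictor-by-class table. Each count is multiplied by any sampling weight, and the usual Pearson chi-square is then applied. The function names and toy probabilities are hypothetical.

```python
# Hypothetical sketch: treating latent-class membership probabilities as
# fractional counts is an ASSUMPTION, not SIC4's documented method.
from collections import defaultdict


def fractional_crosstab(categories, posteriors, weights=None):
    """Predictor-by-latent-class table built from membership probabilities.

    categories: predictor value for each case
    posteriors: for each case, its probabilities of belonging to each class
    weights:    optional sampling weight for each case (default 1.0)
    """
    n_classes = len(posteriors[0])
    rows = defaultdict(lambda: [0.0] * n_classes)
    for i, (cat, probs) in enumerate(zip(categories, posteriors)):
        w = 1.0 if weights is None else weights[i]
        for k, p in enumerate(probs):
            # A case counts as w * p "partial cases" in latent class k.
            rows[cat][k] += w * p
    return {cat: rows[cat] for cat in sorted(rows)}


def pearson_chi2(table):
    """Pearson chi-square statistic and df for a {category: [class totals]} table."""
    matrix = list(table.values())
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(col) for col in zip(*matrix)]
    n = sum(row_tot)
    stat = 0.0
    for row, rt in zip(matrix, row_tot):
        for obs, ct in zip(row, col_tot):
            expected = rt * ct / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat, (len(matrix) - 1) * (len(col_tot) - 1)


# Toy posteriors for two latent classes (illustrative only, not LGC4 output).
posteriors = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],  # cases leaning toward class 1
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7],  # cases leaning toward class 2
]
region = ["N", "N", "N", "S", "S", "S"]
gender = ["F", "M", "F", "M", "F", "M"]

for name, values in [("region", region), ("gender", gender)]:
    stat, df = pearson_chi2(fractional_crosstab(values, posteriors))
    print(name, round(stat, 3), df)
```

Because `region` groups the cases that lean toward class 1, its statistic is much larger than that of `gender`, so a tree built this way would split on `region` first. With more latent classes the table simply gains columns, so one split is judged against all class probabilities at once. Multiplying every weight by a constant multiplies the statistic by that same constant. That is one reason chi-square values computed from fractional or weighted counts should be read as approximate.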
If more flexibility is needed for handling a variety of data formats, the excellent file translator DBMS/COPY is available for a small premium.

38 Summer 2005

Analysis. The analysis stage is very easy to specify. LGC4 can be directed to produce a file automatically for use in SIC4. If the data is in an ASCII or SPSS file and contains just one dependent variable, a new project is opened in the SIC4 Define application. The dependent variable(s) go into one window, the predictors into another, and the case identification variable and any sample weighting variable into the last. Several additional tabs provide control over the algorithm, tree development, and the nature of the variables. A Technical tab provides