ESTIMATION THEORY FOR THE CUSP CATASTROPHE MODEL

Loren Cobb, Medical University of South Carolina

ABSTRACT

The 'cusp' model of catastrophe theory is very closely related to certain multiparameter exponential families of probability density functions. This relationship is exploited to create an estimation theory for the cusp model. An example is presented in which the independent variable has a 'bifurcation' effect on the dependent variable.

INTRODUCTION

The 'elementary' catastrophe models of Thom (1975) and Zeeman (1977) have attracted the attention of researchers and theorists throughout the sciences. A persistent problem with virtually all published applications, however, has been the absence of statistical procedures for detecting the presence of a catastrophe in any given body of data. This lack has resulted in some severe criticism of catastrophe models for being, among other things, speculative and unverifiable (Sussmann and Zahler, 1978). Thus catastrophe models have become associated in many minds with reckless speculation and intellectual irresponsibility. As part of an effort to overcome this problem, this paper presents an estimation theory and the beginnings of an inferential theory, in a form useful for survey research applications of catastrophe models.

Catastrophe models come in both dynamic and static forms, the static forms being simply the equilibria (stable and unstable) of the dynamic forms. The capacity for multiple stable equilibria is inherent in catastrophe models: this is the principal feature which distinguishes them from the standard models used in linear and polynomial regression. In effect, the 'control' factors of a catastrophe model correspond to the independent variables of a statistical model, and the 'behavioral' variable of a catastrophe model corresponds to the dependent variable of a statistical model.
When the control factors are such that the behavioral variable is in a multistable situation, each stable equilibrium value is a predicted value of the behavioral variable, so there is more than one predicted value. In addition, the unstable equilibria which separate the stable equilibria are also predictions of a sort: they are the values that we predict the behavioral variable will not have. This feature of catastrophe models makes it difficult to define the size of an error of prediction.

There are two ways of overcoming this difficulty. Both have emerged from a study of various forms of dynamic stochastic catastrophe models (Cobb, 1978, 1981, and Cobb and Watson, 1981). One is based on the method of moments and is an estimation method only, while the other is based on maximum likelihood estimation and permits hypothesis testing with the use of the chi-square approximation to the likelihood ratio test. The former has the advantage of computational simplicity, while the latter is clearly preferable when hypotheses must be tested.

THE CUSP MODEL

The canonical cusp model can be thought of as a rather peculiar response surface model. Its shape may be seen in Figure 1 on the next page. Note that sections taken through the depicted surface parallel to the $\alpha$-axis are just cubic polynomials in $y$, the dependent variable. The entire surface is defined by the implicit equation

$$0 = \alpha + \beta\,(y-\lambda)/\sigma - [(y-\lambda)/\sigma]^3.$$

If we let $z = (y-\lambda)/\sigma$ be the 'standardized' dependent variable, then the cubic equation is simply

$$0 = \alpha + \beta z - z^3.$$

It may be seen that $\lambda$ and $\sigma$ are the location and scale parameters, respectively. The roots of the cubic polynomial are the predicted values of $z$, given $\alpha$ and $\beta$. When there are three roots, the central root is an 'anti-prediction': a prediction of where the dependent variable will not be.
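As a minimal sketch of how the roots of the cubic give predictions and an anti-prediction, the following Python fragment solves $0 = \alpha + \beta z - z^3$ numerically with NumPy's `numpy.roots` and maps the roots back to the scale of $y$ through $y = \lambda + \sigma z$. The function names `cusp_equilibria` and `cusp_predictions`, the argument names, and the tolerance `tol` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def cusp_equilibria(alpha, beta, tol=1e-9):
    """Return the real roots z of 0 = alpha + beta*z - z**3, sorted ascending.

    Hypothetical helper for illustration; `tol` is an assumed threshold
    for treating a numerically computed root as real.
    """
    # numpy.roots expects coefficients from the highest power down:
    # -1*z**3 + 0*z**2 + beta*z + alpha
    roots = np.roots([-1.0, 0.0, beta, alpha])
    real_roots = roots[np.abs(roots.imag) < tol].real
    return np.sort(real_roots)


def cusp_predictions(alpha, beta, lam=0.0, sigma=1.0):
    """Split the equilibria into predicted and anti-predicted values of y.

    With three real roots, the outer two are predictions and the central
    root is the anti-prediction; otherwise every real root is a prediction.
    """
    z = cusp_equilibria(alpha, beta)
    y = [float(lam + sigma * v) for v in z]  # undo z = (y - lam) / sigma
    if len(y) == 3:
        return {"predicted": [y[0], y[2]], "anti_predicted": [y[1]]}
    return {"predicted": y, "anti_predicted": []}


if __name__ == "__main__":
    print(cusp_predictions(0.0, 3.0))   # three real roots
    print(cusp_predictions(1.0, -1.0))  # a single real root
```

For instance, with $\alpha = 0$ and $\beta = 3$ the cubic factors as $z(3 - z^2) = 0$, so the two predictions are $z = \pm\sqrt{3}$ and the anti-prediction is $z = 0$. Points at which roots coincide are numerically delicate and are not handled specially in this sketch.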
This feature of the cusp model is clarified in Figure 2, which shows the sequence of conditional probability density functions for $y$, with $\alpha$ fixed as $\beta$ is increased. This sequence corresponds to the trajectory and its projection that are shown in Figure 1. These probability density functions will be discussed in a later section.

The two dimensions of the 'control' space, $\alpha$ and $\beta$, are canonical factors which depend upon the actual measured independent variables, say $X_1, \ldots, X_v$. As a first approximation, we may suppose that the control factors depend linearly upon the independent variables:

$$\alpha = \alpha_0 + \alpha_1 X_1 + \cdots + \alpha_v X_v,$$
$$\beta = \beta_0 + \beta_1 X_1 + \cdots + \beta_v X_v.$$

Thus the statistical estimation problem is to find estimates for the $2v+4$ parameters $\{\lambda, \sigma, \alpha_0, \ldots, \alpha_v, \beta_0, \ldots, \beta_v\}$ from $n$ observations of the $v+1$ variables $\{y, X_1, \ldots, X_v\}$.

As $\beta$ changes from negative to positive, the conditional probability density function of $y$ changes in shape from unimodal to bimodal. For this reason the $\beta$ factor will be called the bifurcation factor (it has also been called the