Transpn. Res.-A Vol. 19A, No. 4, pp. 315-324, 1985. 0191-2607/85 $3.00 + .00
Printed in Great Britain. © 1985 Pergamon Press Ltd.

DISAGGREGATE MODE CHOICE MODELS AND THE AGGREGATION ISSUE: SOME EMPIRICAL RESULTS

J. P. DUNNE
University of Warwick, Coventry, CV4 7AL, U.K.

(Received 4 April 1983; in revised form 20 January 1985)

Abstract--This paper presents a comparative analysis of the alternative approaches to providing aggregate prediction models from disaggregate mode choice models. In general, the results support the findings of previous studies and illustrate the importance of realizing the trade-off between aggregation bias in simple procedures and the practical problems of more complex approaches.

INTRODUCTION

Disaggregate mode choice models have shown themselves to have many advantages over conventional aggregate models. In concentrating the analysis at the level of the individual behavioural unit, they have allowed consideration of the factors that influence the travel behaviour of individuals and have made more efficient use of available data. Their consistent theoretical base, developed from the postulates of consumer rationality and utility maximization, has allowed disaggregate models a claim to generality. In addition, their encompassing of policy-relevant variables has provided them with a potentially more useful role in forecasting than descriptive aggregate models. (See, for example, De Donnea, 1971; Domencich and McFadden, 1975; Richards and Ben Akiva, 1975.)

While it is desirable to estimate choice models at a disaggregate level, however, the use of the models in prediction will generally require some level of aggregation. The transformation of disaggregate models into aggregate prediction models, although simple in principle (complete enumeration), does raise practical problems, as it requires predicted values for each individual in the sample and therefore has rather extreme data requirements.
As a result, "short cut" aggregation methods have been developed that can be distinguished by the way in which they represent distributions of the explanatory variables across the sample.

The simplest is the naive approach, which uses the average sample values of the independent variables together with the disaggregate model coefficient estimates. This will, however, provide inaccurate predictions, as the average of a nonlinear function is not the same as the function evaluated at the average values. To overcome this problem, a number of other approaches have been developed. These include the classification approach (Koppelman, 1976), which uses the naive method on relatively homogeneous subgroups; the statistical differentials approach (Talvitie, 1973, 1976), which uses the moments of the distribution of probabilities over the population; and the density function approach (Westin, 1974; Watson and Westin, 1975), which uses a family of distributions to model the population relative frequency distribution (RFD) of probabilities.

This paper aims to provide some new empirical evidence on the adequacy of these methods of aggregation using data from a study of mode choice between Livingston New Town and Edinburgh. The disaggregate model is a binary logit model of mode choice for the journey to work. Although the true test of an "aggregated" prediction model is its adequacy when confronted with new data, the assessment here is based upon its representation of the original sample on which the disaggregate models were estimated. While this is only a first step in the assessment of a model to be used for prediction, if a model were to perform badly at this stage, it would certainly not be worth using it in more stringent tests. In addition, the study provides some useful comparative assessments of the various aggregation methods employed.
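
The aggregation bias of the naive approach described above can be illustrated with a minimal sketch. It assumes a binary logit with a constant and one explanatory variable; the coefficient values, the two-person sample and the function names are hypothetical and are not taken from the Livingston data or from the paper's estimated model.

```python
import math

def logit_probability(x, beta):
    """Binary logit choice probability for one individual: 1 / (1 + exp(-x'beta))."""
    utility = sum(xk * bk for xk, bk in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-utility))

def enumeration_share(sample, beta):
    """Enumeration: average the individual choice probabilities."""
    return sum(logit_probability(x, beta) for x in sample) / len(sample)

def naive_share(sample, beta):
    """Naive approach: evaluate the logit at the sample means of the variables."""
    n = len(sample)
    x_bar = [sum(column) / n for column in zip(*sample)]
    return logit_probability(x_bar, beta)

# Hypothetical coefficients (constant, one explanatory variable) and a
# hypothetical two-person sample; illustrative values only.
beta = [1.0, 0.5]
sample = [[1.0, -6.0], [1.0, 6.0]]

print("enumeration:", round(enumeration_share(sample, beta), 4))
print("naive:      ", round(naive_share(sample, beta), 4))
```

With these illustrative values the enumerated share is roughly 0.55, whereas the naive prediction is roughly 0.73. The gap arises only because the logit function is nonlinear; for a sample of identical individuals the two calculations coincide.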

The first section of the paper outlines the available approaches, the second section reports the results of their empirical application and the third section presents some conclusions.

METHODS OF AGGREGATION

Enumeration

The correct approach to aggregation is to calculate the average probability by estimating each individual's choice probability and taking the average. In this case, the values of the independent variables directly relevant to each individual in the prediction group (complete enumeration) or a subset of that group (sample enumeration) are used. Thus, the average probability ($\bar{P}$) is

$$\bar{P} = \frac{1}{N} \sum_{i} P_i, \qquad (1)$$

where

$$P_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}},$$

$N$ is the number of individuals enumerated, $\beta$ being the vector of parameter estimates, and $x_i$ the individual values of the explanatory variables. Although providing the most theoretically consistent approach,