The estimation of generalized extreme value models from choice-based samples

M. Bierlaire a,*, D. Bolduc b, D. McFadden c

a Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne, Switzerland
b Département d'économique, Université Laval, Québec, Canada
c Econometrics Laboratory, University of California, Berkeley, United States

* Corresponding author. Tel.: +41 21 693 25 37; fax: +41 21 693 80 60. E-mail address: michel.bierlaire@epfl.ch (M. Bierlaire).

Transportation Research Part B 42 (2008) 381–394. doi:10.1016/j.trb.2007.09.003

Received 15 August 2006; received in revised form 19 September 2007; accepted 20 September 2007

Abstract

In the presence of choice-based sampling strategies for data collection, the property of multinomial logit (MNL) models, that consistent estimates of all parameters but the constants can be obtained from an exogenous sample maximum likelihood (ESML) estimation, does not hold in general for generalized extreme value (GEV) models. We propose a consistent ESML estimator for GEV models in this context. We first identify a specific class of GEV models with the desired property that, as with MNL, the constants absorb the potential bias. We then propose a new and simple weighted conditional maximum likelihood (WCML) estimator for the more general case. Unlike the weighted exogenous sample maximum likelihood (WESML) estimator by Manski and Lerman [Manski, C., Lerman, S., 1977. The estimation of choice probabilities from choice-based samples. Econometrica 45, 1977–1988], the new WCML estimator does not require external knowledge of the market shares. We show that this also applies to the case where alternatives are sampled from a large choice set, and we illustrate the use of the estimator on synthetic and real data.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Discrete choice; Selection bias; GEV models; Maximum likelihood; Estimation

1. Introduction

The estimation of discrete choice models is difficult when the sampling strategy is based on the endogenous variable, namely the choice itself. Such choice-based sampling techniques are nevertheless commonly used in practice. Sampling on choice may be an economical way of drawing subjects from a target population. Further, the analyst may want to study a product with a small market share by oversampling its users, because collecting enough observations through simple random sampling may require a prohibitively large sample.

The seminal paper on choice-based sampling by Manski and Lerman (1977) proposed a consistent generalized method of moments estimator. Referred to as weighted exogenous sample maximum likelihood