L. Grilli, C. Rampichini: Ordered logit model

ORDERED LOGIT MODEL *

Leonardo Grilli, Carla Rampichini
Dipartimento di Statistica, Informatica, Applicazioni “G. Parenti” – Università di Firenze
grilli@ds.unifi.it, rampichini@ds.unifi.it

The ordered logit model is a regression model for an ordinal response variable. The model is based on the cumulative probabilities of the response variable: in particular, the logit of each cumulative probability is assumed to be a linear function of the covariates with regression coefficients constant across response categories.

Questions relating to satisfaction with life assessment and expectations are usually ordinal in nature. For example, the answer to the question on how satisfied a person is with her quality of life can range from 1 to 10, with 1 being very dissatisfied and 10 being very satisfied (e.g. Schaafsma and Osoba, 1994; Anderson et al., 2009). It is tempting to analyse ordinal outcomes with the linear regression model, assuming equal distances between categories. However, this approach has several drawbacks that are well known in the literature (see, for example, McKelvey and Zavoina, 1975; Winship and Mare, 1984; Lu, 1999). When the response variable of interest is ordinal, it is advisable to use a specific model such as the ordered logit model.

Let $Y_i$ be an ordinal response variable with $C$ categories for the $i$-th subject, along with a vector of covariates $\mathbf{x}_i$. A regression model establishes a relationship between the covariates and the set of probabilities of the categories $p_{ci} = \Pr(Y_i = y_c \mid \mathbf{x}_i)$, $c = 1, \ldots, C$. Usually, regression models for ordinal responses are not expressed in terms of the probabilities of the categories; instead, they refer to convenient one-to-one transformations, such as the cumulative probabilities $g_{ci} = \Pr(Y_i \le y_c \mid \mathbf{x}_i)$, $c = 1, \ldots, C$. Note that the last cumulative probability is necessarily equal to 1, so the model specifies only $C-1$ cumulative probabilities.
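The one-to-one correspondence between category probabilities and cumulative probabilities can be illustrated with a short Python sketch. It is only an illustration: the function names and the probabilities for a response with C = 4 categories are hypothetical and do not come from the entry itself.

```python
from itertools import accumulate
import math


def cumulative_from_category(p):
    """Map category probabilities p_1..p_C to cumulative probabilities g_1..g_C."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return list(accumulate(p))


def category_from_cumulative(g):
    """Invert the map: p_1 = g_1 and p_c = g_c - g_(c-1) for c > 1."""
    return [g[0]] + [g[c] - g[c - 1] for c in range(1, len(g))]


# Hypothetical category probabilities for one subject (C = 4 categories).
p = [0.125, 0.25, 0.375, 0.25]

g = cumulative_from_category(p)       # the last element is necessarily 1
p_back = category_from_cumulative(g)  # recovers the category probabilities

# Only the first C-1 cumulative probabilities have a finite logit,
# which is why a model needs to specify only C-1 of them.
logits = [math.log(gc / (1.0 - gc)) for gc in g[:-1]]
```

Because the cumulative probabilities are non-decreasing in the category index, their logits are ordered as well, which is what makes them a convenient scale for a regression model.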
An ordered logit model for an ordinal response $Y_i$ with $C$ categories is defined by a set of $C-1$ equations where the cumulative probabilities $g_{ci} = \Pr(Y_i \le y_c \mid \mathbf{x}_i)$ are related to a linear predictor $\boldsymbol{\beta}'\mathbf{x}_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots$ through the logit function:

$$\operatorname{logit}(g_{ci}) = \log\left(\frac{g_{ci}}{1 - g_{ci}}\right) = \alpha_c - \boldsymbol{\beta}'\mathbf{x}_i, \qquad c = 1, 2, \ldots, C-1. \tag{1}$$

The parameters $\alpha_c$, called thresholds or cutpoints, are in increasing order ($\alpha_1 < \alpha_2 < \ldots < \alpha_{C-1}$). It is not possible to simultaneously estimate the overall intercept $\beta_0$ and all the $C-1$ thresholds: in fact, adding an arbitrary constant to the overall intercept $\beta_0$ can be counteracted by adding the same constant to each threshold $\alpha_c$. This identification problem is usually solved by either omitting the overall constant from the linear predictor (i.e. $\beta_0 = 0$) or fixing the first threshold to zero (i.e. $\alpha_1 = 0$).

* Draft of Grilli L. & Rampichini C. (2014) Ordered logit model. In: Michalos AC (Ed.). Encyclopedia of Quality of Life and Well-Being Research. Dordrecht, Netherlands: Springer, pp 4510-4513. ISBN 978-94-007-0752-8.

The vector of the slopes $\boldsymbol{\beta}$ is not indexed by the category index $c$, thus the effects of the covariates are constant across response categories. This feature is called the parallel regression assumption: indeed,