A multilevel approach for repeated cross-sectional data *

Lucia Modugno
Department of Statistical Sciences, University of Bologna (IT)
e-mail: lucia.modugno@unibo.it

Paola Monari
Department of Statistical Sciences, University of Bologna (IT)
e-mail: paola.monari@unibo.it

Simone Giannerini
Department of Statistical Sciences, University of Bologna (IT)
e-mail: simone.giannerini@unibo.it

Silvia Cagnone
Department of Statistical Sciences, University of Bologna (IT)
e-mail: silvia.cagnone@unibo.it

1 Abstract

The main aim of this work is to propose a multilevel approach for repeated cross-sectional data. Multilevel data (Goldstein, 2010; Skrondal and Rabe-Hesketh, 2004) consist of units of analysis of different types that are hierarchically nested. At the lowest level such units can be described by some variables, and they are also grouped into larger units, which, in turn, can be described by other variables. The general specification of multilevel models allows a large variety of applications. In particular, repeated measures data can be seen as a specific case of multilevel data with occasions i at level 1 and units j at level 2 (Maas and Snijders, 2003). The dependence among level-1 errors that characterizes panel data can be handled by including correlation structures at level 1 (Goldstein, 2010). Moreover, it is possible to allow heteroscedastic within-group errors through variance functions (Davidian and Giltinan, 1995). This flexibility in the specification of covariance structures is an important feature of mixed-effects models for longitudinal data.

Unlike longitudinal data, repeated cross-sectional data consist of observations on individual survey respondents drawn from the same context (e.g. the same country) at many different time-points, and can therefore be treated as clustered within time-points (Firebaugh, 1997).
Because a new sample is drawn at each time-point, such data do not allow specific individuals to be followed over time, but they do capture social change. DiPrete and Grusky (1990) were the first to adopt a multilevel framework for repeated cross-sectional data. The key difference between their model and the traditional multilevel framework is that time effects can be modelled by allowing for serial correlation among level-2 units (time-points). The authors handled this case by deriving generalized least-squares estimators. A similar idea has been considered by Browne and Goldstein (2010) in the context of spatial correlation, within a Bayesian framework: the independence assumption among level-2 disturbances is relaxed, and the spatial correlation between pairs of clusters is modelled through an explicit function of the distance between them. However, to our knowledge, the analysis of repeated cross-sectional data in a multilevel framework is not well established from either the theoretical or the practical point of view. In fact, most specifications are ad hoc solutions with no available software, so such models remain poorly developed and are rarely applied.

* Presented at the first meeting of the FIRB (“Futuro in ricerca” 2012) project “Mixture and latent variable models for causal-inference and analysis of socio-economic data”, Perugia (IT), March 15-16, 2013
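As a rough illustration of the structure described in the abstract, the sketch below assumes the simplest model of this kind: respondent i sampled at time-point t has outcome y_it = beta0 + u_t + e_it, where the time-point effects u_t follow a stationary AR(1) process with variance sigma_u2 and autocorrelation rho, and e_it is independent respondent-level noise with variance sigma_e2. The AR(1) form, the function names and all parameter values are assumptions made for illustration only; they are not taken from DiPrete and Grusky (1990) or from the specification proposed in this work.

```python
import math
import random


def marginal_cov(time_index, sigma_u2, sigma_e2, rho):
    """Implied covariance matrix of the outcomes under the assumed model.

    time_index[a] is the time-point at which respondent a was sampled.
    Cov(y_a, y_b) = sigma_u2 * rho**|t_a - t_b|, plus sigma_e2 on the diagonal.
    """
    n = len(time_index)
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            lag = abs(time_index[a] - time_index[b])
            # Contribution of the serially correlated level-2 (time-point) effects.
            cov[a][b] = sigma_u2 * rho ** lag
            if a == b:
                # Respondent-specific level-1 error variance.
                cov[a][b] += sigma_e2
    return cov


def simulate(n_times, n_per_time, beta0, sigma_u2, sigma_e2, rho, seed=0):
    """Simulate (time_point, outcome) pairs with a fresh sample at each time-point."""
    rng = random.Random(seed)
    # Stationary AR(1): the first effect has variance sigma_u2, and innovations
    # have variance sigma_u2 * (1 - rho**2), so Var(u_t) stays equal to sigma_u2.
    u = [rng.gauss(0.0, math.sqrt(sigma_u2))]
    innovation_sd = math.sqrt(sigma_u2 * (1.0 - rho ** 2))
    for _ in range(1, n_times):
        u.append(rho * u[-1] + rng.gauss(0.0, innovation_sd))
    data = []
    for t in range(n_times):
        for _ in range(n_per_time):
            # New respondents are drawn at every time-point; nobody is followed.
            y = beta0 + u[t] + rng.gauss(0.0, math.sqrt(sigma_e2))
            data.append((t, y))
    return data


if __name__ == "__main__":
    # Five respondents: two at time 0, two at time 1, one at time 2.
    for row in marginal_cov([0, 0, 1, 1, 2], sigma_u2=2.0, sigma_e2=1.0, rho=0.5):
        print(row)
```

Under these assumptions, setting rho = 0 recovers the usual random-intercept model, in which respondents from different time-points are uncorrelated; a nonzero rho makes respondents from nearby time-points more alike than those from distant ones.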