A multilevel approach for repeated cross-sectional data *

Lucia Modugno
Department of Statistical Sciences, University of Bologna (IT)
e-mail: lucia.modugno@unibo.it

Paola Monari
Department of Statistical Sciences, University of Bologna (IT)
e-mail: paola.monari@unibo.it

Simone Giannerini
Department of Statistical Sciences, University of Bologna (IT)
e-mail: simone.giannerini@unibo.it

Silvia Cagnone
Department of Statistical Sciences, University of Bologna (IT)
e-mail: silvia.cagnone@unibo.it

Abstract

The main aim of this work is to propose a multilevel approach for repeated cross-sectional data. Multilevel data (Goldstein, 2010; Skrondal and Rabe-Hesketh, 2004) consist of units of analysis of different types that are hierarchically nested. Units at the lowest level are described by some variables and are grouped into larger units which, in turn, may be described by other variables. The general specification of multilevel models allows a large variety of applications. In particular, repeated measures data can be seen as a specific case of multilevel data with occasions i at level 1 and units j at level 2 (Maas and Snijders, 2003). The dependence among level-1 errors that characterizes panel data can be handled by including correlation structures at level 1 (Goldstein, 2010). Moreover, heteroscedastic within-group errors can be accommodated through variance functions (Davidian and Giltinan, 1995). This flexibility in the specification of covariance structures is an important feature of mixed-effects models for longitudinal data. Unlike longitudinal data, repeated cross-sectional data consist of observations on individual survey respondents drawn from the same context (e.g. the same country) at many different time-points, and can therefore be treated as clustered within time-points (Firebaugh, 1997).
Because a new sample is drawn at each time-point, such data do not allow specific individuals to be followed over time, but they do allow social change to be captured. DiPrete and Grusky (1990) were the first to adopt a multilevel framework for repeated cross-sectional data. The key difference between their model and the traditional multilevel framework is the possibility of modelling time effects by allowing for serial correlation among level-2 units (time-points). The authors handled this case by deriving generalized least squares estimators. A similar idea was considered by Browne and Goldstein (2010) for spatial correlation within a Bayesian framework: the independence assumption among level-2 disturbances is relaxed, and the spatial correlation between pairs of clusters is modelled through an explicit function of the distance between them. However, to our knowledge, the analysis of repeated cross-sectional data in a multilevel framework is not well established from either the theoretical or the practical point of view. In fact, most specifications are ad hoc solutions with no available software, so that such models remain poorly developed and rarely applied.

* Presented at the first meeting of the FIRB (“Futuro in ricerca” 2012) project “Mixture and latent variable models for causal-inference and analysis of socio-economic data”, Perugia (IT), March 15-16, 2013
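
As an illustration of the data structure discussed in the abstract, the following minimal Python sketch simulates repeated cross-sections in which the level-2 (time-point) effects are serially correlated. It is not the authors' specification: the AR(1) form of the serial correlation, the single covariate, and all names and parameters (ar1_covariance, simulate_repeated_cross_sections, rho, sigma_eta, sigma_e) are assumptions made for this example only.

```python
import random


def ar1_covariance(n_times, rho, sigma_eta):
    """Covariance of stationary AR(1) time effects (assumed form, |rho| < 1):
    Cov(u_t, u_s) = sigma_eta^2 * rho^|t - s| / (1 - rho^2)."""
    var_u = sigma_eta ** 2 / (1.0 - rho ** 2)
    return [[var_u * rho ** abs(t - s) for s in range(n_times)]
            for t in range(n_times)]


def simulate_repeated_cross_sections(n_times, n_per_time, beta0, beta1,
                                     rho, sigma_eta, sigma_e, seed=0):
    """Return (time, x, y) rows with y = beta0 + beta1 * x + u_t + e.

    A fresh set of respondents is drawn at every time-point (level 1),
    while the time effects u_t (level 2) follow an AR(1) process.
    """
    rng = random.Random(seed)
    # Draw u_0 from the stationary distribution so Var(u_t) is constant in t.
    u = rng.gauss(0.0, sigma_eta / (1.0 - rho ** 2) ** 0.5)
    rows = []
    for t in range(n_times):
        if t > 0:
            # Serially correlated level-2 disturbance.
            u = rho * u + rng.gauss(0.0, sigma_eta)
        for _ in range(n_per_time):
            # New individuals each time: no one is followed over time.
            x = rng.gauss(0.0, 1.0)
            y = beta0 + beta1 * x + u + rng.gauss(0.0, sigma_e)
            rows.append((t, x, y))
    return rows
```

Under these assumptions, ar1_covariance gives the non-diagonal level-2 covariance matrix that a generalized least squares estimator would have to account for; with rho = 0 it reduces to the diagonal matrix of the standard random-intercept model.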