Multiple imputation in multiple classification and multiple-membership structures

Recai M. Yucel¹, Hong Ding², Ali Kerem Uludag² and Donald Tomaskovic-Devey³
¹ Department of Epidemiology and Biostatistics, University at Albany, SUNY
² Division of Biostatistics and Epidemiology, University of Massachusetts, Amherst
³ Department of Sociology, University of Massachusetts, Amherst

Abstract

In data systems with complexities due to nested/non-nested clustering and multiple membership, missing values present an added challenge to statistical analyses. We develop model-based multiple imputation (MI) inference, a popular approach to the analysis of missing data. Adaptations of multivariate generalizations of mixed-effects models are used as the imputation model. These models are modified to handle multivariate responses and observational units with possibly overlapping membership in clusters that are not necessarily hierarchical. Markov chain Monte Carlo techniques are used to simulate and draw imputations from the underlying joint posterior predictive distributions. A brief discussion of handling mixtures of variable types and of calibration techniques for post-imputation checks is provided. Relevant concepts on both multiple membership and non-nested clustering are demonstrated using longitudinal administrative data with panel missingness as well as arbitrary item nonresponse.

KEY WORDS: Multiple imputation, Bayesian inference, missing data, multiple membership, mixed-effects

1. Introduction

Principled missing-data techniques, especially those using the multiple-imputation (MI) paradigm (Rubin, 1976), have developed significantly since the 1980s. Most of these techniques rely on relatively straightforward model assumptions such as independent and identically distributed units or clustered data.
These methods are available to practitioners in software packages such as SAS PROC MI (SAS Institute 2001) for cross-sectional data, and the R package pan (Schafer and Yucel 2002) and the MLwiN mimacro (Carpenter and Kenward 2008) for multilevel data. Building on these well-established methods, we develop model-based MI techniques for analyzing clustered incomplete data with multiple membership and non-nestedness. Our strategy jointly models the variables subject to missing values in such settings, leading to a multivariate extension of the multiple-membership and multiple-classification model first suggested by Browne, Goldstein, and Rasbash (2001). Below we describe the example that motivated this research, which we believe usefully illustrates both the multiple-membership and the multiple-classification problems.

1.1 Motivating Example

Since 1966 the U.S. Equal Employment Opportunity Commission (EEOC) has been collecting yearly workplace surveys describing outcomes on equal employment opportunity (EEO). Private sector firms with more than 50 employees (25 if federal contractors) are required to submit yearly reports on the race/ethnic and sex composition of their work force in each establishment with 25 or more employees, about 696,691 across the U.S. These reports contain establishment employment counts of sex by five race/ethnic groups (White, Black, Hispanic, Asian/Pacific Islander, American Indian/Alaskan Native) distributed across nine occupational categories (officials and managers, professionals, technicians, sales workers, office and clerical workers, craft workers, operatives, laborers, and service workers). These reports also include information on the establishment's parent company, industry, and geographic location. Each record states whether or not the parent company is a federal contractor.

The unit of analysis in the substantive analyses is the establishment. Each establishment has repeated observations over time.
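As a rough illustration of the record structure described above, the following Python sketch represents one establishment-year report and derives firm membership weights for each establishment. The field names, the toy records, and the rule that weights are proportional to the number of observed years under each parent firm are all assumptions made for illustration only; they are not taken from the EEOC data or from the imputation model developed in this paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EstablishmentYear:
    """One yearly report for one establishment (hypothetical field names)."""
    establishment_id: str
    year: int
    firm_id: str            # parent company in that year
    industry: str
    state: str
    federal_contractor: bool


def firm_membership_weights(records):
    """Share of each establishment's observed years spent under each firm.

    Assumption: weights are proportional to time; other weighting
    schemes are equally plausible.
    """
    years_by_firm = defaultdict(Counter)
    for r in records:
        years_by_firm[r.establishment_id][r.firm_id] += 1

    weights = {}
    for est, counts in years_by_firm.items():
        total = sum(counts.values())
        weights[est] = {firm: n / total for firm, n in counts.items()}
    return weights


# Toy data: establishment E1 changes parent firm after two years.
records = [
    EstablishmentYear("E1", 2000, "F1", "retail", "MA", True),
    EstablishmentYear("E1", 2001, "F1", "retail", "MA", True),
    EstablishmentYear("E1", 2002, "F2", "retail", "MA", False),
    EstablishmentYear("E1", 2003, "F2", "retail", "MA", False),
    EstablishmentYear("E2", 2000, "F1", "finance", "NY", True),
]
weights = firm_membership_weights(records)
```

In this toy data, an establishment observed under more than one parent firm receives a weight for each of them, and the weights for every establishment sum to one, which is the usual convention in multiple-membership models.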
At any one point in time establishments are nested within firms. Firms that are federal contractors are required to practice affirmative action. We observe federal contractor status as a firm characteristic. Establishments are also nested within industries. Industries provide normative models of appropriate workplace organization. Industries with more diversity in group representation may encourage managerial integration at the workplace level. We observe the proportion of status group representation in total and managerial industry employment. Establishments are also nested within spatial contexts. The local labor market from which labor is drawn influences the ability to hire from various status groups. For each outcome variable we observe that group's proportional representation in the local labor market. A second spatial context is the state in which an establishment is located. States represent a political context that may influence workplace behavior. Prior research suggests that as the percent minority in a state increases, discrimination in various institutions (education, law, voting, as well as employment) increases as well. Other research suggests that unions were strong supporters of civil rights law. We observe percent black, Hispanic, and unionized at the state level to model their influence on the state as a political context. Figure 1 depicts this complicated nesting structure. Establishments can also shift industries and firms over

Section on Bayesian Statistical Science – JSM 2008 4006