Proceedings of the 2006 Winter Simulation Conference
L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds.

A DATA-INTEGRATED NURSE ACTIVITY SIMULATION MODEL

Durai Sundaramoorthi
Victoria C. P. Chen
Seoung B. Kim
Jay M. Rosenberger

Department of Industrial and Manufacturing Systems Engineering
The University of Texas at Arlington
Arlington, TX 76019, U.S.A.

Deborah F. Buckley-Behan

School of Nursing
The University of Texas at Arlington
Arlington, TX 76019, U.S.A.

ABSTRACT

This research develops a data-integrated approach for constructing simulation models based on a real data set provided by Baylor Regional Medical Center (Baylor) in Grapevine, Texas. Tree-based models and kernel density estimation were utilized to extract important knowledge from the data for the simulation. Classification and Regression Tree model, a data mining tool for prediction and classification, was used to develop two tree structures: (a) a regression tree, from which the amount of time a nurse spends in a location is predicted based on factors, such as the primary diagnosis of a patient and the type of nurse; and (b) a classification tree, from which transition probabilities for nurse movements are determined. Kernel density estimation is used to estimate the continuous distribution for the amount of time a nurse spends in a location. Merits of using our approach for Baylor’s nurse activity simulation are discussed.

1 INTRODUCTION

In traditional stochastic simulation models, transition probabilities are obtained either subjectively or by looking at all possible combinations of the levels of the simulation state variables. If the system under consideration is complex, such as nurse movement, then a subjective approach is unlikely to be accurate, and an approach using all possible combinations of the states will be impractical.
In the past, in order to reduce the number of simulation variables, factorial designs and screening methods were used (Bettonvil and Kleijnen 1997; Cheng 1997; Shen and Wan 2005). Even after eliminating some of the variables, a few remaining variables could lead to a huge number of combinations for the simulation. For instance, six categorical variables with ten categories each will lead to a million possible states in the simulation. Obtaining accurate transition probabilities for such a huge simulation model is still difficult.

In this paper, using the Baylor data, we present a new methodology to reduce the number of combinations and find transition probabilities for stochastic simulation models. Kernel density estimates and trees were utilized to extract important knowledge about the workload of nurses from an encrypted data set provided by Baylor for four care units. The four units include two medical/surgical units, one mom/baby unit, and one high-risk labor-and-delivery unit. Classification and Regression Trees, a data mining tool for prediction and classification, was applied to the Baylor data to develop two tree structures: (a) a regression tree, from which the amount of time a nurse spends in a location is predicted based on factors, such as the primary diagnosis of a patient and the type of nurse; and (b) a classification tree, from which transition probabilities for nurse movements are determined.

This research develops a simulation model for nurse activity which could be used to evaluate nurse-patient assignments. In the literature, most of the relevant research focuses only on nurse budgeting and nurse scheduling methodologies (Aickelin and Dowsland 2003; Burke et al. 2001; Jaumard et al. 1998; Kirkby 1997; Miller et al. 1976; Warner 1976) and ignores uncertainty. By contrast, our research seeks an integrated statistical data mining and simulation optimization approach that utilizes patterns in the real data to balance workload.
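As a rough illustration of the ideas introduced above, the following minimal Python sketch computes the size of the full state space from the six-variable example, estimates transition probabilities per tree leaf rather than per state combination, and builds a Gaussian kernel density estimate for time spent in a location. It is a sketch under stated assumptions, not the procedure used in this paper: the records are synthetic, the function `leaf_of` is a hypothetical hand-written rule standing in for a fitted classification tree, and the location names, nurse types, and bandwidth are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Full state space in the example from the text:
# six categorical variables with ten categories each.
n_states = 10 ** 6

# Synthetic records, invented for illustration only (not Baylor data):
# (nurse_type, current_location, next_location, minutes_in_current_location)
records = [
    ("RN", "patient_room", "nurse_station", 4.0),
    ("RN", "patient_room", "nurse_station", 6.5),
    ("RN", "patient_room", "med_room", 3.0),
    ("RN", "nurse_station", "patient_room", 2.0),
    ("NA", "patient_room", "patient_room", 8.0),
    ("NA", "patient_room", "nurse_station", 7.0),
    ("NA", "nurse_station", "patient_room", 1.5),
    ("RN", "nurse_station", "patient_room", 2.5),
]


def leaf_of(nurse_type, location):
    """Hypothetical splits standing in for a fitted classification tree."""
    if location == "nurse_station":
        return "leaf_station"  # nurse type is not used in this leaf
    return "leaf_room_RN" if nurse_type == "RN" else "leaf_room_NA"


def transition_probabilities(rows):
    """Relative frequency of each next location within each leaf."""
    counts = defaultdict(Counter)
    for nurse_type, location, next_location, _ in rows:
        counts[leaf_of(nurse_type, location)][next_location] += 1
    return {
        leaf: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for leaf, ctr in counts.items()
    }


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels
    centred on the samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density


probs = transition_probabilities(records)

# Times observed in one leaf feed a continuous density estimate.
rn_room_times = [
    minutes for nurse_type, location, _, minutes in records
    if leaf_of(nurse_type, location) == "leaf_room_RN"
]
density = gaussian_kde(rn_room_times, bandwidth=1.0)

print(n_states, len(probs))
print(probs["leaf_room_RN"])
```

In this toy setting, probabilities are needed for only three leaves instead of one set per state combination, which is the kind of reduction a tree structure is meant to provide. A real application would fit the tree and choose the bandwidth from data rather than fixing them by hand.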
The integration of statistical modeling and optimization has been found to work well for some complex problems (Cervellera et al. 2003; Chen
1-4244-0501-7/06/$20.00 ©2006 IEEE