Visualising a State-wide Patient Data Collection: A Case Study to Expand the Audience for Healthcare Data

Wei Luo 1, Marcus Gallagher 1, Di O’Kane 2, Jason Connor 3, Mark Dooris 4, Col Roberts 2, Lachlan Mortimer 2, Janet Wiles 1

1 School of Information Technology & Electrical Engineering, The University of Queensland, Brisbane, Australia. Email: {luo, marcusg, wiles}@itee.uq.edu.au
2 Clinical Practice Improvement Centre, Centre for Healthcare Improvement, Queensland Health, Brisbane, Australia
3 Department of Psychiatry, The University of Queensland, Brisbane, Australia
4 Department of Cardiology, Royal Brisbane and Women’s Hospital, Brisbane, Australia

Abstract

This paper describes the application of existing and novel adaptations of visualisation techniques to routinely collected health data. The aim of this case study is to examine the capacity for visualisation approaches to quickly and effectively inform clinical, policy, and fiscal decision making to improve healthcare provision. We demonstrate the use of interactive graphics, fluctuation plots, mosaic plots, time plots, heatmaps, and disease maps to visualise patient admission, transfer, in-hospital mortality, morbidity coding, execution of diagnosis and treatment guidelines, and the temporal and spatial variations of diseases. The relative effectiveness of these techniques and associated challenges are discussed.

Keywords: Visualisation, Exploratory Data Analysis, Routine Data Collection

1 Introduction

The state of Queensland has the third largest population in Australia [20]. In the financial year 2006-2007, public hospitals in Queensland treated more than 780,000 inpatients [11, page 11-12]. All such inpatient encounters are routinely collected in the Queensland Hospital Admitted Patient Data Collection (QHAPDC) [4]. This centralized database setup represents an invaluable resource for knowledge discovery and evidence based medicine.
Since 2005, the health department of the state government, Queensland Health, has implemented a series of initiatives to improve performance monitoring and governance. As an example, the VLAD (Variable Life Adjusted Display) system is in place to detect extraordinary trends and occurrences [1], using data from QHAPDC. Significant challenges exist in developing efficient and effective ways of maximizing the utility of this data resource. A visualisation toolkit tailored for health data such as QHAPDC is likely to have significant benefits to both Queensland Health and the broader medical community. This article describes a step towards developing such a toolkit.

Copyright 2010, Australian Computer Society, Inc. This paper appeared at the Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2010), Brisbane, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 108. Anthony Maeder and David Hansen, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

In the following sections, we describe various techniques used to visualise QHAPDC data. Our visualisation is exploratory in nature, with the overall aim of expanding the audience for healthcare data, and the following specific aims guiding the selection of techniques:

1. To assess data quality, and hence to identify potential improvements to the data collection process. (See Section 3.1 for an example where coding issues were identified through a simple histogram.)
2. To detect anomalies (both positive and negative) in clinical practices, and hence to promote clinical practice improvement. (See Section 3.2 for such an attempt.)
3. To identify temporal trends and spatial variation in the data for better allocation of health care resources. (See Sections 3.5 and 3.6.)
4. To identify the potential research value of the routinely collected data; to generate medical hypotheses that lead to further research projects. (See Section 3.4.)

Visualisation of public health data has been demonstrated to enhance knowledge and support decision making (see for example [12, 22, 23, 24]). A unique challenge of the state-wide QHAPDC database is that it is essentially a repository for data generated largely independently at a number of collection sites, typically different hospital campuses. While this provides comprehensive and rich data for visualisation, possible non-uniformity of data coding practices across hospital sites brings key challenges of “noisy” and missing data.

This article is organised as follows: Section 2 describes the dataset to be used and briefly explains