Personal health indexing based on medical examinations: A data mining approach

Ling Chen a,*, Xue Li a, Yi Yang f, Hanna Kurniawati a, Quan Z. Sheng b, Hsiao-Yun Hu c,e, Nicole Huang d,c

a School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
b School of Computer Science, The University of Adelaide, Adelaide, Australia
c Department of Education and Research, Taipei City Hospital, Taipei, Taiwan
d Institute of Hospital and Health Care Administration, National Yang-Ming University, Taipei, Taiwan
e Institute of Public Health and Department of Public Health, National Yang-Ming University, Taipei, Taiwan
f The Centre for Quantum Computation & Intelligent Systems, University of Technology Sydney, Australia

Article info

Article history: Received 5 February 2015; Received in revised form 23 June 2015; Accepted 29 October 2015; Available online 6 November 2015

Keywords: Personal health index; Geriatric medical examination; Label uncertainty; Data mining; Feature extraction

Abstract

We design a method called MyPHI that predicts the personal health index (PHI), a new evidence-based health indicator that uses data mining techniques to explore the underlying patterns of a large collection of geriatric medical examination (GME) records. We define PHI as a vector of scores, each reflecting the health risk in a particular disease category. The PHI prediction is formulated as an optimization problem that finds the optimal soft labels as health scores based on medical records that are infrequent, incomplete, and sparse. Our method is compared with classification models commonly used in medical applications. An experimental evaluation on a real-world GME data set collected from 102,258 participants demonstrates the effectiveness of our method.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Modern societies have experienced dramatic growth in the elderly population since the beginning of this century.
This growth implies increasing healthcare needs and government expenditure. For example, the U.S. government spent $414.3 billion on elderly health care in 2011, $100 billion more than the inflation-adjusted expenses in 2001 [1]. Annual geriatric medical examination (GME) is now an integral part of elderly healthcare in many developed countries. For instance, Australia [2], the United Kingdom [3], and Taiwan [4] have GME programs to periodically monitor the health status of senior residents. However, it is difficult for healthcare professionals to provide an overall report on personal health after a comprehensive medical check-up with hundreds of parameters. Moreover, the richness of GME records, such as correlations amongst test results, their longitudinal progression, and their relationships to other participants with similar patterns of health development, is often left unexplored. In fact, such exploration is impossible to perform manually, because the complexity of the combined effects grows exponentially with the number of different test results, the number of available longitudinal records, and the total number of participants.

We design a method called MyPHI that predicts the personal health index (PHI), a new health indicator that uses data mining techniques to explore the underlying patterns of a large collection of GME records. We define PHI as a vector of scores, each of which is a complementary probability defined on the health-related risks associated with a particular disease category. Since the highest health risk is health-related death, we explore the health-related main Cause of Death (COD) information linked to the GME participants. Under this definition, the higher the scores, the healthier the person. It is our belief that medical decision support systems are used to support clinical professionals rather than to replace them. The primary goal of the proposed MyPHI is therefore to draw their attention to participants at high risk.
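As an illustration of the definition above, the following minimal sketch treats each PHI score as one minus an estimated risk probability for a disease category. It only illustrates the definition and is not the MyPHI method itself, which obtains the scores by solving an optimization problem over soft labels. The function names, the category names, the risk values, and the attention threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def personal_health_index(risk_by_category: Dict[str, float]) -> Dict[str, float]:
    """Turn per-category risk probabilities into health scores (1 - risk)."""
    phi = {}
    for category, risk in risk_by_category.items():
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"risk for {category!r} must be in [0, 1], got {risk}")
        # Complementary probability: a higher score means a healthier person.
        phi[category] = 1.0 - risk
    return phi


def categories_needing_attention(phi: Dict[str, float], threshold: float) -> List[str]:
    """Return categories whose score falls below a hypothetical threshold."""
    return sorted(c for c, score in phi.items() if score < threshold)


# Hypothetical risk estimates for one participant (illustrative values only).
example_risks = {"circulatory": 0.25, "respiratory": 0.5, "neoplasms": 0.125}
example_phi = personal_health_index(example_risks)
flagged = categories_needing_attention(example_phi, threshold=0.6)
```

Keeping PHI as a per-category vector rather than a single number lets a reviewer see which disease category drives a low score, in line with the stated goal of drawing attention to participants at high risk.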
To the best of our knowledge, this work is the first of its kind to predict personal health scores by mining large medical examination data.

PHI provides an important benchmark for understanding the health status of elderly people. In particular, the following parties can benefit from PHI:

Governments: Public health policies are often made and revised based on scientific evidence from statistical analysis and research outputs [5]. For example, a community health index can help in understanding regional health status [6]. Public health authorities can use PHI to gauge their decisions on population health policies by utilizing the aggregated PHI of individuals. Particularly, the impact of a policy on

Decision Support Systems 81 (2016) 54–65

* Corresponding author.
E-mail addresses: l.chen5@uq.edu.au (L. Chen), xueli@itee.uq.edu.au (X. Li), yi.yang@uts.edu.au (Y. Yang), hannakur@uq.edu.au (H. Kurniawati), qsheng@cs.adelaide.edu.au (Q.Z. Sheng), A3547@tpech.gov.tw (H.-Y. Hu), syhuang@ym.edu.tw (N. Huang).

http://dx.doi.org/10.1016/j.dss.2015.10.008
0167-9236/© 2015 Elsevier B.V. All rights reserved.