Toward Personalized Care Management of Patients at Risk - The Diabetes Case Study

Hani Neuvirth, Michal Ozery-Flato, Jonathan Laserson, Michal Rosen-Zvi
Machine Learning and Data Mining group
IBM Research
Mount Carmel, Haifa, 31905, Israel
{hani, ozery, ljon, rosen}@il.ibm.com

Jianying Hu, Martin S. Kohn, Shahram Ebadollahi
Healthcare Transformation group
IBM Research
IBM T.J. Watson Research, Hawthorne, NY
{jyhu, marty.kohn, ebad}@us.ibm.com

ABSTRACT
Chronic diseases constitute the leading cause of mortality in the western world, have a major impact on patients' quality of life, and comprise the bulk of healthcare costs. Nowadays, healthcare data management systems integrate large amounts of medical information on patients, including diagnoses, medical procedures, lab test results, and more. Sophisticated analysis methods are needed to use these data to assist in patient management and to enhance treatment quality at reduced cost. In this study, we take a first step towards better disease management of diabetic patients by applying state-of-the-art methods to anticipate the patient’s future health condition and to identify patients at high risk. Two relevant outcome measures are explored: the need for emergency care services and the probability of the treatment producing a sub-optimal result, as defined by domain experts. By identifying high-risk patients, our prediction system can help healthcare providers prepare both financially and logistically for patients' needs. To demonstrate a potential downstream application for the identified high-risk patients, we explore the association between the physician treating these patients and the treatment outcome, and propose a system that can assist healthcare providers in optimizing the match between a patient and a physician. Our work formulates the problem and examines the performance of several learning models on data from several thousand patients.
We further describe a pilot system built on the results of this analysis. We show that the risk for the two considered outcomes can be evaluated from patients’ characteristics and that features of the patient-physician match improve the prediction accuracy for the treatment’s success. These results suggest that personalized medicine can be valuable for high-risk patients and raise interesting questions for future improvements.

Categories and Subject Descriptors
G.3 [Mathematics of Computing]: Probability and Statistics – Survival analysis; I.2.6 [Computing Methodologies]: Artificial Intelligence – Learning; J.3 [Computer Applications]: Life and Medical Sciences – health, medical information systems.

General Terms
Algorithms, Measurement, Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD’11, August 21–24, 2011, San Diego, California, USA.
Copyright 2011 ACM 978-1-4503-0813-7/11/08...$10.00.

1. INTRODUCTION
Recent advances in the adoption of information technology in healthcare organizations have made huge volumes of patient data available. Analysis of these data can uncover patterns of best practice and new insights, which can potentially improve care delivery and make medical practice more effective. The ultimate measure of the quality of the care delivery process is whether it achieves optimal outcomes for patients while reducing the cost of patient care in the long run.

Chronic diseases are known to consume the vast majority of healthcare expenditures [4]. Chronic illness extends over many years and is often associated with acute exacerbations and progressive deterioration.
One of the goals of healthcare policy is to improve the treatment of chronic illness so as to minimize exacerbations, slow or halt deterioration, and decrease the cost of care.

Diabetes is a chronic disease whose incidence is increasing and which is appearing in progressively younger patients. Moreover, diabetes mellitus is notorious for the serious complications associated with long disease duration. Thus, more patients will be dealing with diabetes and its complications for longer periods of time, making it important to improve and evaluate the quality of healthcare provided to diabetics.

An important step towards improved diabetes treatment is to define a quantitative evaluation measure of the disease status. In this study we focus on two common methods for evaluating diabetes: blood test results and urgent care visits. The HbA1c blood test (also called glycohemoglobin) is a reliable indicator of long-term diabetes management. The higher the HbA1c measure, the higher the risk of developing complications such as eye, heart, or kidney disease, nerve damage, or stroke. Diabetics whose disease is poorly managed will need urgent care more often and thus show an increase in the number of emergency department and urgent care visits. The number of these urgent care (UC) events can also serve as a measure of the status of the disease. The time until a UC event occurs and the HbA1c lab test results can be treated as labels, or outcomes, to be predicted in a machine learning or statistical analysis setting. The estimated probability of these outcomes can help assess the risk of disease aggravation.

Determining health risks in the context of particular chronic diseases is very common in the medical literature. The seminal work of Cox in the early seventies [5] provides the most common method for prediction of disease risk and identification of the associated factors. One of the classic examples of the utility of disease risk assessment is cardiovascular disease (CVD).
Assessing the risk for CVD has become a major strategy for preventing this disease and its unfavorable consequences. Over the past two decades many