A probabilistic data-driven framework for scoring the preoperative recipient-donor heart transplant survival

Ali Dag a, Kazim Topuz b, Asil Oztekin c, Serkan Bulur d, Fadel M. Megahed a,⁎

a Department of Industrial and Systems Engineering, Auburn University, AL 36849, USA
b Department of Industrial and Manufacturing Engineering, Wichita State University, KS 67260, USA
c Manning School of Business, University of Massachusetts at Lowell, MA 01854, USA
d Department of Cardiology, Istanbul Medeniyet University, Istanbul, Turkey

Article info
Article history:
Received 14 March 2015
Received in revised form 19 February 2016
Accepted 20 February 2016
Available online xxxx

Abstract
Recent research has shown that data mining models can accurately predict the outcome of a heart transplant based on predictors that include patient and donor's health/demographics. These models have not been adopted in practice, however, since they did not: a) consider the interactions between the explanatory variables; b) provide a patient's specific risk of survival (reported results have been primarily deterministic); and c) offer an automated decision tool that can provide some data-driven insights to practitioners. In this study, we attempt to overcome these three limitations through the use of Bayesian Belief Networks (BBN). The proposed BBN framework is comprised of four phases. In the first two phases, the data is preprocessed, and a candidate set of predictors is generated based on employing several variable selection methods. The third phase involves the addition of medically relevant variables to the list. In phase four, the BBN model is applied. The results show that the proposed BBN method provides similar predictive performance to the best approaches in the literature. More importantly, our method provides novel information on the interactions among the predictors and the conditional probability of survival for a given set of relevant donor–recipient characteristics.
We offer U.S. practitioners a decision support tool that presents an individualized survival score based on our BBN model (and the UNOS dataset).
© 2016 Elsevier B.V. All rights reserved.

Keywords:
Healthcare analytics
Bayesian Belief Networks
Medical decision making
Data mining
Genetic algorithms
United Network for Organ Sharing (UNOS)

1. Introduction

Heart failure is a serious medical condition in which a patient's heart is weakened and cannot pump enough blood to meet the body's demands [1]. This condition affects an estimated 2–3% of the world's adult population [2]. In the U.S., there are over 5.8 million patients living with heart failure, with an estimated annual incidence of 550,000 [1,3]. The majority of these patients can enjoy a full life by managing the condition with medication. However, a certain class of heart failure (end-stage heart failure) cannot be managed with medication and can only be overcome by a heart transplant. If a patient is deemed eligible for a transplant, then she/he is placed on a waiting list until a suitable donor heart is found [4]. Currently, in the U.S. there are about 3000 people on waiting lists for a heart transplant at any one time, while only about 2000 donor hearts become available each year [4]. This gap between the supply of and demand for healthy donor hearts leads to longer waiting times, and many patients die while waiting for a transplant [5].

The current matching process relies on a printed list generated by the United Network for Organ Sharing (UNOS) computers, which is based on “blood type, body size, UNOS status, and length of time on the waiting list” [6]. A significant amount of research has sought to determine the subset of variables that should be included for matching.
Much of this work involves data mining techniques, since they require neither prior knowledge about the data nor assumptions about its statistical distribution or properties [7]. In particular, data mining methods have shown great accuracy in determining which subset of variables influences a patient's survival over a pre-specified time period [8–11].

⁎ Corresponding author at: 3301L Shelby Center, Auburn University, AL 36849, USA. Tel.: +1 334 844 8273; fax: +1 334 844 1381.
E-mail address: fmegahed@auburn.edu (F.M. Megahed).
URL: http://www.fadelmegahed.com (F.M. Megahed).

Decision Support Systems xxx (2016) xxx–xxx
Journal homepage: www.elsevier.com/locate/dss
http://dx.doi.org/10.1016/j.dss.2016.02.007
0167-9236/© 2016 Elsevier B.V. All rights reserved.

There is extensive research on using data-driven models to predict post-transplantation survival time. For any type of transplant, we can classify these models into two streams. The first stream addresses the question of how to accurately predict post-transplantation survival for a given time period (i.e., will the patient survive for X years?). In our analysis of the literature, this stream represents the majority of the work. This question has been addressed for virtually all organ transplants; for example, see the following papers on heart [12–21], kidney [9,10,22], and liver [23] transplants. It is important to note that these models are deterministic, i.e., they provide an expected value that is typically a binary post-transplantation survival outcome (after X years). On the other hand, the second stream attempts to understand the uncertainty in the prediction as well as identify the conditional dependencies among