Supplementary Materials for DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks

Changhee Lee¹, William R. Zame², Jinsung Yoon¹, Mihaela van der Schaar³,¹,⁴
¹ Department of Electrical and Computer Engineering, University of California, Los Angeles, USA
² Department of Economics, University of California, Los Angeles, USA
³ Department of Engineering Science, University of Oxford, UK
⁴ Alan Turing Institute, London, UK
chl8856@ucla.edu, zame@econ.ucla.edu, jsyoon0823@ucla.edu, mihaela.vanderschaar@oxford-man.ox.ac.uk

Description of MP-based Survival Models

For the comparison of DeepHit with conventional machine learning algorithms, we modified the survival data so that survival analysis could be performed as mortality prediction (MP) using machine learning algorithms. For every time interval $m$, where $m = 0, \dots, T_{\max}$, the cause-specific data is replaced with a new dataset of the patients who have neither been censored nor died from other causes up to the $m$-th time interval.

We focus here on modifying the survival data for event $k$; the procedure generalizes easily to the other causes. The number of patients who have neither been censored nor died from other causes (i.e., causes other than $k$) up to the $m$-th time interval is denoted as $N_m^k$. A new label $\tilde{l}_k^{(i)}$, which indicates whether the $i$-th patient is dead ($\tilde{l}_k^{(i)} = 1$) or alive ($\tilde{l}_k^{(i)} = 0$) at the $m$-th time interval, is then assigned to each of these patients. Using the updated dataset $\tilde{D}_m^k = \{x^{(i)}, \tilde{l}_k^{(i)}\}_{i=1}^{N_m^k}$, we train conventional machine learning (ML) algorithms (e.g., random forest, logistic regression, and AdaBoost) to predict the new label. This yields ML classifiers that are trained independently at every time interval. The risk score of a patient at each time interval can then be assessed using the ML classifier trained at the corresponding time. The pseudo-code for training the ML-based survival models for event cause $k$ is described in Algorithm 1.
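The dataset construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_mp_datasets`, the argument names, the toy data, and the convention that event code 0 marks a censored patient are all assumptions made for illustration. The training step is left out, since any binary classifier could be fitted on each per-interval dataset.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def build_mp_datasets(
    x: Sequence[Features],
    s: Sequence[int],
    k: Sequence[int],
    cause: int,
    t_max: int,
) -> List[Tuple[List[Features], List[int]]]:
    """Build one binary-labelled dataset per time interval m = 0..t_max.

    x[i] holds the covariates of patient i, s[i] the time of the event or
    of censoring, and k[i] the event type (0 = censored is an assumption
    of this sketch, not something fixed by the text).
    """
    datasets = []
    for m in range(t_max + 1):
        feats: List[Features] = []
        labels: List[int] = []
        for x_i, s_i, k_i in zip(x, s, k):
            if k_i == cause and s_i <= m:
                # Patient has died from the cause of interest by interval m.
                feats.append(x_i)
                labels.append(1)
            elif s_i > m:
                # Patient is still under observation at interval m.
                feats.append(x_i)
                labels.append(0)
            # Otherwise the patient was censored or died from another
            # cause by interval m and is excluded from this dataset.
        datasets.append((feats, labels))
    return datasets


# Hypothetical toy data: four patients, two competing causes (1 and 2).
x = [[0.1], [0.2], [0.3], [0.4]]
s = [1, 2, 2, 3]
k = [1, 0, 2, 1]
datasets = build_mp_datasets(x, s, k, cause=1, t_max=3)
```

In this toy example the dataset for $m = 2$ keeps only the first and fourth patients, because the second was censored and the third died from the other cause by that interval. Each `(feats, labels)` pair could then be passed to a classifier of choice to obtain one predictor per interval.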
Additional Results for Discriminative Performance

In this section, we provide additional results on the performance benefits in terms of the cause-specific $C^{td}$-index, comparing DeepHit with the cause-specific versions of survival models on the SEER and SYNTHETIC datasets in Tables 1 and 2, respectively.

For the SEER dataset, DeepHit provided consistent performance improvements over the conventional benchmarks. In CVD prognosis, the improvements were statistically significant (p < 0.05 and p < 0.001) except over cs-MP-AdaBoost and cs-MP-LogitR; in breast cancer prognosis, they were statistically significant (p < 0.05 and often p < 0.001) over all benchmarks. For the SYNTHETIC dataset, DeepHit outperformed all the benchmarks, and the performance improvements were all statistically significant (p < 0.001) for both Event 1 and Event 2.

Algorithm 1 Pseudo-code for the ML-based survival model for event cause $k$
Initialize: $\tilde{D}_m^k = \emptyset$ for $m = 0, \dots, T_{\max}$
for $m = 0, \dots, T_{\max}$ do
    for $i = 1, \dots, N$ do
        if $k^{(i)} = k$ and $s^{(i)} \le m$ then
            $\tilde{l}_k^{(i)} \leftarrow 1$
            $\tilde{D}_m^k \leftarrow \tilde{D}_m^k + \{x^{(i)}, \tilde{l}_k^{(i)}\}$
        else if $s^{(i)} > m$ then
            $\tilde{l}_k^{(i)} \leftarrow 0$
            $\tilde{D}_m^k \leftarrow \tilde{D}_m^k + \{x^{(i)}, \tilde{l}_k^{(i)}\}$
        end if
    end for
    Train ML predictor $H_{ML,k}^{(m)}(x)$ with $\tilde{D}_m^k = \{x^{(i)}, \tilde{l}_k^{(i)}\}_{i=1}^{N_m^k}$
end for

Table 1: Additional comparison of cause-specific $C^{td}$-index performance (mean and 95% confidence interval) tested on the SEER dataset

Algorithms        CVD                      Breast Cancer
cs-RSF            0.280 (0.262 - 0.298)    0.584 (0.574 - 0.594)
cs-ThresReg       0.664 (0.657 - 0.671)    0.645 (0.628 - 0.662)
cs-MP-RForest     0.281 (0.263 - 0.299)    0.584 (0.574 - 0.594)
cs-MP-AdaBoost    0.671 (0.665 - 0.677)    0.741 (0.735 - 0.747)
cs-MP-LogitR      0.665 (0.645 - 0.685)    0.657 (0.648 - 0.666)
DeepHit           0.684 (0.674 - 0.694)    0.752 (0.748 - 0.756)

* indicates p-value < 0.001; [symbol missing] indicates p-value < 0.05

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.