Beyond The Cox Model: Artificial Neural Networks For Survival Analysis Part II

Rashmi Joshi*, Colin Reeves§

*Formerly Control Theory and Applications Centre, Coventry University, Priory Street, Coventry, U.K.
§Faculty of Engineering and Computing, Coventry University, Priory Street, Coventry, U.K.

Tel: +44(0)7956 157094
E-mail: Rashmi.Joshi@hotmail.co.uk, C.Reeves@coventry.ac.uk

Keywords: Artificial Neural Networks (ANNs), survival, non-linear, malignant melanoma, confidence intervals

Abstract

Artificial neural networks (ANNs) are proving popular and successful in a wide variety of medical applications and for non-linear regression and classification. A novel, flexible, non-linear ANN model for prognosis and prediction of conditional survival probabilities was previously developed and successfully applied to censored data [1]. Building on this, we expand the initial probabilistic model, and this paper details results with refinements such as enhanced generalisation capability, which address issues such as model complexity and topographical structure. The model is trained using a maximum likelihood approach. An asymptotic approximation to the variance-covariance matrix using the Fisher Information Matrix is discussed, and provides standard errors on the parameter estimates. In addition to predictions of conditional survival probabilities, hazard functions, and probability density functions, confidence intervals on the survival estimates are obtained using the Choleski decomposition algorithm and a quasi-bootstrap approach, and are shown. The model’s predictive accuracy is thus further confirmed. The ANN’s performance is compared with other popular traditional survival modelling techniques.
We conclude that the ANN model’s predictive accuracy is at the very least as good as that of a heavily used leading statistical model, the Cox model, and that it is advantageous as a flexible general hazards model when analysing survival data for which a specified distributional form or model assumptions are difficult to justify. The proposed ANN model therefore extends the range of data that can be analysed using survival analysis methods, and is a candidate for use in the analysis of censored survival data.

1 Introduction

The field of ANN research experienced some growth in the 1950s, when one of the earliest models, known as the perceptron, was developed [2]. The growth in neural computing techniques is now widely recognised: they have found applications in a variety of fields, including medicine, where they are used in clinical diagnosis and analysis [3, 4, 5], and in some cases have given results that match or surpass those obtained from statistical models [6].

The analysis of time-to-event data, i.e. data concerned with the time from a defined time origin until the occurrence of a particular event of interest, is termed survival analysis. The field of survival analysis experienced tremendous growth during the latter half of the 20th century. Of primary interest in this field is the investigation of the functional relationship between covariates, such as treatment or subject characteristics (possible risk factors), and the time to occurrence of an event such as death, disease recurrence or cure. Identifying factors of prognostic significance for a particular disease provides valuable information for an important area of medical statistics; for instance, predictions of the survival characteristics of a particular disease have major implications for patient management and care strategies. The survivor and hazard functions are estimated from the observed survival times and are of main interest when analysing survival data.
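As a concrete illustration of how the survivor and hazard functions relate to one another, the following minimal sketch evaluates both for a Weibull survival distribution and checks numerically the relationships formalised in the definitions that follow. The Weibull form, the parameter values, and all function names are assumptions made purely for illustration; they are not the model or the data used in this paper.

```python
import math

# Illustrative Weibull parameters (assumed values, not taken from the paper).
SHAPE = 1.5   # shape > 1 gives a hazard that increases with time
SCALE = 5.0   # scale, in the same time units as t


def survivor(t):
    """S(t) for the assumed Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))


def density(t):
    """f(t): probability density of the survival time."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survivor(t)


def hazard(t):
    """h(t), computed as the ratio of the density to the survivor function."""
    return density(t) / survivor(t)


def cumulative_hazard(t, steps=20000):
    """H(t): trapezoidal approximation to the integral of h(u) over [0, t]."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


def check_identities(t):
    """Return (closed-form Weibull hazard, f(t)/S(t), S(t), exp(-H(t)))."""
    closed_form = (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)
    return closed_form, hazard(t), survivor(t), math.exp(-cumulative_hazard(t))


if __name__ == "__main__":
    closed_form, ratio, s_direct, s_from_hazard = check_identities(3.0)
    print("hazard (closed form), hazard (f/S):", closed_form, ratio)
    print("S(t) direct, S(t) from exp(-H(t)): ", s_direct, s_from_hazard)
```

The first pair of printed values should agree to floating-point precision, and the second pair to within the error of the numerical integration. Any other distribution with a known density could be substituted for the Weibull without changing the check.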
The probability density function of t, the actual survival time of an individual, is f(t). The survivor function, S(t), is the probability that the survival time is greater than or equal to t. The related hazard function h(t) denotes the instantaneous death rate: h(t) dt represents the probability that the event occurs in the short interval [t, t + dt), conditional on it not having occurred prior to time t. The following relationship holds:

$$h(t)\,dt = \frac{f(t)\,dt}{S(t)} \tag{1}$$

The cumulative hazard function H(t) is defined as

$$H(t) = \int_{0}^{t} h(u)\,du \tag{2}$$

so that

$$S(t) = e^{-H(t)} \tag{3}$$

A fundamental characteristic of survival data is that survival times are frequently censored (i.e. the end point of interest is not observed, for instance because the case has been lost to follow-up). Right-censored cases have survival times that are greater than some defined time point. The data used in this study contain right-censored survival times.

The semi-parametric Cox proportional hazards model (Cox PH) is a popular choice in the analysis of censored survival data [7], in addition to parametric models [8]. However, both impose either distributional forms or assumptions that are not always justifiable, for instance a