Journal of Cancer Research and Clinical Oncology
https://doi.org/10.1007/s00432-019-03102-y

ORIGINAL ARTICLE – CLINICAL ONCOLOGY

Decision tree algorithm in locally advanced rectal cancer: an example of over‑interpretation and misuse of a machine learning approach

Francesca De Felice¹ · D. Crocetti² · M. Parisi¹ · V. Maiuri¹ · E. Moscarelli¹ · R. Caiazzo¹ · N. Bulzonetti¹ · D. Musio¹ · V. Tombolini¹

Received: 11 November 2019 / Accepted: 26 November 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Purpose To analyse the classification performance of a decision tree method applied to predictors of survival in patients with locally advanced rectal cancer (LARC). The aim was to offer a critical analysis that supports the application of tree-based approaches in clinical practice and improves their interpretation.
Materials and methods Data on patients with histologically proven LARC from 2007 to 2014 were reviewed. All patients received a trimodality approach with curative intent. The Kaplan–Meier method was used to estimate overall survival (OS). Decision tree methods were used to select the variables most important for outcome prediction.
Results A total of 100 patients were included. The 5-year and 7-year OS rates were 76.4% and 71.3%, respectively. Age, comorbidities, tumor size, clinical tumor classification (cT) and clinical nodal classification (cN) were the important predictor variables in the tree's construction. Overall, 13 distinct groups of patients were defined. Patients aged < 65 years with cT3 disease and elderly patients with a tumor size < 5 cm appeared to have the highest survival rates. However, the procedure over-fitted the data, leading to poor algorithm performance.
Conclusion We proposed a decision tree algorithm to identify known and new pre-treatment clinical predictors of survival in LARC.
Our analysis confirmed that tree-based machine learning methods, especially classification trees, can be easily interpreted even by non-experts in the field, but controlling the cross-validation error is mandatory to capture their statistical power. In particular, the trend of the classification error must be analysed carefully before choosing the important predictor variables, especially with small datasets. Machine learning approaches should be considered a new, unexplored frontier in LARC. Applied to large datasets, decision trees represent an opportunity to improve the decision-making process in clinical practice.

Keywords Machine learning · Decision tree · Big data · Rectal cancer · Chemoradiotherapy · Surgery · Survival

Introduction
The trimodality approach, combining chemoradiotherapy (CRT), total mesorectal excision (TME) surgery and chemotherapy (CHT), is a standard of care for locally advanced rectal cancer (LARC) (National Comprehensive Cancer Network Guidelines 2019). Traditionally, the treatment algorithm is based on clinical and pathological prognostic stage categories. In rectal cancer, prognostic groups are defined by the extent of the tumor (T), nodes (N) and metastases (M), without supplementary non-anatomic factors (National Comprehensive Cancer Network Guidelines 2019). A more precise approach to identifying, before treatment starts, the LARC patients predicted to have a worse prognosis would therefore be useful: these patients would theoretically gain the most benefit from intensified neoadjuvant CRT (De Felice et al. 2017). This study aimed to develop a classification tree approach to predict survival in LARC patients using pre-treatment clinical parameters. At present, the utility of decision trees in LARC has yet to be investigated. Given the emerging application of tree-based methods in oncology, we discuss a real-life dataset to illustrate their practical relevance.
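The OS rates quoted in the abstract come from the Kaplan–Meier (product-limit) estimator, S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of deaths at t_i and n_i the number of patients still at risk just before t_i. The following is a minimal sketch of that estimator in plain Python, not the authors' software; the function names and the follow-up data are hypothetical and chosen only for illustration. Censored patients count as at risk at their own censoring time, which is the usual convention for ties.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient (e.g. in years)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs, one per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # patients leaving the risk set at each time
    at_risk = len(times)        # everyone is at risk at the start
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # multiply by the conditional probability of surviving past t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # deaths and censorings at t both leave the risk set after t
        at_risk -= exits[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy data (years of follow-up), not the study cohort
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 5))  # a "5-year OS" read off the step curve
```

For this toy input the curve steps down to roughly 0.833 at t = 2, 0.667 at t = 3, 0.444 at t = 5 and 0 at t = 9; the patient censored at t = 3 is still counted in the risk set at t = 3 and only leaves it afterwards.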
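The abstract's central warning is that the tree over-fitted the data and that the cross-validation error must be monitored. A minimal pure-Python sketch of that check is given below; it is an illustration under stated assumptions, not the authors' method. It grows a CART-style classification tree (Gini impurity, binary threshold splits) and, as a simplification, uses maximum depth as the complexity parameter instead of the cost-complexity pruning offered by packages such as R's rpart. The cohort is synthetic and hypothetical: two predictors named after age and tumor size, a binary outcome with 20% of labels flipped at random, and 100 rows only to mirror the cohort size.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) with the lowest weighted
    child impurity; returns None if no split improves on the parent node."""
    n = len(y)
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, thr), score
    return best


def build_tree(X, y, max_depth, depth=0):
    """A leaf is the majority class; an internal node is (feature, thr, left, right)."""
    majority = int(2 * sum(y) >= len(y))
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, thr = split
    left = [i for i in range(len(y)) if X[i][j] <= thr]
    right = [i for i in range(len(y)) if X[i][j] > thr]
    return (j, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))


def predict(tree, row):
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree


def error_rate(tree, X, y):
    return sum(predict(tree, r) != t for r, t in zip(X, y)) / len(y)


def cv_error(X, y, max_depth, k=5, seed=0):
    """Mean misclassification rate over k folds (same folds for every depth)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errs = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tree = build_tree([X[i] for i in train], [y[i] for i in train], max_depth)
        errs.append(error_rate(tree, [X[i] for i in test], [y[i] for i in test]))
    return sum(errs) / k


# Synthetic, hypothetical cohort: columns are "age" (years) and "tumor size" (cm);
# the outcome is 1 for age > 70 with size >= 5, with 20% of labels flipped at random.
rng = random.Random(42)
X = [[round(rng.uniform(40, 85)), round(rng.uniform(1, 9), 1)] for _ in range(100)]
y = [int((age > 70 and size >= 5) != (rng.random() < 0.2)) for age, size in X]

for depth in range(1, 7):
    tree = build_tree(X, y, depth)
    print(f"depth={depth}  train error={error_rate(tree, X, y):.2f}  "
          f"5-fold CV error={cv_error(X, y, depth):.2f}")
```

Training error can only fall or stay flat as depth grows, because each deeper tree refines the shallower one; the cross-validated error typically stops improving, and often rises, once the tree starts fitting the label noise. Choosing the complexity with the lowest cross-validated error, rather than the lowest training error, is the kind of control the abstract calls mandatory. With only 100 rows the fold-to-fold variability of that estimate is itself large, which is one more reason for caution with small datasets.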
* Francesca De Felice
fradefelice@hotmail.it

¹ Department of Radiotherapy, Policlinico Umberto I, “Sapienza” University of Rome, Viale Regina Elena 326, 00161 Rome, Italy
² Department of Surgery “Pietro Valdoni”, Policlinico Umberto I, “Sapienza” University of Rome, Rome, Italy