Citation: Tran, K.L.; Le, H.A.; Nguyen, T.H.; Nguyen, D.T. Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam. Data 2022, 7, 160. https://doi.org/10.3390/data7110160

Academic Editor: Francisco Guijarro

Received: 13 October 2022
Accepted: 7 November 2022
Published: 14 November 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article

Explainable Machine Learning for Financial Distress Prediction: Evidence from Vietnam

Kim Long Tran 1, Hoang Anh Le 2,*, Thanh Hien Nguyen 3 and Duc Trung Nguyen 1

1 Faculty of Banking, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
2 Institute for Research Science and Banking Technology, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
3 Department of Economic Mathematics, Ho Chi Minh University of Banking, No. 36 Ton That Dam Street, Nguyen Thai Binh Ward, District 1, Ho Chi Minh City 700000, Vietnam
* Correspondence: anhlh_vnc@buh.edu.vn

Abstract: The past decade has witnessed the rapid development of machine learning applied in economics and finance. Recent evidence suggests that machine learning models have produced superior results to traditional statistical models and have become the driving force for dramatic improvement in the financial industry. However, a much-debated question is whether the prediction results from black box machine learning models can be interpreted.
In this study, we compared the predictive power of machine learning algorithms and applied SHAP values to interpret the prediction results on a dataset of listed companies in Vietnam from 2010 to 2021. The results showed that the extreme gradient boosting and random forest models outperformed the other models. In addition, based on Shapley values, we found that long-term debts to equity, enterprise value to revenues, accounts payable to equity, and diluted EPS greatly influenced the outputs. In terms of practical contributions, the study offers credit rating companies a new method for predicting the probability of default of bond issuers in the market. The study also provides policymakers with an early warning tool for the risks of public companies, which can support measures to protect retail investors against the risk of bond default.

Keywords: explainable AI; financial distress; machine learning

1. Introduction

Financial distress refers to the situation in which a company fails to meet debt obligations to its creditors at maturity. Prolonged and severe financial distress can eventually lead to bankruptcy. Traditionally, the assessment of the financial distress of companies was mainly based on the subjective judgment of experts. However, this expert-based approach has many drawbacks: the results are inconsistent, cannot be validated, and are highly dependent on expert competence. Therefore, other approaches have been developed to improve consistency and accuracy. These classification techniques can be categorized into statistical methods and machine learning methods. Statistical methods include univariate analysis [1], multiple discriminant analysis [2], logistic regression [3], and the Cox survival model [4]. Statistical models are simple in structure, highly explanatory, and take less time to train.
However, statistical models require many strict assumptions that rarely hold in practice, including linear relationships, homogeneity of variances, and independence of observations. Violation of these assumptions can reduce the predictive power of statistical methods. Subsequently, the development of machine learning algorithms marked a breakthrough in the science of prediction. The application of machine learning models, such as support vector machines [5], decision trees [6], and artificial neural networks [7], has improved predictive power relative to traditional models. Recently, ensemble models such as random forest [8], adaptive boosting [9], and extreme gradient boosting [10] have become