David C. Wyld et al. (Eds): ACSTY, AIBD, MLSC, CCCIOT, NATP - 2021
pp. 33-44, 2021. CS & IT - CSCP 2021
DOI: 10.5121/csit.2021.110304

TOWARDS COMPARING MACHINE LEARNING MODELS TO FORESEE THE STAGES FOR HEART DISEASE

Khalid Amen 1, Mohamed Zohdy 1, and Mohammed Mahmoud 2

1 Department of Electrical and Computer Engineering, Oakland University, Rochester, MI, USA
2 Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA

ABSTRACT

With the increase in heart disease rates at advanced ages, we need a high-quality algorithm that can predict the presence of heart disease at an early stage and thus help prevent it. Previous Machine Learning approaches were used to predict whether patients have heart disease. The purpose of this work is to compare two more algorithms (NB, KNN) with our previous work [1] in predicting the five stages of heart disease: no disease, stage 1, stage 2, stage 3, and advanced condition, or severe heart disease. We found that the LR algorithm performs better than the other two algorithms. When all three classifiers are compared and evaluated on accuracy, precision, recall and F measure, the experimental results show that LR performs best with an accuracy of 82%, followed by NB with an accuracy of 79%.

KEYWORDS

Machine Learning (ML), Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbors (KNN).

1. INTRODUCTION

1.1. Machine Learning

Machine Learning (ML) is a branch of artificial intelligence (AI) that is increasingly utilized within the field of heart disease medicine. It is essentially how computers make sense of data and decide on, or classify, a task with or without human supervision. The conceptual framework of ML is based on models that receive input data (e.g., images or text) and, through a combination of mathematical optimization and statistical analysis, predict outcomes (e.g., favorable, unfavorable, or neutral) [2].
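As a minimal illustration of this input-to-outcome framing, the sketch below implements a small k-nearest-neighbors (KNN) classifier, one of the algorithms compared in this work, together with accuracy and macro-averaged precision, recall and F measure. It is not the authors' implementation: the two-feature toy data, the function names, the choice of k = 3 and the use of macro averaging are assumptions made only for illustration, and the labels 0 to 4 merely stand in for the five stages.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training pairs by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall_f1(y_true, y_pred, labels):
    """Per-class precision, recall and F measure, averaged with equal class weight."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data (NOT from the paper): three points per stage,
# clustered around (10 * stage, 10 * stage).
STAGES = [0, 1, 2, 3, 4]
train_X, train_y = [], []
for s in STAGES:
    for dx, dy in [(0, 0), (1, 0), (0, 1)]:
        train_X.append((10 * s + dx, 10 * s + dy))
        train_y.append(s)

# One hypothetical test point per stage, placed inside its cluster.
test_X = [(10 * s + 0.5, 10 * s + 0.5) for s in STAGES]
test_y = list(STAGES)

pred_y = [knn_predict(train_X, train_y, x) for x in test_X]
print("accuracy:", accuracy(test_y, pred_y))
print("macro P/R/F:", macro_precision_recall_f1(test_y, pred_y, STAGES))
```

Macro averaging weights every stage equally, which matters when some stages have few patients; a weighted average would be an equally plausible choice, and the text does not say which one was used.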
We used five ML algorithms in our previous work to predict multiple-stage heart disease. The first is the Support Vector Machine (SVM), which can recognize non-linear patterns for use in facial recognition, handwriting interpretation or detection of fraudulent credit card transactions. So-called boosting algorithms used for prediction and classification have been applied to the identification and processing of spam email. The second algorithm is Random Forest (RF), which facilitates decisions by averaging the predictions of several decision trees. The third algorithm is Gradient Tree Boosting (GTB), an ML technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models. The fourth algorithm is Extra Random Forest (ERF), which is an