Article

Influence of Optimal Hyperparameters on the Performance of Machine Learning Algorithms for Predicting Heart Disease

Ghulab Nabi Ahamad 1, Shafiullah 2,*, Hira Fatima 1, Imdadullah 3,*, S. M. Zakariya 3, Mohamed Abbas 4, Mohammed S. Alqahtani 5,6 and Mohammed Usman 4

1 Institute of Applied Sciences, Mangalayatan University, Aligarh 202145, India
2 Department of Mathematics, K.C.T.C. College, Raxual, BRA, Bihar University, Muzaffarpur 842001, India
3 Electrical Engineering Section, University Polytechnic, Aligarh Muslim University, Aligarh 202002, India
4 Electrical Engineering Department, College of Engineering, King Khalid University, Abha 61421, Saudi Arabia
5 Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia
6 BioImaging Unit, Space Research Center, Michael Atiyah Building, University of Leicester, Leicester LE1 7RH, UK
* Correspondence: shafi.stats@gmail.com (S.); imdadullah@zhcet.ac.in (I.)

Citation: Ahamad, G.N.; Shafiullah; Fatima, H.; Imdadullah; Zakariya, S.M.; Abbas, M.; Alqahtani, M.S.; Usman, M. Influence of Optimal Hyperparameters on the Performance of Machine Learning Algorithms for Predicting Heart Disease. Processes 2023, 11, 734. https://doi.org/10.3390/pr11030734

Academic Editors: Mohammed Mahbubul Islam and Md Azhar

Received: 23 January 2023
Revised: 23 February 2023
Accepted: 26 February 2023
Published: 1 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: One of the most difficult challenges in medicine is predicting heart disease at an early stage.
In this study, six machine learning (ML) algorithms, viz., logistic regression, K-nearest neighbor, support vector machine, decision tree, random forest classifier, and extreme gradient boosting, were used to analyze two heart disease datasets. One dataset was UCI Kaggle Cleveland and the other was the comprehensive UCI Kaggle Cleveland, Hungary, Switzerland, and Long Beach V. The performance results of the machine learning techniques were obtained. The support vector machine with tuned hyperparameters achieved the highest testing accuracy of 87.91% for dataset-I, and the extreme gradient boosting classifier with tuned hyperparameters achieved the highest testing accuracy of 99.03% for the comprehensive dataset-II. The novelty of this work was the use of grid search cross-validation to enhance both training and testing performance. The ideal parameters for predicting heart disease were identified through experimental results. A comparison with existing studies on heart disease prediction was also carried out, in which the approach used in this work significantly outperformed their results.

Keywords: heart disease prediction; UCI Kaggle dataset; machine learning algorithms; GridSearchCV; hyperparameters

1. Introduction

The heart is the most important factor in blood flow through the veins [1]. The blood that circulates through our bodies and carries nutrients, oxygen, metals, and other essential substances is the most important part of our circulatory system. Faulty functioning of the heart can lead to serious health issues and even death [2]. Living an unhealthy lifestyle, using tobacco, drinking alcohol, and eating a lot of fat can all lead to heart disease [3,4]. The World Health Organization estimates that heart disease claims the lives of roughly 10 million people per year. Only a healthy lifestyle and early detection can stop circulatory system diseases [5,6].
Although cardiac issues have in recent years been identified as the main cause of death worldwide, they remain conditions that can be properly managed and controlled. How effectively an illness can be controlled overall depends on the timing of its detection. The proposed strategy aims to recognize certain cardiac abnormalities early in order to stop heart disease. Several researchers are using statistical and data mining techniques to help identify heart illness [7]. The majority of the data in medical databases are discrete, which makes decision-making with these datasets extremely challenging [8–10].
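To make the grid search cross-validation named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It tunes a K-nearest neighbor classifier (one of the six algorithms listed) on synthetic two-class data, not on the heart disease datasets. The helper names (`make_toy_data`, `knn_predict`, `cv_accuracy`, `grid_search_cv`) and the parameter grid values are assumptions chosen for illustration only. In practice, scikit-learn's `GridSearchCV` provides the same kind of search.

```python
import random
from itertools import product
from statistics import mean


def make_toy_data(n=120, seed=0):
    """Synthetic two-feature, two-class data (assumption: NOT the paper's datasets)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 * label  # class 0 around (0, 0), class 1 around (2, 2)
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y


def knn_predict(X_train, y_train, x, k, p):
    """Majority vote among the k nearest training points (Minkowski order p)."""
    # The p-th root is omitted because it does not change the ordering.
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, k, p, n_folds=5):
    """Mean validation accuracy over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        val_idx = set(range(fold, len(X), n_folds))
        X_tr = [x for i, x in enumerate(X) if i not in val_idx]
        y_tr = [t for i, t in enumerate(y) if i not in val_idx]
        correct = sum(knn_predict(X_tr, y_tr, X[i], k, p) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)


def grid_search_cv(X, y, param_grid, n_folds=5):
    """Score every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_accuracy(X, y, n_folds=n_folds, **params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


X, y = make_toy_data()
# Hypothetical grid: odd k avoids tied votes with two classes; p=1 or 2.
param_grid = {"k": [1, 3, 5, 7, 9], "p": [1, 2]}
best_params, best_score, results = grid_search_cv(X, y, param_grid)
print(best_params, round(best_score, 3))
```

The search evaluates every hyperparameter combination by cross-validated accuracy and keeps the best. A testing accuracy such as those quoted in the abstract would additionally require a held-out test set that takes no part in the search.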