Citation: Ahamad, G.N.; Shafiullah; Fatima, H.; Imdadullah; Zakariya, S.M.; Abbas, M.; Alqahtani, M.S.; Usman, M. Influence of Optimal Hyperparameters on the Performance of Machine Learning Algorithms for Predicting Heart Disease. Processes 2023, 11, 734. https://doi.org/10.3390/pr11030734
Academic Editors: Mohammed
Mahbubul Islam and Md Azhar
Received: 23 January 2023
Revised: 23 February 2023
Accepted: 26 February 2023
Published: 1 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
processes
Article
Influence of Optimal Hyperparameters on the Performance of
Machine Learning Algorithms for Predicting Heart Disease
Ghulab Nabi Ahamad 1, Shafiullah 2,*, Hira Fatima 1, Imdadullah 3,*, S. M. Zakariya 3, Mohamed Abbas 4, Mohammed S. Alqahtani 5,6 and Mohammed Usman 4
1 Institute of Applied Sciences, Mangalayatan University, Aligarh 202145, India
2 Department of Mathematics, K.C.T.C. College, Raxual, BRA, Bihar University, Muzaffarpur 842001, India
3 Electrical Engineering Section, University Polytechnic, Aligarh Muslim University, Aligarh 202002, India
4 Electrical Engineering Department, College of Engineering, King Khalid University, Abha 61421, Saudi Arabia
5 Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia
6 BioImaging Unit, Space Research Center, Michael Atiyah Building, University of Leicester, Leicester LE1 7RH, UK
* Correspondence: shafi.stats@gmail.com (S.); imdadullah@zhcet.ac.in (I.)
Abstract: One of the most difficult challenges in medicine is predicting heart disease at an early stage.
In this study, six machine learning (ML) algorithms, viz., logistic regression, K-nearest neighbor,
support vector machine, decision tree, random forest classifier, and extreme gradient boosting,
were used to analyze two heart disease datasets. One dataset was UCI Kaggle Cleveland and the
other was the comprehensive UCI Kaggle Cleveland, Hungary, Switzerland, and Long Beach V.
The performance results of the machine learning techniques were obtained. The support vector
machine with tuned hyperparameters achieved the highest testing accuracy of 87.91% for dataset-I
and the extreme gradient boosting classifier with tuned hyperparameters achieved the highest testing
accuracy of 99.03% for the comprehensive dataset-II. The novelty of this work is the use of grid
search cross-validation to improve both training and testing performance. The optimal
hyperparameters for predicting heart disease were identified experimentally. The results were also
compared with those of existing studies on heart disease prediction, and the approach used in this
work significantly outperformed them.
Keywords: heart disease prediction; UCI Kaggle dataset; machine learning algorithms; GridSearchCV;
hyperparameters
1. Introduction
The heart is the most important factor in driving blood through the blood vessels [1]. The blood that
circulates through our bodies carries nutrients, oxygen, metals, and other essential
substances and is a vital part of our circulatory system. Faulty functioning
of the heart can lead to serious health issues and even death [2]. Living an unhealthy
lifestyle, using tobacco, drinking alcohol, and eating a lot of fat can all lead to heart
disease [3,4]. The World Health Organization estimates that heart disease claims the lives
of roughly 10 million people per year. Only a healthy lifestyle and early detection can stop
circulatory system diseases [5,6]. Although cardiac conditions have in recent years been
identified as the leading cause of death worldwide, they can still be
properly managed and controlled. How effectively an illness can be controlled
depends largely on how early it is detected. The recommended strategy is therefore to recognize
certain cardiac abnormalities early in order to stop heart disease. Several researchers
are using statistical and data mining techniques to help identify heart disease [7]. The
majority of the data in medical databases are discrete, which makes decision-making with these
datasets extremely challenging [8–10].