Research Article
Prediction of Cardiovascular Disease on Self-Augmented Datasets of Heart Patients Using Multiple Machine Learning Models
Sumaira Ahmed,1 Salahuddin Shaikh,1 Farwa Ikram,2 Muhammad Fayaz,3 Hathal Salamah Alwageed,4 Faheem Khan,5 and Fawwad Hassan Jaskani6
1 Centre of Computing Research, Department of Computer Science and Software Engineering, Jinnah University for Women, Karachi 74600, Pakistan
2 Department of Computer Engineering, University of Lahore, Pakistan
3 Department of Computer Science, University of Central Asia, Naryn, Kyrgyzstan
4 College of Computer and Information Science, Jouf University, Saudi Arabia
5 Gachon University, Department of Computer Engineering, Republic of Korea
6 Department of Computer Systems Engineering, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
Correspondence should be addressed to Muhammad Fayaz; muhammad.fayaz@ucentralasia.org
Received 24 June 2022; Revised 14 October 2022; Accepted 27 October 2022; Published 23 December 2022
Academic Editor: Rajesh Kaluri
Copyright © 2022 Sumaira Ahmed et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
About 26 million people worldwide are affected by heart failure each year, and both cardiologists and surgeons find it difficult
to determine when heart failure will occur. Classification and prediction models applied to medical data can provide deeper
insight. A major goal of this research is to improve heart failure prediction using a heart disease dataset. The probability
of heart failure is predicted from data mined from a medical database and processed with machine learning methods. Through
this study and a comparative analysis, we show that heart disease can be predicted with high precision.
In this study, we developed a machine learning model to improve the accuracy with which diseases such as heart failure
(HF) can be predicted. Among the linear models, logistic regression achieves the highest accuracy (82.76%), followed by GNB
(79.31%), MNB (72.41%), SVM (67.24%), and KNN (60.34%). Among the ensemble learning models, ET, RF, and GBC achieve
accuracies of 70.31%, 87.03%, and 86.21%, respectively, with RF being the most accurate, while DT achieves the highest
precision. Among the gradient boosting models, CatBoost outperforms LGBM, HGBC, and XGB, all of which reach an accuracy of
at least 84.48%; XGB achieves exactly 84.48%. Among the hyperparameter-tuned ensemble learning models, LGBM has the highest
accuracy (86.21%). A statistical analysis of all evaluated algorithms found that CatBoost, random forests, and gradient
boosting provided the most reliable results for predicting future heart attacks.
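The comparative analysis summarized above ranks classifiers by their accuracy on held-out data. The following is a minimal sketch of that kind of comparison, not the authors' pipeline: it uses only the Python standard library, simple from-scratch versions of three of the named model families (logistic regression, KNN, and Gaussian Naive Bayes), and a hypothetical synthetic two-feature dataset (`make_toy_data`) in place of the paper's heart disease data. All function names and parameters (`k=5`, `lr=0.1`, `test_frac=0.3`) are illustrative assumptions.

```python
import math
import random
from statistics import mean, pvariance

# Hypothetical synthetic data (NOT the paper's dataset): two numeric features
# per patient; label 1 marks a heart-failure event.
def make_toy_data(n=200, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = int(rng.random() < 0.4)
        shift = 1.5 if y else 0.0  # positive cases are shifted on both features
        rows.append(([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)], y))
    return rows

def accuracy(model, data):
    # Fraction of samples whose predicted label equals the true label.
    return sum(model(x) == y for x, y in data) / len(data)

def fit_knn(train, k=5):
    def predict(x):
        nearest = sorted(train, key=lambda r: math.dist(x, r[0]))[:k]
        return int(sum(y for _, y in nearest) * 2 > k)  # majority vote
    return predict

def fit_gnb(train):
    # Per-class prior plus per-feature Gaussian mean/variance.
    stats = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        cols = list(zip(*xs))
        stats[c] = (len(xs) / len(train),
                    [(mean(col), pvariance(col) + 1e-9) for col in cols])
    def predict(x):
        def log_post(c):
            prior, params = stats[c]
            return math.log(prior) + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, (m, v) in zip(x, params))
        return max((0, 1), key=log_post)
    return predict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(train, lr=0.1, epochs=200):
    w = [0.0] * (len(train[0][0]) + 1)  # bias followed by one weight per feature
    for _ in range(epochs):
        for x, y in train:  # stochastic gradient descent on log-loss
            g = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    def predict(x):
        return int(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0)
    return predict

def compare(models, data, test_frac=0.3, seed=0):
    # Shuffle, hold out a test split, and rank models by test accuracy.
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    train, test = rows[:cut], rows[cut:]
    scores = {name: accuracy(fit(train), test) for name, fit in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    models = {"LR": fit_logreg, "KNN": fit_knn, "GNB": fit_gnb}
    for name, acc in compare(models, make_toy_data()):
        print(f"{name}: {acc:.2%}")
```

The ranking produced on this toy data says nothing about the paper's reported figures; a real replication would substitute the actual dataset, the full set of models (including the ensemble and gradient boosting variants), and the tuning procedure used in the study.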
1. Introduction
Patients often undergo a battery of tests, putting them under
unnecessary mental, emotional, and financial strain.
Studies have linked tobacco use and excess body fat to
cardiovascular disease [1]. Pain in the arms and chest is the
most common symptom. A thorough examination of a heart
disease dataset can benefit cardiac surgeons for both
diagnostic and operational purposes [2].
Previous work [2] has attempted to improve the HF diagnostic
process using machine learning and heart disease
categories. This project explores different machine
learning techniques to make better use of healthcare
data, and we anticipate that classifier performance will improve.
The risk of heart failure (HF) and other conditions depends on
each individual's unique set of circumstances. Standard HF risk
prediction models treat each variable as a separate covariate, but
this approach ignores important characteristics like cardiac
Hindawi
Journal of Sensors
Volume 2022, Article ID 3730303, 21 pages
https://doi.org/10.1155/2022/3730303