* Corresponding author: Rita Chhikara; Email:
Copyright © 2023 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
Boruta based feature selection model for heart disease prediction
Yutika Agarwal 1, Rita Chhikara 1,* and Sanjeev Rana 2
1 School of Engineering and Technology, The Northcap University, India.
2 VISA Worldwide Inc, Singapore.
International Journal of Science and Research Archive, 2023, 10(01), 768–774
Publication history: Received on 04 September 2023; revised on 10 October 2023; accepted on 13 October 2023
Article DOI: https://doi.org/10.30574/ijsra.2023.10.1.0830
Abstract
The incidence of heart disease is rising rapidly, making it a major cause of death worldwide. Treating heart disease,
or predicting it beforehand, is very important, but at some medical centers practitioners lack the expertise needed to
diagnose and treat patients on time. As a result, they often rely on assumed readings, which leads to poor outcomes and
sometimes to the death of the patient. This paper identifies the relevant attributes of heart disease using the Boruta,
Lasso and Ridge feature selection methods. It also offers insight into the effectiveness of various machine learning
algorithms for predicting heart disease. Feature selection reduces the number of features while maintaining comparable
model accuracy. Experimental results demonstrate that Boruta feature selection with a Random Forest classifier
outperforms all the other state-of-the-art methods used in this study.
Keywords: Boruta; Lasso; Ridge; Random Forest; XGBoost; Logistic Regression; Naïve Bayes
1. Introduction
According to statistics from the World Health Organization (WHO), heart disease is the leading cause of mortality
worldwide, resulting in around 17.9 million deaths annually [1]. Heart attacks occur due to a blockage in blood flow or
an imbalance in certain health parameters. Individuals at high risk of heart disease exhibit signs of elevated blood
pressure, glucose and lipid levels, as well as stress. The symptoms of heart problems resemble those of other illnesses,
and age-related factors may further complicate diagnosis for healthcare professionals, leading to delays in treatment.
The timely and accurate prediction of heart disease, combined with early detection, plays a crucial role in improving
patient survival rates [2].
When the heart and the blood vessels are affected, certain heart disease conditions can result. This includes how fluid
circulates in the body once it enters the bloodstream. Accurate diagnosis of such diseases is crucial, and it is a
difficult task that must be carried out efficiently and effectively. Medical experts play a vital role in making the
accurate decisions that are essential for providing quality treatment to patients [3]. Therefore, medical centers must
provide training and guidance to healthcare professionals who may lack sufficient expertise in diagnosing these
diseases. This training is necessary to ensure the accuracy of all the important readings related to the heart and
other body parameters.
The existing methods for the diagnosis and prediction of heart disease have certain limitations, including the challenge
of accurately predicting the disease [4]. To address this issue, this paper aims to overcome these constraints by using
the Boruta algorithm together with machine learning algorithms to identify relevant features and improve the accuracy of
heart disease prediction. The Boruta algorithm helps in identifying the most significant features from