Boruta based feature selection model for heart disease prediction

Yutika Agarwal 1, Rita Chhikara 1, * and Sanjeev Rana 2

1 School of Engineering and Technology, The Northcap University, India.
2 VISA Worldwide Inc, Singapore.

International Journal of Science and Research Archive, 2023, 10(01), 768–774

Publication history: Received on 04 September 2023; revised on 10 October 2023; accepted on 13 October 2023

Article DOI: https://doi.org/10.30574/ijsra.2023.10.1.0830

* Corresponding author: Rita Chhikara; Email:

Copyright © 2023 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.

Abstract

The rate of heart disease is increasing rapidly, and it has become a major cause of death worldwide. It is very important to treat heart disease, or to predict it beforehand, but some medical centers lack experts with adequate expertise to diagnose and treat patients on time. Clinicians in such centers often rely on assumed readings, which leads to poor outcomes and sometimes to the death of the patient. This paper identifies the relevant attributes of heart disease using the Boruta, Lasso and Ridge feature selection methods. It also offers insight into the effectiveness of various machine learning algorithms for predicting heart disease. Feature selection reduces the number of features while maintaining comparable model accuracy. Experimental results demonstrate that Boruta feature selection with a Random Forest classifier outperforms all the other state-of-the-art methods used in this study.

Keywords: Boruta; Lasso; Ridge; Random Forest; XGBoost; Logistic Regression; Naïve Bayes

1. Introduction

According to statistics from the World Health Organization (WHO), heart disease is the leading cause of mortality worldwide, resulting in around 17.9 million deaths annually [1].
Heart attacks occur due to a blockage in blood flow or an imbalance in certain health parameters. Individuals at high risk of heart disease exhibit elevated blood pressure, glucose and lipid levels, as well as stress. The symptoms of heart problems resemble those of other illnesses, and age-related factors may further complicate diagnosis by healthcare professionals, leading to delays in treatment. Timely and accurate prediction of heart disease, combined with early detection, plays a crucial role in improving patient survival rates [2]. When the heart and the blood vessels are affected, certain heart disease conditions can develop. These conditions affect, among other things, how fluid circulates through the body once it enters the bloodstream. Accurate diagnosis of such diseases is crucial, and it is a difficult task that must be done efficiently and effectively. Medical experts play a vital role in making the accurate decisions that are essential for providing quality treatment to patients [3]. Therefore, medical centers must provide training and guidance to healthcare professionals who may lack sufficient expertise in diagnosing these diseases. This training is necessary to ensure the accuracy of all important readings related to the heart and other body parameters. Existing methods for the diagnosis and prediction of heart disease have certain limitations, including the challenge of predicting the disease accurately [4]. To address these limitations, this paper uses the Boruta algorithm together with machine learning algorithms to identify relevant features and to improve the accuracy of heart disease prediction. The Boruta algorithm helps in identifying the most significant features from