DATA IMBALANCE HANDLING TECHNIQUES IN DISEASE PREDICTION MODELS

Deepak Yashwantrao Bhadane
Research Scholar, P.K. University, Shivpuri, M.P., India, Pin 473665
dybhadane@gmail.com

Dr. Indrabhan Supdu Borse
Research Guide, Department of Computer Science & Engineering, P.K. University, Shivpuri, M.P., India, Pin 473665
indrabhan2000@gmail.com

Abstract

A class imbalance occurs when the two categories of the target variable differ significantly in size, with many instances of one class and few of the other. This issue has become increasingly common in domains that rely on predictive models, such as disease prediction, where data mining and machine learning are applied to problems in the healthcare industry. Because a machine learning algorithm tends to learn more about the majority class, owing to its large sample, and less about the minority class, owing to its small one, class imbalance leads the algorithm to misclassify minority class instances even though it can achieve high overall accuracy. This happens because the algorithm can simply assign every case in an imbalanced data set to the majority class. Beyond the algorithm's predictions, the problem also affects public confidence in decisions based on the biased result. If class imbalance is not resolved, the predictive model may label minority class samples incorrectly, which can undermine the validity of the model's findings. This article offers a thorough review of methods for correcting data imbalance and the difficulties associated with doing so.

Keywords: Undersampling, Oversampling, SMOTE, Bagging, Boosting, Class Imbalance, Data Imbalance.

1. INTRODUCTION

Data has become increasingly vital in today's society, since the capacity of a business or organization to grow and flourish rests largely on how effectively it understands and uses the data it has gathered. Every company or organization now produces massive volumes of data across a range of areas, such as banking, business, finance, and healthcare. Medical data may be provided by hospitals, physicians, care