Computer Science & Engineering: An International Journal (CSEIJ), Vol 12, No 1, February 2022
DOI:10.5121/cseij.2022.12111

SIMPLEX BASED SOCIAL SPIDER OPTIMIZATION METHOD FOR IMPROVING MEDICAL DATA ANALYSIS

Monalisa Nayak 1, Soumya Das 2, Urmila Bhanja 3 and Manas Ranjan Senapati 4

1 Indira Gandhi Institute of Technology, Sarang, Dhenkanal, India
2 Government College of Engineering, Kalahandi, India
3 Indira Gandhi Institute of Technology, Sarang, Dhenkanal, India
4 Veer Surendra Sai University of Technology, Burla, Sambalpur, India

ABSTRACT

Preventing disease transmission depends on accurate and reliable prediction. Many machine learning models have been developed for prediction on large-scale medical datasets. In this paper, the Simplex based Social Spider Optimization (SMSSO) method is used to classify three medical datasets: heart disease, echocardiogram and hepatitis. The performance of the model is evaluated using Root Mean Square Error (RMSE) and computation time.

KEYWORDS

Social Spider Optimization (SSO), Simplex based Social Spider Optimization (SMSSO), Root Mean Square Error (RMSE), Support Vector Machine (SVM) & Random Forest Model (RFM).

1. INTRODUCTION

Public health organisations work to reduce or prevent the transmission of disease, which is possible only if the disease is predicted accurately and reliably. Various models have been developed for this purpose, and their performance varies across datasets. Machine learning is an effective way to make decisions and predictions from the large amounts of medical data provided by health care departments. Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were developed to predict hepatitis E, a severe liver disease. These three methods are compared on the basis of root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).
The results obtained are 0.022, 0.0204 and 0.01 for ARIMA, SVM and LSTM respectively. ARIMA was implemented in Python, SVM in MATLAB and LSTM in Keras [1]. Models were developed using four algorithms: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DCT) and logistic regression (LR). The optimal model was selected using the area under the receiver operating characteristic curve (AUC). The AUC values are 0.891, 0.829, 0.619 and 0.680 for XGBoost, RF, DCT and LR respectively, which shows that XGBoost is the best of these models for predicting hepatitis B surface antigen (HBsAg) [2]. A novel method was developed that identifies important features using machine learning techniques, increasing the accuracy of cardiovascular disease prediction. Its prediction model, the hybrid random forest with a linear model (HRFLM), achieves an accuracy of 88.7% [3]. A random forest model was developed to accurately predict patient survival after echocardiography. This model uses clinical datasets and achieves AUC > 0.82. Here, there is a comparison of non-linear models