An expert system for the prediction of stroke disease by different least squares support vector machines models.

Mehmet Ediz Sarihan 1*, Davut Hanbay 2

1 Department of Emergency Medicine, Faculty of Medicine, Inonu University, Malatya, Turkey
2 Department of Computer Engineering, Faculty of Engineering, Inonu University, Malatya, Turkey

Abstract

Objective: Stroke is one of the most important life-threatening diseases worldwide. This study aimed to classify the outcome of stroke using Least-Squares Support Vector Machines (LS-SVMs) models.

Materials and methods: The medical dataset related to stroke disease was obtained from the clinical database of the emergency medicine department. The raw dataset contained 28 predictors. For dimension reduction, correlations between the input variables and the target variable (stroke) were evaluated. Different LS-SVMs models were built with radial basis function (RBF), linear and polynomial kernels. 5-fold cross-validation was used during model building so that the best model could be selected using all of the data. Accuracy and the area under the receiver operating characteristic curve (AUC ROC) were used for performance assessment.

Results: First, feature selection was performed, after which 14 input variables were retained. The whole dataset was partitioned into 5 sub-datasets (D1, D2, D3, D4, D5) so that all data were used for both training and testing. The performance of the LS-SVMs models was evaluated by 5-fold cross-validation, with accuracy and AUC as the performance criteria. The best performance was obtained by the LS-SVMs model with the linear kernel: its average accuracy was 86.6%, and its best accuracy, obtained on dataset D5, was 94%. Consequently, the LS-SVMs model can be used for predicting the outcome of stroke.
Conclusion: The results indicate that LS-SVMs with the linear kernel achieved higher accuracy and AUC values than the other LS-SVMs models considered for predicting stroke disease. The suggested LS-SVMs with the linear kernel may produce beneficial prediction results related to stroke disease. In future studies, several data mining techniques may be tested and combined for better classification performance of stroke disease.

Keywords: Data mining, Stroke disease, Least squares support vector machines (LS-SVMs).

Accepted on September 26, 2017

Introduction

Stroke is a major cause of vascular behavioural and mental disorders worldwide. In developing countries, there is a shortage of information on the public health problem of stroke [1]. Stroke is increasingly common and is a leading cause of death worldwide, after coronary heart disease and cancer. Stroke frequently results in increased morbidity and mortality and in reduced quality of life [2,3].

Data mining is the process of discovering patterns in potentially large amounts of data; it is a multidisciplinary field built on the logic of database systems. Examples of data mining techniques are Decision Trees, the Apriori algorithm, Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and so on. Data mining can also be viewed as an evolution of information technology that branches into sub-processes: collecting data, creating and managing databases, analyzing data and, finally, interpreting data [4].

SVMs are a supervised machine learning (ML) approach [5] and are widely used in pattern recognition for regression and classification problems [6,7]. LS-SVMs perform classification by constructing a hyperplane that optimally discriminates between two categories [6,7].
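The hyperplane construction described above can be illustrated with a minimal sketch. In the standard LS-SVM classifier formulation, training reduces to solving a single linear system in the bias b and the support values alpha, rather than a quadratic program. The code below is not the implementation used in this study: the function names, the +1/-1 label encoding, the regularization value gamma, and the kernel parameters (sigma, degree, coef0) are all assumptions chosen for illustration, and a plain-Python Gaussian elimination is used only to keep the example self-contained.

```python
import math

def linear_kernel(a, b):
    # K(a, b) = a . b
    return sum(p * q for p, q in zip(a, b))

def rbf_kernel(a, b, sigma=1.0):
    # K(a, b) = exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed default
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    # K(a, b) = (a . b + coef0)^degree; degree and coef0 are assumed defaults
    return (linear_kernel(a, b) + coef0) ** degree

def solve(A, rhs):
    # Solve A x = rhs by Gaussian elimination with partial pivoting.
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def lssvm_train(X, y, kernel, gamma=10.0):
    # Build and solve the LS-SVM system
    #   [ 0   y^T           ] [ b     ]   [ 0 ]
    #   [ y   Omega + I/gamma] [ alpha ] = [ 1 ]
    # with Omega_ij = y_i * y_j * K(x_i, x_j) and labels y_i in {+1, -1}.
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = float(y[i])
        A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * kernel(X[i], X[j])
        A[i + 1][i + 1] += 1.0 / gamma
    solution = solve(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias b, support values alpha

def lssvm_predict(x, X, y, b, alpha, kernel):
    # Decision rule: sign(sum_i alpha_i * y_i * K(x, x_i) + b)
    score = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1
```

Under these assumptions, calling `b, alpha = lssvm_train(X, y, linear_kernel)` and then `lssvm_predict(x_new, X, y, b, alpha, linear_kernel)` classifies a new sample; passing `rbf_kernel` or `polynomial_kernel` instead changes only the kernel argument, which mirrors how the three kernel variants can be compared on the same data.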
Kernel functions such as the radial basis function (RBF), linear and polynomial kernels map data into a higher-dimensional space and help LS-SVMs separate data with very complex boundaries [8]. The least-squares (LS) version of SVMs was described by Suykens and Vandewalle [9]. LS-SVMs are widely used in complex system studies [10].

ISSN 0970-938X www.biomedres.info Biomedical Research 2017; 28 (20): 8760-8764

With regard to the prediction of stroke, one study proposed SVMs to classify stroke thrombolysis, and the SVMs model yielded an area under the curve (AUC) of 0.744. That work showed that SVMs produced higher accuracy than conventional,