Periodicals of Engineering and Natural Sciences ISSN 2303-4521
Vol. 9, No. 2, June 2021, pp. 1030-1037

Imbalanced data classification using support vector machine based on simulated annealing for enhancing penalty parameter

Hussein Ibrahim Hussein 1, Said Amirul Anwar 2
1,2 Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Kampus Alam UniMAP, Pauh Putra, 02600 Arau, Perlis, Malaysia

ABSTRACT

For pattern classification and regression problems, the support vector machine (SVM) is a well-established and computationally powerful machine learning method. It has been applied successfully to many real-world problems across a wide range of domains. SVM has a key parameter, the penalty factor C. The choice of this parameter has a substantial impact on the classification accuracy of SVM, as unsuitable parameter settings can lead to poor classification results. The penalty factor C must achieve an adequate trade-off between classification errors and generalisation performance, so building an SVM model with good performance requires parameter optimisation. In this work, the simulated annealing (SA) algorithm is employed to form a hybrid method for tuning the SVM parameters. The aim is to improve system efficiency by obtaining the optimal penalty parameter while simultaneously balancing classification performance. Our experiments on several UCI datasets indicate that the proposed technique can achieve improved classification accuracy.

Keywords: Imbalanced data, support vector machine, penalty parameter, simulated annealing.

Corresponding Author:
Hussein Ibrahim Hussein
Faculty of Electronic Engineering Technology
Universiti Malaysia Perlis, Malaysia
Husseinsarhan45@yahoo.com

1. Introduction

A new classification approach based on the structural risk minimisation (SRM) principle was presented by Cortes and Vapnik [1], and is generally known as the support vector machine (SVM).
The algorithm was quickly adopted for many classification tasks because of its effectiveness in handwritten character recognition, where it outperformed carefully trained neural networks. SVMs have also performed well in other applications such as time series prediction, bioinformatics and pattern classification. Burges (1998) published a comprehensive tutorial on the SVM classifier algorithm. SVM can handle large feature spaces because its training is formulated so that the dimensionality of the classified vectors does not strongly influence SVM performance, in contrast to a typical classifier. It is therefore considered particularly effective for large classification problems. Furthermore, SVM-based classifiers are reported to have sound generalisation properties compared with traditional classifiers, because SVM training systematically minimises the risk of misclassification, whereas conventional classifiers are typically trained to minimise empirical risk [2].

Numerous techniques have been proposed to address the SVM parameter-selection problem. Huang and Wang [3] recommended a genetic algorithm (GA) method for parameter optimisation [4]. Ren and Bai [5] offered two methodologies for parameter optimisation in SVM: particle swarm optimisation (PSO-SVM) and GA-SVM. Huang [6] formulated a classifier using a hybrid ant colony optimisation (ACO) technique that simultaneously identifies the best possible feature subset and optimises the SVM parameters. Lin et al. [7] applied simulated annealing to parameter computation and feature selection for SVM. The simulated annealing algorithm optimises an SVM while addressing the problem of the search becoming stuck at local optima: it allows non-optimal steps to be accepted based on probability values. The technique was introduced by Kirkpatrick et al. [8].
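To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of simulated annealing applied to tuning the SVM penalty parameter C. The objective function here is a hypothetical stand-in for a cross-validation error estimate, and the perturbation scale, cooling rate and other parameter names are illustrative assumptions.

```python
import math
import random

def simulated_annealing(objective, c_init=1.0, t_init=1.0, t_min=1e-3,
                        alpha=0.9, steps_per_temp=20, seed=0):
    """Search for a penalty parameter C that minimises `objective`
    (e.g. an SVM cross-validation error) using simulated annealing."""
    rng = random.Random(seed)
    c = best_c = c_init
    cost = best_cost = objective(c_init)
    t = t_init
    while t > t_min:
        for _ in range(steps_per_temp):
            # Propose a neighbouring C by a multiplicative perturbation
            # (keeps C positive and explores on a log scale).
            c_new = c * math.exp(rng.uniform(-0.5, 0.5))
            cost_new = objective(c_new)
            delta = cost_new - cost
            # Acceptance rule: improvements are always accepted; worse
            # moves are accepted with probability exp(-delta / T),
            # which lets the search escape local optima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                c, cost = c_new, cost_new
                if cost < best_cost:
                    best_c, best_cost = c, cost
        t *= alpha  # geometric cooling schedule
    return best_c, best_cost

# Hypothetical stand-in objective with a minimum near C = 10; in practice
# this would be the validation error of an SVM trained with penalty C.
toy_error = lambda c: (math.log10(c) - 1.0) ** 2

best_c, best_err = simulated_annealing(toy_error)
```

At high temperature almost any move is accepted, so the search explores broadly; as the temperature decreases, the acceptance probability for worse moves shrinks and the search settles into a good region.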
In each iteration, simulated annealing selects a solution by examining whether a neighbouring solution is better than the current one. If so, the new solution is accepted unconditionally. Nevertheless,