Novel swarm optimization for mining classification rules on thyroid gland data

Wei-Chang Yeh

Integration & Collaboration Laboratory, Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, P.O. Box 123, Broadway, NSW 2007, Australia
Department of Industrial Engineering and Engineering Management, National Tsing Hua University, P.O. Box 24-60, Hsinchu 300, Taiwan, ROC

Article info
Article history: Received 24 June 2010; received in revised form 31 October 2011; accepted 10 February 2012; available online 18 February 2012
Keywords: Data mining; Simplified swarm optimization (SSO); Classification rules; Thyroid gland data; Orthogonal array test (OAT)

Abstract
This work uses a novel rule-based classifier design method, constructed by using improved simplified swarm optimization (SSO), to mine a thyroid gland dataset from UCI databases. An elite concept is added to the proposed method to improve solution quality, close interval encoding (CIE) is added to efficiently represent the rule structure, and the orthogonal array test (OAT) is added to powerfully prune rules to avoid over-fitting the training dataset. To evaluate the classification performance of the proposed improved SSO, computer simulations are performed on well-known thyroid gland data. Computational results compare favorably with those obtained using existing algorithms such as conventional classifiers, including Bayes classifier, k-NN, k-Means, and 2D-SOM, and soft computing based methods such as the simple SSO, immune-estimation of distribution algorithms (IEDA), and genetic algorithm (GA).

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

Data mining is an efficient approach for analyzing and discovering knowledge from a large complex dataset of heterogeneous quality, for which a variety of data mining tools have been developed.
The rule-based classifier is one such important tool [1–7,12,13,17,20–22,26–28,30–40]: it mines a small set of IF-THEN rules (e.g., IF condition THEN conclusion) from training data for classification, and then uses this rule set to predict the classes of new data instances [25]. The rule-based classifier has the advantage of generating a high-level symbolic knowledge representation, which increases the comprehensibility of the discovered knowledge [1–7,12,13,17,20–22,26–28,30–40], and it has been extensively applied to many real-world problems in medicine, social sciences, management, and engineering [1–7,12,13,17,20–22,26–28,30–40].

Many conventional algorithms have been proposed for classification, such as the Bayes classifier [18,19], k-NN, k-Means, and 2D-SOM (self-organizing map) [6,11,15,23]. Soft computing methods (SCs) have been utilized to find optimal or good-quality solutions to complex optimization problems in a number of fields [4,5,8–10,13–22,24,30–40]. Consequently, many new data mining techniques have been based on SC, such as the genetic algorithm (GA, a biology-inspired SC) [5,14,31] and particle swarm optimization (PSO, a swarm-intelligence SC) [1,2,8,9,13,17,21,22,26,28,30,32–40].

Swarm intelligence is a form of artificial intelligence, primarily inspired by the social behavior patterns of self-organized systems, that considers the interactions among large groups of individuals [1,2,8,9,13,17,21,22,26,28,30,32–40]. The simplified swarm optimization (SSO) proposed by Yeh is a population-based stochastic optimization technique that belongs to the swarm-intelligence category [37–39] and is also an evolutionary computational method inspired by PSO [37]. Also known as discrete PSO (DPSO), SSO was originally proposed to overcome the drawbacks of PSO for discrete-type optimization
doi:10.1016/j.ins.2012.02.009
E-mail address: yeh@ieee.org
Information Sciences 197 (2012) 65–76