An expert system for detection of breast cancer based on association rules and neural network

Murat Karabatak a,*, M. Cevdet Ince b

a Fırat University, Department of Electronics and Computer Science, 23119 Elazig, Turkey
b Fırat University, Department of Electric-Electronics Engineering, 23119 Elazig, Turkey

Keywords: Association rules; Neural network; Automatic detection; Breast cancer

Abstract

This paper presents an automatic diagnosis system for detecting breast cancer based on association rules (AR) and a neural network (NN). In this study, AR is used to reduce the dimension of the breast cancer database and NN is used for intelligent classification. The performance of the proposed AR + NN system is compared with that of an NN model alone. Using AR, the dimension of the input feature space is reduced from nine to four. In the test stage, 3-fold cross-validation was applied to the Wisconsin breast cancer database to evaluate the performance of the proposed system. The correct classification rate of the proposed system is 95.6%. This research demonstrated that AR can be used to reduce the dimension of the feature space and that the proposed AR + NN model can be used to obtain fast automatic diagnostic systems for other diseases.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Data classification using knowledge obtained from known historical data has been one of the most intensively studied subjects in statistics, decision science and computer science. It has been applied to problems in medicine, social science, management and engineering. Various problems such as disease diagnosis, image recognition, and credit evaluation have been addressed with classification techniques (Michie, Spiegelhalter, & Taylor, 1994). In medical and other domains, linear programming approaches have proven to be efficient and effective methods (Bennett & Mangasarian, 1992; Freed & Glover, 1981; Grinold, 1972; Smith, 1968).
Recently, intelligent methods such as NN and support vector machines have been used intensively for classification tasks (Ryu, Chandrasekaran, & Jacob, 2007). Automated diagnostic systems are one of the application areas of database analysis and pattern recognition. The aim of these studies is to assist doctors in making diagnostic decisions. Thanks to modern facilities, very large databases can be collected in medicine, and these databases require special techniques to analyze, process and make effective use of them. Data mining and knowledge discovery in databases are approaches for finding relationships buried in data (Chou, Lee, Shao, & Chen, 2004). Their methodologies include data visualization, machine learning and statistical techniques, and their tasks can be summarized as classification, prediction, clustering, etc. (Curt, 1995).

Breast cancer is a very common and serious cancer in women. Mammography is one of the most widely used methods to detect breast cancer (Chou et al., 2004). However, radiologists have been shown to vary considerably in interpreting mammograms (Elmore et al., 1994). Fine needle aspiration cytology (FNAC) is also widely adopted in the diagnosis of breast cancer, but its average correct identification rate is only 90%. It is therefore necessary to develop better identification methods to recognize breast cancer. Several researchers have used statistical and artificial intelligence techniques to predict breast cancer (Kovalerchuk, Triantaphyllou, Ruiz, & Clayton, 1997; Pendharkar, Rodger, Yaverbaum, Herman, & Benner, 1999). The objective of these identification techniques is to assign patients either to a ‘benign’ group that does not have breast cancer or to a ‘malignant’ group with strong evidence of having breast cancer. Breast cancer diagnosis is thus an instance of the more general and widely discussed classification problem (Anderson, 1984; Dillon & Goldstein, 1984; Hand, 1981; Johnson & Wichern, 2002).
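The abstract evaluates the AR + NN system with 3-fold cross-validation and reports a correct classification rate of 95.6%. For a two-class problem like the benign/malignant split described above, that rate is the fraction of held-out cases labeled correctly. The sketch below illustrates only this evaluation protocol, under stated assumptions: the classifier is a hypothetical nearest-centroid stand-in (not the paper's neural network, and with no association-rule feature reduction), folds are shuffled with a fixed seed, the rate is taken as the mean over folds, and the data are a made-up toy set rather than the Wisconsin database. All function names are illustrative.

```python
import random
from statistics import fmean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the folds are reproducible
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(idx[start:start + size])
        start += size
    return folds


def fit_centroids(X, y):
    """Hypothetical stand-in classifier (not the paper's NN): one mean vector per class."""
    return {
        label: [fmean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
        for label in sorted(set(y))  # sorted keys give a deterministic tie-break order
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def cross_validated_ccr(X, y, k=3, seed=0):
    """Correct classification rate: mean fraction of held-out cases labeled correctly."""
    rates = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        rates.append(correct / len(test_idx))
    return fmean(rates)


# Made-up toy data: three attributes on a 1-10 scale (NOT the Wisconsin database).
X = [[1, 1, 1], [2, 1, 2], [1, 2, 1], [2, 2, 1], [1, 1, 2], [2, 1, 1],
     [8, 9, 7], [9, 8, 9], [7, 9, 8], [9, 9, 9], [8, 7, 8], [10, 8, 9]]
y = ["benign"] * 6 + ["malignant"] * 6

print(cross_validated_ccr(X, y, k=3))  # prints 1.0 on this separable toy set
```

Each case is held out exactly once, so every prediction counted is made by a model that never saw that case. Substituting a different classifier, or a reduced subset of the attributes, would change only the fit/predict pair; the fold logic stays the same.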
There are many techniques to predict and classify breast cancer patterns. In Chou et al. (2004), an artificial neural network and a multivariate adaptive regression splines approach were used to classify breast cancer patterns. In Aragonés, Ruiz, Jiménez, Pérez, and Conejo (2003), a combined neural network and decision tree model was used for the prognosis of breast cancer relapse. In Ryu et al. (2007), the isotonic separation technique was used to predict breast cancer. In Şahan, Polat, Kodaz, and Güneş (2007), a new hybrid method based on a fuzzy-artificial immune system and the k-nn algorithm was proposed for breast cancer diagnosis. In Übeyli (2007), the Wisconsin breast cancer data were classified using a multilayer perceptron neural network, a combined neural network, a probabilistic neural network, a recurrent neural network and a support vector machine.

* Corresponding author. E-mail addresses: mkarabatak@firat.edu.tr (M. Karabatak), mcince@firat.edu.tr (M.C. Ince).

Expert Systems with Applications 36 (2009) 3465–3469. doi:10.1016/j.eswa.2008.02.064