Research Article Open Access
Volume 4 • Issue 2 • 1000124
J Health Med Inform
ISSN: 2157-7420 JHMI, an open access journal
Health & Medical Informatics

Keywords: Classification; Decision tree; Machine learning; Support vector machine; 10-Fold cross-validation

Introduction

Breast cancer (BC) is the most common cancer in women, affecting about 10% of all women at some stage of their lives. In recent years the incidence rate has kept increasing, and data show that the survival rate is 88% five years after diagnosis and 80% ten years after diagnosis [1]. Early prediction of breast cancer is one of the most crucial tasks in the follow-up process. Data mining methods can help to reduce the number of false positive and false negative decisions [2,3]. Consequently, methods such as knowledge discovery in databases (KDD) have become popular research tools for medical researchers who try to identify and exploit patterns and relationships among large numbers of variables, and to predict the outcome of a disease using historical cases stored in datasets [4].

In this paper, using data mining techniques, the authors developed models to predict the recurrence of breast cancer by analyzing data collected from the ICBC registry. The next sections of this paper review related work, describe the background of this study, evaluate three classification models (C4.5 DT, SVM, and ANN), explain the methodology used to conduct the prediction, and present experimental results; the last part of the paper is the conclusion. To validate the models, accuracy, sensitivity, and specificity were used as criteria and compared.

Literature review and previous works

models to predict 5-, 10-, and 15-year breast cancer survival. They studied 951 breast cancer patients and used tumor size, axillary nodal status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumor necrosis, and age as input variables [7]. Pendharker patterns in breast cancer.
In this study, they showed that data mining could be a valuable tool for identifying similarities (patterns) in breast cancer cases, which can be used for diagnosis, prognosis, and treatment purposes [4]. These studies are some examples of research that applies data mining to medical fields for the prediction of diseases.

Materials and Methods

In order to predict the 2-year recurrence rate of breast cancer, we used the ICBC dataset in the National Cancer Institute of Tehran for the years 1997-2008. The ICBC is responsible for collecting incidence and survival data from the participating registries and disseminating these datasets for the purpose of conducting analytical research projects. This dataset contained population characteristics and included 22 input variables. Our cases were drawn from a total of 1189 women who were diagnosed with breast cancer. We preprocessed the data to remove unsuitable cases. After data cleansing and data preparation, the final dataset was constructed: 547 cases were analyzed after 642 records were excluded because of missing data. Patients with breast cancer recurrence were followed up

*Corresponding author: Leila Ghasem Ahmad, Department of Management Information Systems, Science and Research Branch, Islamic Azad University of Tehran-Iran, Iran, E-mail: lga_77@yahoo.com

Received January 28, 2013; Accepted April 18, 2013; Published April 24, 2013

Abstract

Objective: The number and size of medical databases are increasing rapidly, but most of these data are not analyzed to find valuable hidden knowledge. Advanced data mining techniques can be used to discover hidden patterns and relationships. Models developed with these techniques can help medical practitioners make the right decisions. The present research studied the application of data mining techniques to develop predictive models for breast cancer recurrence in patients who were followed up for two years.
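As a rough illustration of the record-exclusion step described under Materials and Methods (1189 records reduced to 547 after 642 were removed for missing data), the sketch below shows one common way to do it, complete-case filtering. The paper does not state the exact exclusion rule or list the variable names here, so the field names (`age`, `tumor_size`, `recurrence`), the toy records, and the helper functions `is_complete` and `complete_cases` are all hypothetical and serve only to make the idea concrete.

```python
from typing import Dict, List, Optional

# A record maps a variable name to its raw value; None or "" means "missing".
Record = Dict[str, Optional[str]]


def is_complete(record: Record, required: List[str]) -> bool:
    """Return True when every required field is present and non-empty."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def complete_cases(records: List[Record], required: List[str]) -> List[Record]:
    """Keep only the records that have a value for every required field."""
    return [r for r in records if is_complete(r, required)]


# Toy data with hypothetical field names (not the ICBC variables).
records: List[Record] = [
    {"age": "45", "tumor_size": "2.1", "recurrence": "no"},
    {"age": "", "tumor_size": "3.0", "recurrence": "yes"},    # missing age
    {"age": "60", "tumor_size": None, "recurrence": "no"},    # missing tumor size
    {"age": "52", "tumor_size": "1.4", "recurrence": "yes"},
]
required_fields = ["age", "tumor_size", "recurrence"]

kept = complete_cases(records, required_fields)
excluded = len(records) - len(kept)
print(f"kept={len(kept)} excluded={excluded}")
```

Complete-case filtering is simple and keeps the remaining records untouched, but it can discard a large share of the data, as the drop from 1189 to 547 cases suggests. Whether the authors considered alternatives such as imputation is not stated in this section.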
Method: The patients were registered in the Iranian Center for Breast Cancer (ICBC) program from 1997 to 2008. The dataset contained 1189 records, 22 predictor variables, and one outcome variable. We implemented three machine learning techniques, namely Decision Tree (C4.5), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to develop the predictive models. The main goal of this paper is to compare the performance of these three well-known algorithms on our data in terms of sensitivity, specificity, and accuracy.

Results and Conclusion: Our analysis shows that the accuracies of DT, ANN, and SVM are 0.936, 0.947, and 0.957, respectively. The SVM classification model predicts breast cancer recurrence with the lowest error rate and the highest accuracy. The accuracy of the DT model is the lowest of the three. The results were obtained using 10-fold cross-validation to measure the unbiased prediction accuracy of each model.

Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence

Ahmad et al., J Health Med Inform 2013, 4:2 http://dx.doi.org/10.4172/2157-7420.1000124

Ahmad LG*, Eshlaghy AT, Poorebrahimi A, Ebrahimi M and Razavi AR
Department of Management Information Systems, Science and Research Branch, Islamic Azad University of Tehran-Iran, Iran

Citation: Ahmad LG, Eshlaghy AT, Poorebrahimi A, Ebrahimi M, Razavi AR (2013) Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence. J Health Med Inform 4: 124. doi:10.4172/2157-7420.1000124

Copyright: © 2013 Ahmad LG, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

A literature review showed that there have been several studies on the survival prediction problem using statistical approaches and artificial neural networks.
However, we could only find a few studies related to medical diagnosis and recurrence using data mining approaches such as decision trees [5,6]. Delen et al. used artificial neural networks, decision trees and logistic regression to develop prediction models for breast cancer survival by analyzing a large dataset, the SEER cancer incidence database [6]. Lundin et al. used ANN and logistic regression