AWERProcedia Information Technology & Computer Science Vol 04 (2013) 401-407
3rd World Conference on Innovation and Computer Sciences 2013

Breast Cancer Diagnosis Based on Naïve Bayes Machine Learning Classifier with KNN Missing Data Imputation

Ceren Güzel *, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570, Ankara, Turkey.
Mahmut Kaya, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570, Ankara, Turkey.
Oktay Yıldız, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570, Ankara, Turkey.
Hasan Şakir Bilge, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570, Ankara, Turkey.

Suggested Citation: Güzel C., Kaya M., Yıldız O. & Bilge H. Ş. Breast Cancer Diagnosis Based on Naïve Bayes Machine Learning Classifier with KNN Missing Data Imputation. AWERProcedia Information Technology & Computer Science. [Online]. 2013, 04, pp 401-407. Available from: www.awer-center.org/pitcs

Received December 25, 2012; revised January 18, 2013; accepted March 16, 2013.
Selection and peer review under responsibility of Prof. Dr. Fahrettin Sadıkoglu, Near East University.
©2013 Academic World Education & Research Center. All rights reserved.

Abstract

Cancer is one of the deadliest diseases in the world. Breast cancer is the second leading cause of death in women. Mammography is a method used to detect breast cancer at an early stage. It helps physicians decide whether a biopsy is necessary, based on tissue shape, border and density. According to research, 70% of biopsies are performed unnecessarily. Because of the cost and complications of a biopsy, it is essential to decide correctly whether one is needed. To this end, many machine learning algorithms have been developed in the literature to support medical diagnosis, but in many real-world tasks the data sets contain missing values.
These missing values adversely affect classifier performance. Our approach is to impute missing values with the k Nearest Neighbor algorithm (kNN) and Naïve Bayes. Then, the performance of the system is evaluated with kNN and Naïve Bayes classifiers to detect breast cancer. Our proposal is measured by performance criteria such as accuracy, sensitivity, specificity and ROC analysis. With this approach, 95 of the 131 missing values, corresponding to 9.89% of all data, are filled. The experimental results on the Mammographic Mass database demonstrate the effectiveness of our proposal, with 82.49% accuracy, whereas 81.69% accuracy is obtained without any imputation using the same

* ADDRESS FOR CORRESPONDENCE: Ceren Güzel, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570, Ankara, Turkey, E-mail Address: cerenguzel@gazi.edu.tr / Tel.: +90-312-582-3119
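
The abstract describes kNN-based imputation followed by evaluation with accuracy, sensitivity and specificity. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names `knn_impute` and `diagnostic_metrics` are hypothetical, and several details are assumptions made only for illustration: missing values are encoded as `None`, distance is a root-mean-square difference over the features observed in both rows, and a gap is filled with the mean of the neighbours' values.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries from the k nearest rows that observe that feature.

    Assumptions (not from the paper): RMS distance over the features that
    both rows observe, and mean of the neighbours' values as the fill.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare only on features observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            neighbours = [v for _, v in candidates[:k]]
            if neighbours:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy data only; these are not values from the Mammographic Mass database.
toy_rows = [[1.0, 2.0], [1.0, None], [5.0, 10.0], [1.2, 2.2]]
filled = knn_impute(toy_rows, k=2)
scores = diagnostic_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this sketch a missing entry stays unfilled when no other row both observes that feature and shares an observed feature with the incomplete row. For nominal attributes, taking the most frequent neighbour value instead of the mean may be a better fit.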