AWERProcedia Information Technology & Computer Science
Vol 04 (2013) 401-407
3rd World Conference on Innovation and Computer Sciences 2013
Breast Cancer Diagnosis Based on Naïve Bayes Machine Learning
Classifier with KNN Missing Data Imputation
Ceren Güzel *, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570,
Ankara, Turkey.
Mahmut Kaya, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570,
Ankara, Turkey.
Oktay Yıldız, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570,
Ankara, Turkey.
Hasan Şakir Bilge, Gazi University, Faculty of Engineering, Computer Engineering Department, 06570,
Ankara, Turkey.
Suggested Citation:
Güzel C., Kaya M. & Yıldız O. Hasan Şakir Bilge. Breast Cancer Diagnosis Based on Naïve Bayes Machine Learning
Classifier with KNN Missing Data Imputation. AWERProcedia Information Technology & Computer Science.
[Online]. 2013, 04, pp 401-407. Available from: www.awer-center.org/pitcs
Received December 25, 2012; revised January 18, 2013; accepted March 16, 2013.
Selection and peer review under responsibility of Prof. Dr. Fahrettin Sadıkoglu, Near East University.
©2013 Academic World Education & Research Center. All rights reserved.
Abstract
Cancer is one of the deadliest diseases in the world. Breast cancer is the second leading cause of death in women.
Mammography is a method used to detect breast cancer at an early stage. It helps physicians decide whether a
biopsy is necessary, based on tissue shape, border and density. According to research, 70%
of biopsies are performed unnecessarily. Because of the cost and complications of biopsy, it is essential to decide correctly whether one is
necessary. To this end, many machine learning algorithms have been developed in the literature to support medical diagnosis,
but in many real-world tasks the data sets contain missing values. These missing values adversely affect classifier
performance. Our approach is to impute missing values with the k-Nearest Neighbor (kNN) algorithm and Naïve Bayes. Then, the
performance of the system is evaluated with kNN and Naïve Bayes classifiers for breast cancer detection. Our proposal is assessed
with performance criteria such as accuracy, sensitivity, specificity and ROC analysis. With this approach, 95 of the 131 missing
values, which is 9.89% of all data, are filled. The experimental results on the Mammographic Mass database demonstrate the
effectiveness of our proposal with 82.49% accuracy, whereas 81.69% accuracy is obtained without any imputation using the same
* ADDRESS FOR CORRESPONDENCE: Ceren Güzel, Gazi University, Faculty of Engineering, Computer Engineering Department,
06570, Ankara, Turkey, E-mail Address: cerenguzel@gazi.edu.tr / Tel.: +90-312-582-3119