Principles component analysis, fuzzy weighting pre-processing and artificial immune recognition system based diagnostic system for diagnosis of lung cancer

Kemal Polat *, Salih Güneş

Selcuk University, Department of Electrical and Electronics Engineering, 42035 Konya, Turkey

Abstract

Lung cancers are cancers that begin in the lungs. Other types of cancers may spread to the lungs from other organs. However, these are not lung cancers because they did not start in the lungs. The use of machine learning methods in disease diagnosis has been increasing gradually. In this study, lung cancer, a very common and important disease, was diagnosed with such a machine learning system that combines principal component analysis (PCA), fuzzy weighting pre-processing and the artificial immune recognition system (AIRS). The proposed system has three stages. First, the dimension of the lung cancer dataset, which has 57 features, is reduced to four features using principal component analysis. Second, a new weighting scheme based on fuzzy weighting pre-processing is applied as a pre-processing step before the main classifier. Third, the artificial immune recognition system is used as the classifier. The lung cancer dataset used in our study was taken from the UCI machine learning database. The classification accuracy obtained by our system was 100%, which is very promising compared with the other classification applications in the literature for this problem.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Principal component analysis; Artificial immune system; AIRS; Fuzzy weighting pre-processing; Lung cancer; Medical diagnosis

1. Introduction

Lung cancers are cancers that begin in the lungs. Other types of cancers may spread to the lungs from other organs. However, these are not lung cancers because they did not start in the lungs. When cancer cells spread from one organ to another, they are called metastases.
Research has found several risk factors for lung cancer. A "risk factor" is anything that changes the risk of getting a disease. Different risk factors change risk by different amounts. The risk factors for lung cancer include the following (http://www.cdc.gov/lungcancer/basic_info/index.htm): smoking and being around others' smoke, things around us at home or work (such as radon gas), and personal traits (such as having a family history of lung cancer).

Having so many factors to analyze when diagnosing a patient's lung cancer makes the physician's job difficult. A physician usually makes decisions by evaluating the current test results of a patient and by referring to the previous decisions she made on other patients with the same condition. The former method depends strongly on the physician's knowledge. On the other hand, the latter depends on the physician's experience to compare her patient with her earlier patients. This job is not easy considering the number of factors she has to evaluate. In this crucial step, she may need an accurate tool that lists her previous decisions on patients having the same (or nearly the same) factors.

* Corresponding author. Tel.: +90 332 223 2056; fax: +90 332 241 0635. E-mail addresses: kpolat@selcuk.edu.tr (K. Polat), sgunes@selcuk.edu.tr (S. Güneş).

Expert Systems with Applications 34 (2008) 214–221. www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2006.09.001