International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 03 Issue: 01 | Jan-2016 | www.irjet.net
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 1203

Diagnosis of Cancer using Fuzzy Rough Set Theory

L. Meenachi 1, Dr. S. Ramakrishnan 2, M. Arunithi 3, R. Karthiga 4, S. Karthika 5, P. Nandhini 6

1 Assistant Professor, Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
2 Head/Professor, Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
3,4,5,6 Students, Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India

Abstract - Cancer is one of the deadliest diseases. Early diagnosis and treatment can improve patient outcomes. Our main objective is to classify different types of cancer data. Our project involves four modules: feature selection, instance selection, classification and performance analysis. The first module identifies an appropriate set of features by eliminating irrelevant features, which improves the performance of the classifier. The Fuzzy Rough Subset evaluation method is used in conjunction with Particle Swarm Optimization (PSO) for feature selection. In the second module, missing values are removed from the data set using the RemoveMissing filter. The Instance Selection algorithm is then used to identify an appropriate set of instances by eliminating useless and erroneous instances. In the third module, the Fuzzy-Rough Nearest Neighbor algorithm is used to classify the data set obtained from the preceding steps. Finally, the performance of the classifier is evaluated using evaluation metrics.
Key Words: FuzzyRoughSubset Evaluator, Particle Swarm Optimization, Fuzzy-Rough Nearest Neighbor.

1. INTRODUCTION

Cancer is a deadly disease, also called a malignant tumor. It involves abnormal cell growth or alteration of the cell's genetic structure, with the potential to invade or spread to other parts of the body. Symptoms include a new lump, unexplained weight loss, abnormal bleeding, a prolonged cough and a change in bowel movements, among others. Early diagnosis and treatment of cancer can improve patient outcomes. More than 100 different cancers are known to affect humans. Classification of cancer therefore helps to identify the disease at an earlier stage, which in turn helps in choosing an appropriate treatment and in determining the prognosis. Once cancer is identified in a patient, treatment and therapy can begin at an early stage of the disease.

Initially, patient records are collected and transformed into data sets. We then have to identify the appropriate set of features from which the cancer can be classified. This process involves selecting a minimal feature set by eliminating irrelevant features, which improves the classification accuracy. Instance selection aims to reduce the number of instances in the data set, either by eliminating bad instances or by extracting as many instances as possible, so that the noise in the original data set is reduced. It also removes instances that conflict with other instances. A test pattern related to each type of cancer is developed. The selected instances are matched against the test pattern and classified based on the matches found.

The techniques used for these processes are Particle Swarm Optimization for feature selection, the Fuzzy Rough Instance Selection method with the weak gamma evaluator as its measure for instance selection, and the Fuzzy-Rough Nearest Neighbor classifier for classification.
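To make the classification step more concrete, the following is a minimal Python sketch of a fuzzy-rough nearest neighbor decision. It is an illustration under stated assumptions, not the implementation used in this work: the similarity relation (minimum over attributes of one minus the range-normalized distance), the Łukasiewicz implicator and t-norm, the neighborhood size `k`, and the names `similarity` and `frnn_classify` are all choices made for this example.

```python
from typing import Hashable, List, Sequence


def similarity(x: Sequence[float], y: Sequence[float],
               ranges: Sequence[float]) -> float:
    """Assumed fuzzy similarity: min over attributes of 1 - |x_a - y_a| / range_a."""
    sims = []
    for xa, ya, ra in zip(x, y, ranges):
        # A constant attribute (range 0) cannot separate instances.
        sims.append(1.0 if ra == 0 else max(0.0, 1.0 - abs(xa - ya) / ra))
    return min(sims)


def frnn_classify(train_X: List[Sequence[float]], train_y: List[Hashable],
                  query: Sequence[float], k: int = 3) -> Hashable:
    """Return the class whose fuzzy-rough lower/upper approximation
    membership, averaged, is highest for the query instance."""
    # Attribute ranges are estimated from the training data.
    ranges = [max(col) - min(col) for col in zip(*train_X)]

    # The k nearest neighbours are the k most similar training instances.
    scored = sorted(
        ((similarity(x, query, ranges), label)
         for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    best_label, best_score = None, -1.0
    for c in sorted(set(train_y)):
        memberships = [(s, 1.0 if label == c else 0.0) for s, label in scored]
        # Lower approximation with the Lukasiewicz implicator
        # I(a, b) = min(1, 1 - a + b).
        lower = min(min(1.0, 1.0 - s + m) for s, m in memberships)
        # Upper approximation with the Lukasiewicz t-norm
        # T(a, b) = max(0, a + b - 1).
        upper = max(max(0.0, s + m - 1.0) for s, m in memberships)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy data with two made-up numeric features per record (hypothetical values).
train_X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.3]]
train_y = ["benign", "benign", "malignant", "malignant"]
print(frnn_classify(train_X, train_y, [1.1, 1.0]))
print(frnn_classify(train_X, train_y, [4.9, 5.2]))
```

With crisp class labels, the lower approximation is driven by the most similar neighbor outside the class and the upper approximation by the most similar neighbor inside it, so a query is assigned to a class only when it is close to that class and far from the others.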
The accuracy of a classifier on a given test set is defined as the percentage of test set tuples that are correctly classified by the classifier. The class label associated with each test tuple is compared with the learned classifier's class prediction for that tuple. If the accuracy of the classifier is acceptable, the classifier can be used to classify future data tuples for which the class label is not known. After completion of the above processes, metrics such as the kappa statistic, sensitivity, specificity, F-measure and area under the curve are calculated. Only then can we determine whether our classification method classifies the data efficiently.

2. RELATED WORK

The literature contains a considerable body of research on cancer diagnosis and data classification, with relatively high reported classification performance. Fuzzy rough set theory preserves the original meaning of the features even after reduction. This may help applications that involve datasets with a huge number of features, which would be impossible to process