International Research Journal of Engineering And Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal Page 1203
Diagnosis of Cancer using Fuzzy Rough Set Theory
L. Meenachi¹, Dr. S. Ramakrishnan², M. Arunithi³, R. Karthiga⁴, S. Karthika⁵, P. Nandhini⁶
¹ Assistant Professor, Department of Information Technology,
Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
² Head/Professor, Department of Information Technology,
Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
³ ⁴ ⁵ ⁶ Students, Department of Information Technology,
Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Cancer is one of the deadliest diseases.
Early diagnosis and treatment can improve patient
outcomes. Our main objective is to classify different
types of cancer data. Our project involves four modules:
feature selection, instance selection, classification and
performance analysis. The first module identifies the
appropriate set of features by eliminating irrelevant
features to improve the performance of the classifier.
The Fuzzy Rough Subset evaluation method is used in
conjunction with Particle Swarm Optimization (PSO) for
feature selection. In the second module, instances with
missing values are removed from the data set using the
RemoveMissing filter. The Instance Selection algorithm
is then used to identify an appropriate set of instances
by eliminating useless and erroneous instances. In the
third module, the Fuzzy-Rough Nearest Neighbor algorithm
is used to classify the data set obtained from the above
steps. Finally, the performance of the classifier is
assessed using evaluation metrics.
Key Words: FuzzyRoughSubset Evaluator, Particle
Swarm Optimization, Fuzzy-Rough Nearest Neighbor.
1. INTRODUCTION
Cancer is a deadly disease, also called a malignant
tumor. It involves abnormal cell growth or alteration of
the cell's genetic structure, with the potential to invade
or spread to other parts of the body. Symptoms include,
among others, a new lump, unexplained weight loss,
abnormal bleeding, a prolonged cough and a change
in bowel movements. Early diagnosis and treatment of
cancer can improve patient outcomes.
There are over 100 different known cancers that
affect humans. Hence classification of cancer helps to
identify cancer at an earlier stage, which in turn helps in
choosing the appropriate treatment and determining the
prognosis. When cancer is identified in a patient, the
patient can begin treatment and therapy at an earlier
stage of the disease.
Initially, patients' records are collected and
transformed into data sets. We then have to identify the
appropriate set of features from which the cancer can be
classified. This process involves selecting a minimal
feature set by eliminating irrelevant features, which can
improve the classification accuracy. Instance selection
aims to reduce the number of instances in the data set by
either eliminating bad instances or extracting as many
instances as possible, so that the noise in the original
data set is reduced. It also removes instances that
conflict with other instances.
A test pattern related to each type of cancer is
developed. The selected instances are matched with the
test pattern and classified based on the matches found.
The techniques used for the above processes are particle
swarm optimization for feature selection, the Fuzzy Rough
Instance Selection method with the weak gamma
evaluator as a measure for instance selection, and the fuzzy
rough nearest neighbor classifier for classification.
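To make the classification step more concrete, the following is a minimal sketch of a fuzzy-rough nearest neighbor decision rule. It is an illustration under stated assumptions, not the implementation used in this work. It assumes numeric attributes, a similarity relation defined as the minimum over attributes of 1 - |difference| / attribute range, crisp class labels, and a decision based on the average of the fuzzy lower and upper approximation memberships computed over the K most similar training instances. The function names, the toy data and the default K = 3 are hypothetical.

```python
from typing import Hashable, List, Sequence


def similarity(x: Sequence[float], y: Sequence[float],
               ranges: Sequence[float]) -> float:
    """Assumed fuzzy similarity: min over attributes of 1 - |x_a - y_a| / range_a."""
    return min(
        max(0.0, 1.0 - abs(a - b) / r) if r > 0 else 1.0
        for a, b, r in zip(x, y, ranges)
    )


def frnn_classify(train: List[Sequence[float]], labels: List[Hashable],
                  query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by averaging lower and upper approximation memberships."""
    n_attrs = len(query)
    # Attribute ranges are taken from the training data (an assumption).
    ranges = [
        max(row[j] for row in train) - min(row[j] for row in train)
        for j in range(n_attrs)
    ]
    # Keep the k training instances most similar to the query.
    neighbours = sorted(
        ((similarity(x, query, ranges), c) for x, c in zip(train, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    best_class, best_score = None, -1.0
    for c in sorted(set(labels), key=str):
        # Lower approximation: how dissimilar the query is to neighbours of other classes.
        lower = min((1.0 - s for s, lab in neighbours if lab != c), default=1.0)
        # Upper approximation: how similar the query is to neighbours of class c.
        upper = max((s for s, lab in neighbours if lab == c), default=0.0)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: two attributes, two classes.
train = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (4.8, 5.3)]
labels = ["benign", "benign", "malignant", "malignant"]
print(frnn_classify(train, labels, (1.1, 1.0)))
```

With crisp class labels, the lower approximation reduces to one minus the highest similarity to a neighbour of a different class, and the upper approximation to the highest similarity to a neighbour of the same class, which is what the two `min`/`max` expressions compute.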
The accuracy of a classifier for a given test set is
defined as the percentage of test set tuples that are
correctly classified by the classifier. The class label
associated with each test tuple is compared with the
learned classifier's class prediction for that tuple. If the
accuracy of the classifier is acceptable, the classifier can be
used to classify future data tuples for which the class label
is not known. After completion of the above processes,
metrics such as the kappa statistic, sensitivity,
specificity, f-measure and area under the curve are
calculated. Only then can we judge whether our
classification method is effective for classifying the data.
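As an illustration of how these metrics follow from the counts of correct and incorrect predictions, the sketch below computes accuracy, sensitivity, specificity, f-measure and the kappa statistic for a two-class problem. It is a minimal example, not the evaluation code used in this work: the function name, the 0/1 label encoding and the sample labels are assumptions. Area under the curve is omitted because it requires ranked prediction scores rather than hard class labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common evaluation metrics; label 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a denominator is zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, n)
    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)

    # Kappa compares observed agreement with the agreement expected by chance.
    expected = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - expected, 1.0 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical labels: 1 = cancer present, 0 = cancer absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
for name, value in binary_metrics(actual, predicted).items():
    print(f"{name}: {value:.3f}")
```

For data sets with more than two cancer types, the same counts can be taken per class in a one-versus-rest fashion.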
2. RELATED WORK
The literature contains a considerable body of research
on cancer diagnosis and the classification of cancer data
sets, much of it reporting relatively high classification performance.
Fuzzy rough set theory preserves the original
meaning of the features even after reduction. This may
help applications that involve datasets with a huge
number of features, which would be impossible to process