DOI: http://dx.doi.org/10.26483/ijarcs.v9i1.5220
Volume 9, No. 1, January-February 2018
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved 448
ISSN No. 0976‐5697
AN EFFICIENT CLASSIFICATIONS MODEL FOR BREAST CANCER
PREDICTION BASED ON DIMENSIONALITY REDUCTION TECHNIQUES
B. Tamilvanan
Research and Development Centre
Bharathiar University,
Coimbatore-641046, TN, India.
Dr.V. Murali Bhaskaran
Principal
Dhirajlal Gandhi College of Technology
Salem- 636290, TN, India.
Abstract: Classification algorithms are used effectively in general medical diagnosis applications to identify disorders in advance. One such disease,
breast cancer, is the most prevalent and serious problem among women in most developing countries. Many attempts have been made to detect
this disease with high precision and accuracy. In this paper, the popular and efficient classification algorithms Naive Bayes, Multilayer Perceptron,
Radial Basis Function network, Nearest Neighbour, and Conjunctive Rule are applied to improve detection efficiency and accuracy on the breast
cancer dataset. To further improve accuracy, an efficient dimensionality reduction technique is incorporated in this work. The performance of these
approaches is evaluated using precision, recall, F-measure, ROC area, Balanced Classification Rate (BCR), Matthews Correlation Coefficient (MCC), and
accuracy. These measures show that the Naive Bayes algorithm achieves the highest accuracy and the lowest error rate of the algorithms compared.
This study can be extended to evaluate other classification techniques on a larger dataset with more specific attributes to obtain more accurate
results.
Keywords: Classification, Naive Bayes, Multilayer Perceptron, Radial Basis Function network, Nearest Neighbour, Conjunctive Rule.
INTRODUCTION
Data mining techniques and software are used in a wide
range of fields, including banking, social science,
education, business and industry, bioinformatics, weather
forecasting, healthcare, and big data [1] [2].
Today the healthcare industry generates massive amounts
of data about patients, disease diagnosis, and so on.
Several approaches to building accurate classifiers have
been proposed (e.g., NB, MLP, RBFnet, NN, CJ). In
classification, we supply the Breast Cancer dataset of
example records as input data, called the training set,
with every record consisting of several attributes.
An attribute can be either numerical or categorical. If the
values of an attribute belong to an ordered domain, the
attribute is called a numerical attribute (e.g., Tumor-size,
Deg-Malig, Menopause, Age, Inv-nodes). If its values come
from an unordered domain, it is a categorical attribute (e.g.,
Irradiant, Breast, Node-cape, Breast-Quad, Class).
Classification is the process of splitting a dataset into
mutually exclusive groups, called classes, based on suitable
attributes.
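To make the attribute and class distinction concrete, the following is a minimal Python sketch, not taken from the paper. It assumes records are stored as dictionaries keyed by the attribute names used above; the helper names `attribute_kind` and `partition_by_class`, the class labels "A" and "B", and all attribute values are hypothetical toy data, not records from the Breast Cancer dataset.

```python
from collections import defaultdict
from numbers import Number

def attribute_kind(values):
    """'numerical' if every value is a number (ordered domain), otherwise 'categorical'."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"

def partition_by_class(records, class_attr="Class"):
    """Split records into mutually exclusive groups, one group per class value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[class_attr]].append(record)
    return dict(groups)

# Hypothetical toy records: attribute names follow the text, values and labels are made up.
records = [
    {"Age": 45, "Tumor-size": 22, "Breast": "left", "Class": "A"},
    {"Age": 38, "Tumor-size": 12, "Breast": "right", "Class": "B"},
    {"Age": 61, "Tumor-size": 30, "Breast": "left", "Class": "B"},
]
kinds = {name: attribute_kind([r[name] for r in records]) for name in ("Age", "Tumor-size", "Breast")}
groups = partition_by_class(records)
```

Because each record is appended to exactly one group, the groups are mutually exclusive and together cover the whole dataset.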
Breast cancer, in its various forms, is a common disease
affecting women of all ages worldwide. It affects the breast
tissue and lobules. Breast cancer is classified by where it
originates: cancer that originates in the milk ducts is known
as ductal carcinoma, while cancer cells found in the lobules
give rise to “lobular carcinoma.” Breast cancer screening
is an essential step that filters the symptoms which can be
used to diagnose the patient's actual pathological
condition. Breast cancer is a leading cause of death in
older women; at the same time, it is important to note that
younger women who do not undergo screening also remain
at risk of breast cancer.
The rest of the paper is organized as follows. Part 1
reviews related work and presents the data mining methods
used. Part 2 describes the Breast Cancer dataset. Part 3
presents and discusses the experimental results. Finally,
the paper concludes and outlines future enhancements.
LITERATURE REVIEW
A multinomial logistic regression model with a ridge
estimator generalizes logistic regression to problems with
more than two discrete outcomes, that is, to a categorically
distributed dependent variable [3]. The model is designed
to predict the probabilities of the different possible
outcomes of such a dependent variable, given a set of
independent variables.
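The sketch below illustrates the idea in plain Python under stated assumptions: it fits the L2-penalised (ridge) multinomial log-likelihood by batch gradient descent, which is one simple way to train such a model and not necessarily the optimisation procedure used in [3]. The function names, learning rate, ridge value, and one-dimensional toy data are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, x):
    """One linear score per class; W[k][0] is the bias of class k."""
    xb = [1.0] + list(x)
    return [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]

def fit_multinomial_ridge(X, y, n_classes, ridge=0.01, lr=0.5, epochs=2000):
    """Minimise mean negative log-likelihood + (ridge / 2) * ||weights||^2 by batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            p = softmax(logits(W, xi))
            xb = [1.0] + list(xi)
            for k in range(n_classes):
                err = p[k] - (1.0 if yi == k else 0.0)  # predicted minus one-hot target
                for j in range(d + 1):
                    grad[k][j] += err * xb[j] / n
        for k in range(n_classes):
            for j in range(d + 1):
                penalty = ridge * W[k][j] if j > 0 else 0.0  # biases are not penalised
                W[k][j] -= lr * (grad[k][j] + penalty)
    return W

def predict_proba(W, x):
    """Class-membership probabilities for one instance."""
    return softmax(logits(W, x))

# Hypothetical 1-D toy data with three classes.
X = [[-1.1], [-0.9], [-0.1], [0.1], [0.9], [1.1]]
y = [0, 0, 1, 1, 2, 2]
W = fit_multinomial_ridge(X, y, n_classes=3)
probs = predict_proba(W, [0.0])
```

The ridge term only shrinks the feature weights, which keeps the estimates finite even when the training classes are perfectly separable.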
An RBF network is an artificial neural network (ANN) that
uses the k-means clustering algorithm to determine its basis
(activation) functions and can learn both discrete-class and
numeric-class problems. An RBF network generally has
three layers: input, hidden, and output [4].
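A minimal sketch of the three-layer structure follows, assuming Gaussian basis functions centred on k-means cluster centres with one shared width, and a logistic output unit for a two-class (0/1) problem trained by gradient descent. The function names, width, and toy points are illustrative assumptions, not the configuration used in [4] or in this paper's experiments.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns k centre vectors."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, centres, width):
    """Gaussian radial basis activations, one per hidden unit (centre)."""
    return [math.exp(-sq_dist(x, c) / (2.0 * width ** 2)) for c in centres]

def fit_rbf_network(X, y, k=2, width=1.0, lr=0.5, epochs=500):
    """Input layer -> Gaussian hidden layer (k-means centres) -> logistic output unit."""
    centres = kmeans(X, k)
    H = [[1.0] + hidden_layer(x, centres, width) for x in X]  # index 0 is the bias unit
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for h, t in zip(H, y):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)))
            for j in range(k + 1):
                grad[j] += (p - t) * h[j] / n  # cross-entropy gradient for the output weights
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return centres, width, w

def predict_rbf(model, x):
    """Probability that x belongs to class 1."""
    centres, width, w = model
    h = [1.0] + hidden_layer(x, centres, width)
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)))

# Hypothetical toy data: two well-separated groups of 2-D points.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_rbf_network(X, y)
```

Only the output weights are trained here; the hidden layer is fixed once k-means has placed the centres, which is what keeps this kind of network cheap to fit.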
Nearest Neighbor classification is mainly used when all
attribute values are continuous, although it can be modified
to handle categorical attributes. The idea is to estimate the
classification of an unseen instance using the classification
of the instance or instances closest to it, in some sense of
closeness that must be defined [5].
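The following k-nearest-neighbour sketch shows one way to adapt the method to categorical attributes: numeric attributes contribute squared differences, and categorical attributes contribute 1 on a mismatch and 0 on a match. This mixed distance, the value k = 3, the function names, and the toy records are assumptions for illustration; numeric attributes are assumed to be pre-scaled to [0, 1] so that neither kind of attribute dominates the distance.

```python
import math
from collections import Counter

def mixed_distance(a, b):
    """Squared difference for numbers, 0/1 mismatch for categories, then square root."""
    total = 0.0
    for u, v in zip(a, b):
        if isinstance(u, str) or isinstance(v, str):
            total += 0.0 if u == v else 1.0  # overlap measure for categorical values
        else:
            total += (u - v) ** 2            # numeric values, assumed scaled to [0, 1]
    return math.sqrt(total)

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training instances closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: mixed_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy records: one scaled numeric attribute and one categorical attribute.
train_X = [[0.1, "left"], [0.2, "left"], [0.15, "right"],
           [0.9, "right"], [0.8, "left"], [0.95, "right"]]
train_y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
```

For example, `knn_predict(train_X, train_y, [0.12, "left"])` votes among the two nearby "left" records and one farther record, and returns `"class_a"`.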
The conjunctive rule classifier is based on a rule-learning
algorithm that predicts both numeric and categorical class
values. This