DOI: http://dx.doi.org/10.26483/ijarcs.v9i1.5220
Volume 9, No. 1, January-February 2018
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved
ISSN No. 0976-5697

AN EFFICIENT CLASSIFICATIONS MODEL FOR BREAST CANCER PREDICTION BASED ON DIMENSIONALITY REDUCTION TECHNIQUES

B. Tamilvanan
Research and Development Centre, Bharathiar University, Coimbatore-641046, TN, India.

Dr. V. Murali Bhaskaran
Principal, Dhirajlal Gandhi College of Technology, Salem-636290, TN, India.

Abstract: Classification algorithms are widely used in medical diagnosis applications to identify disorders in advance. Breast cancer is among the most prevalent and serious health problems for women in most developing countries. Many attempts have been made to detect it with high precision and accuracy. In this paper, five popular classification algorithms, namely Naive Bayes, Multilayer Perceptron, Radial Basis Function network, Nearest Neighbour, and Conjunctive Rule, are applied to a breast cancer dataset to improve the efficiency and accuracy of detection. To improve accuracy further, a dimensionality reduction technique is incorporated in this work. The performance of these approaches is evaluated using precision, recall, F-measure, ROC, Balanced Classification Rate (BCR), Matthews Correlation Coefficient (MCC), and accuracy. These measures show that the Naive Bayes algorithm achieves a higher accuracy rate and a lower error rate than the other algorithms. The study can be extended to evaluate the performance of other classification techniques on a larger data set with more distinct attributes to obtain more accurate results.
Keywords: Classification, Naive Bayes, Multilayer Perceptron, Radial Basis Function network, Nearest Neighbour, Conjunctive Rule.

INTRODUCTION

Data mining techniques and software are used in a wide range of fields, including banking, social science, education, business, bioinformatics, weather forecasting, healthcare, and big data [1] [2]. Nowadays the healthcare industry generates a massive amount of data about patients, disease diagnosis, and so on. Several different approaches to building accurate classifiers have been proposed, e.g., Naive Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function network (RBFnet), Nearest Neighbour (NN), and Conjunctive Rule (CJ).

In classification, we are given a Breast Cancer data set of example records (the input data), with each record consisting of several attributes. An attribute can be either numerical or categorical. If the values of an attribute belong to an ordered domain, the attribute is called a numerical attribute (e.g., Tumor-size, Deg-Malig, Menopause, Age, Inv-nodes). If the values belong to an unordered domain, the attribute is called a categorical attribute (e.g., Irradiant, Breast, Node-cape, Breast-Quad, Class). Classification is the process of splitting a dataset into mutually exclusive groups, called classes, based on suitable attributes.

Worldwide, the various types of breast cancer are a common disease affecting women of all ages. Breast cancer affects the breast tissue and lobules. It is classified by its site of origin: cancer that originates in the milk ducts is known as ductal carcinoma, while cancer whose cells are found in the lobules is termed lobular carcinoma. Breast cancer screening is an essential step that filters out the symptoms that can be used to diagnose the patient's actual pathological condition.
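The attribute terminology introduced above can be made concrete with a minimal sketch. It assumes a record is stored as a Python dictionary; the attribute names are a subset of those listed in the text, while the record values, the class labels, and the helper names `attribute_kind` and `group_by_class` are hypothetical and chosen only for illustration. The sketch groups records by an already known class label to show what mutually exclusive classes look like; it does not train a classifier.

```python
from collections import defaultdict

# Attribute names are a subset of those named in the text; the grouping into
# numerical and categorical mirrors the text's own listing.
NUMERICAL_ATTRIBUTES = ("Age", "Tumor-size", "Deg-Malig")
CATEGORICAL_ATTRIBUTES = ("Breast", "Breast-Quad", "Class")
CLASS_ATTRIBUTE = "Class"

# Hypothetical records: values and class labels are invented for illustration
# and are not taken from the paper's data set.
RECORDS = [
    {"Age": 45, "Tumor-size": 22, "Deg-Malig": 3,
     "Breast": "left", "Breast-Quad": "left_low", "Class": "recurrence"},
    {"Age": 52, "Tumor-size": 14, "Deg-Malig": 1,
     "Breast": "right", "Breast-Quad": "right_up", "Class": "no-recurrence"},
    {"Age": 38, "Tumor-size": 31, "Deg-Malig": 2,
     "Breast": "left", "Breast-Quad": "central", "Class": "no-recurrence"},
]


def attribute_kind(name):
    """Return 'numerical' or 'categorical' for a known attribute name."""
    if name in NUMERICAL_ATTRIBUTES:
        return "numerical"
    if name in CATEGORICAL_ATTRIBUTES:
        return "categorical"
    raise KeyError(f"unknown attribute: {name}")


def group_by_class(records, class_attribute=CLASS_ATTRIBUTE):
    """Partition records into mutually exclusive groups keyed by class label."""
    groups = defaultdict(list)
    for record in records:
        groups[record[class_attribute]].append(record)
    return dict(groups)


groups = group_by_class(RECORDS)
for label, members in groups.items():
    print(label, len(members))
```

Each record lands in exactly one group, so the group sizes always add up to the number of records. A classifier's task is to predict that group for a record whose class label is not yet known.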
Breast cancer is the most frequent cause of death in older women; at the same time, it is important to note that younger women who do not undergo cancer screening remain at risk of breast cancer.

The paper is organized as follows. Part 1 reviews related work and describes the relevant aspects of the data mining methods used. Part 2 gives the details of the Breast Cancer dataset. Part 3 presents the experimental results and discussion. Finally, the paper is concluded and future enhancements are outlined.

LITERATURE REVIEW

A multinomial logistic regression model with a ridge estimator generalizes logistic regression to problems with more than two discrete outcomes [3]. This model is mainly designed to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables.

An RBF network is an artificial neural network (ANN) that uses the K-means clustering algorithm to determine its basis (activation) functions and can learn both discrete-class and numeric-class problems. The RBF network generally includes three layers: input, hidden, and output [4].

Nearest Neighbour classification is mainly used when all attribute values are continuous, although it can be modified to deal with categorical attributes. The idea is to estimate the classification of an unseen instance using the classification of the instance or instances that are closest to it, in some sense that we need to define [5].

The conjunctive rule classifier is based on a rule mining algorithm that predicts numeric and categorical class values. This