International Journal of Computer Science Trends and Technology (IJCST), Volume 5 Issue 1, Jan-Feb 2017
ISSN: 2347-8578, www.ijcstjournal.org, Page 79

Classification of Cancer Dataset in Data Mining Algorithms Using R Tool

P. Dhivyapriya [1], Dr. S. Sivakumar [2]
Research Scholar [1], Assistant Professor [2]
Department of Computer Science [1], Department of Computer Applications [2]
Thanthai Hans Roever College, Perambalur, Tamil Nadu - India

ABSTRACT
Cancer is a major health problem all around the world. It is a disease that is fatal in many cases; it has affected the lives of many people and will continue to affect many more. The most effective way to reduce cancer deaths is to detect the disease early. Early diagnosis needs an accurate and reliable procedure that physicians can use to distinguish benign tumors from malignant ones without resorting to surgical biopsy. The objective of such predictions is to assign each patient to one of two groups: “benign” (noncancerous) or “malignant” (cancerous). The prediction problem concerns the long-term care of the disease for patients whose cancer has been surgically removed. Predicting the outcome of a disease is one of the most interesting and challenging tasks for which to develop data mining applications. The objective of this paper is to predict the presence of two life-threatening diseases, leukemia and breast cancer, by analyzing clinical datasets. Naïve Bayes and Support Vector Machine models are built for the classification. The performance of the models is then compared in terms of accuracy, time complexity and iterations.

Keywords:- Cancer, Naïve Bayes, Support Vector Machine

I. INTRODUCTION
Medical Data Mining (MDM) deals with the problem of scientific decision-making for the diagnosis and treatment of a disease by extracting useful knowledge from large medical databases.
Clinical databases hold a large quantity of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Data mining methods assist physicians in many ways: interpreting complex diagnostic tests, merging information from multiple sources, supporting differential diagnosis, and providing patient-specific prognosis. Data mining classification algorithms are often used for prediction in medical data analysis. Many researchers have been working on improving the performance of existing algorithms by minimizing the time taken to build the model and maximizing the prediction accuracy of the proposed model.

II. REVIEW OF LITERATURE
Saleema et al. [1] studied the effect of sampling techniques on classifying the prognosis variable and proposed an ideal sampling method based on the results of their experiments. They compared three sampling techniques: random, stratified, and balanced stratified. The model was tested with the SEER data sets. The SEER public-use cancer database provides various prominent class labels for prognosis prediction. The classification models for the experiments were built on the breast cancer, respiratory cancer and mixed cancer data sets with three traditional classifiers, namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three prediction factors survival, period and metastasis were used as class labels for the experimental comparisons. The results showed a steady increase in the prediction accuracy of the balanced stratified model as the sample size increased, whereas the traditional approach fluctuated before reaching its optimum results.

Kaishi Li et al. [2] discussed how the feature extraction of microarray genes has a great impact on their classification and clustering, since the extracted features are taken as input to any network.
The use of gene expression data in discriminating two very similar cancers, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), has been presented, and classification results have been reported using methods other than neural networks. This paper explores the role of the feature vector in classification. In order to achieve the best results in a learning algorithm, feature