Mudgil Pooja et al.; International Journal of Advance Research, Ideas and Innovations in Technology © 2019, www.IJARIIT.com All Rights Reserved Page | 424 ISSN: 2454-132X Impact factor: 4.295 (Volume 5, Issue 3) Available online at: www.ijariit.com

Breast cancer prediction algorithms analysis

Pooja Mudgil engineer.pooja90@gmail.com Bhagwan Parshuram Institute of Technology, New Delhi, Delhi
Mohit Garg mohitgarg701@gmail.com Bhagwan Parshuram Institute of Technology, New Delhi, Delhi
Vaibhav Chhabra vaibhavchhabra97@gmail.com Bhagwan Parshuram Institute of Technology, New Delhi, Delhi
Parikshit Sehgal parikshit4497@gmail.com Bhagwan Parshuram Institute of Technology, New Delhi, Delhi
Jyoti jyoti.mj868@gmail.com Bhagwan Parshuram Institute of Technology, New Delhi, Delhi

ABSTRACT

Machine learning, an application of artificial intelligence (AI), enables a system to learn automatically from its environment without being explicitly programmed. It is widely used for tasks such as classification and prediction. This paper compares the classification algorithms Naïve Bayes, K Nearest Neighbour, Decision Tree, Logistic Regression, Random Forest and Support Vector Machine (SVM). These algorithms, implemented in the Python language, predict the chances of breast cancer. The implementation shows that the performance of any classification algorithm depends on the type of attributes in the dataset and their characteristics. The main aim of this paper is to compare these algorithms on the basis of their accuracy, where the goal is to classify a breast tumour as “Benign” or “Malignant”.

Keywords— Machine learning, Classification, Naïve Bayes, K Nearest Neighbour, Decision tree, Logistic regression, Random forest

1.
INTRODUCTION

Breast cancer (BC) is the most common cancer among women and accounts for a large share of new cancer cases and cancer-related deaths according to global statistics, which makes it a significant public health problem in today’s society [1]. Early diagnosis of BC can significantly improve the prognosis and the chance of survival, because it allows patients to receive timely clinical treatment. Accurate classification of benign tumours can also spare patients unnecessary treatment. The diagnosis of BC and the classification of patients into malignant or benign groups is therefore an important problem. Because of its ability to detect critical features in complex BC datasets, machine learning (ML) is widely recognized as the methodology of choice for BC pattern classification and forecast modelling, especially in the medical field, where such methods are widely used in diagnosis and analysis to support decision making.

There are various risk factors for BC, such as age, family history, genetic factors, and menstrual or childbearing history. The most important screening test is the mammogram, an X-ray of the breast, which can detect cancer up to two years before it can be felt by the patient or a doctor; women should have one once a year from the age of 40-45.

Each of the algorithms considered here has its own strengths and weaknesses, depending on the type of input data and the tools used to implement it. Various tools are available for implementing machine learning. Scikit-learn is a powerful Python library that provides classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting and k-means.

2. CLASSIFICATION ALGORITHMS

2.1 Naive Bayes

Naive Bayes is a classification technique based on Bayes’ theorem from probability theory, combined with the assumption that the predictors are independent of one another.
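The independence assumption is what keeps the model simple: the posterior for a class $C$ factorises as $P(C \mid x_1, \dots, x_n) \propto P(C)\prod_{i} P(x_i \mid C)$. The following is a minimal from-scratch sketch for categorical features, not the implementation used in this paper. The two features (tumour size and margin), the six toy records, the function names `train_naive_bayes` and `predict`, and the Laplace smoothing parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(label, i)][v] = how often feature i took value v in class `label`
    value_counts = defaultdict(Counter)
    # Distinct values seen for each feature (needed for Laplace smoothing).
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return (most probable class, posterior probability of every class)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log P(C)
        for i, value in enumerate(row):
            # Naive assumption: each feature contributes an independent
            # factor P(x_i | C), so the log-likelihoods are simply added.
            count = value_counts[(label, i)][value]
            k = len(feature_values[i])
            score += math.log((count + alpha) / (n_label + alpha * k))
        log_scores[label] = score
    # Normalise the unnormalised scores into posteriors that sum to 1.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    posteriors = {c: w / z for c, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical toy records (tumour size, margin), not real patient data.
rows = [
    ("large", "irregular"), ("large", "irregular"), ("medium", "irregular"),
    ("small", "smooth"), ("small", "smooth"), ("medium", "smooth"),
]
labels = ["Malignant"] * 3 + ["Benign"] * 3

model = train_naive_bayes(rows, labels)
label, posteriors = predict(model, ("large", "irregular"))
print(label, round(posteriors[label], 3))  # prints: Malignant 0.923
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied, and the smoothing term keeps a feature value that never appeared with a class from forcing that class's probability to zero.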
A Naive Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature [2]. It is a probabilistic (statistical) approach and a form of supervised learning. The classifier essentially makes an informed guess from the available evidence, much as a doctor arrives at a diagnosis. Bayes’ theorem gives the probability of an event based on prior knowledge of conditions that might be related to that event. Mathematically,

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)}$$

where $P(E \mid F)$ is the probability of E occurring given that F has already occurred, $P(F \mid E)$ is the probability of F given E, and $P(E)$ and $P(F)$ are the probabilities of E and F on their own. Bayes’ theorem offers a principled way to update probabilities as new evidence is observed, which is one reason it is widely used in machine learning. The Naive Bayes model is easy to build and is particularly useful for very large datasets. In a Naive Bayes model, a dependency graph is constructed first and the model is then implemented; an example is shown in Fig. 1.

Fig. 1: Example to show a dependency graph
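A small worked example of Bayes’ theorem may help. The numbers below are hypothetical values chosen only for illustration; they are not clinical statistics and not results from this paper. Here E stands for "the tumour is malignant" and F for "the screening test is positive", and $P(F)$ is obtained from the law of total probability, $P(F) = P(F \mid E)\,P(E) + P(F \mid \neg E)\,(1 - P(E))$. The helper name `bayes_posterior` is likewise hypothetical.

```python
def bayes_posterior(p_f_given_e: float, p_e: float, p_f: float) -> float:
    """Bayes' theorem: P(E|F) = P(F|E) * P(E) / P(F)."""
    if not 0.0 < p_f <= 1.0:
        raise ValueError("P(F) must lie in (0, 1]")
    return p_f_given_e * p_e / p_f


# Hypothetical illustrative values, not clinical statistics.
p_e = 0.01              # P(E): prior probability that a tumour is malignant
p_f_given_e = 0.90      # P(F|E): probability of a positive test if malignant
p_f_given_not_e = 0.08  # P(F|not E): false-positive rate

# Law of total probability gives the evidence term P(F).
p_f = p_f_given_e * p_e + p_f_given_not_e * (1.0 - p_e)
posterior = bayes_posterior(p_f_given_e, p_e, p_f)
print(f"P(F) = {p_f:.4f}, P(E|F) = {posterior:.4f}")  # prints: P(F) = 0.0882, P(E|F) = 0.1020
```

With these assumed values, a positive result raises the probability of malignancy from 1% to only about 10%, because the low prior $P(E)$ still weighs heavily; this interplay between prior and evidence is exactly what the formula captures.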