Mudgil Pooja et al.; International Journal of Advance Research, Ideas and Innovations in Technology
© 2019, www.IJARIIT.com All Rights Reserved Page | 424
ISSN: 2454-132X
Impact factor: 4.295
(Volume 5, Issue 3)
Available online at: www.ijariit.com
Breast cancer prediction algorithms analysis
Pooja Mudgil
engineer.pooja90@gmail.com
Bhagwan Parshuram Institute of
Technology, New Delhi, Delhi
Mohit Garg
mohitgarg701@gmail.com
Bhagwan Parshuram Institute of
Technology, New Delhi, Delhi
Vaibhav Chhabra
vaibhavchhabra97@gmail.com
Bhagwan Parshuram Institute of
Technology, New Delhi, Delhi
Parikshit Sehgal
parikshit4497@gmail.com
Bhagwan Parshuram Institute of
Technology, New Delhi, Delhi
Jyoti
jyoti.mj868@gmail.com
Bhagwan Parshuram Institute of
Technology, New Delhi, Delhi
ABSTRACT
Machine learning, an application of artificial intelligence
(AI), enables a system to learn automatically from its
environment without being explicitly programmed. It is widely
used for classification and prediction tasks in various
domains. This paper compares six classifier algorithms: Naïve
Bayes, K Nearest Neighbour, Decision Tree, Logistic
Regression, Random Forest and Support Vector Machine (SVM).
These algorithms are implemented in Python and used to predict
the chance of breast cancer. The implementation shows that the
performance of a classification algorithm depends on the types
of attributes in the dataset and their characteristics. The
main aim of this paper is to compare these algorithms on the
basis of accuracy. The goal is to classify each breast cancer
case as “Benign” or “Malignant”.
Keywords— Machine learning, Classification, Naïve Bayes, K
Nearest Neighbour, Decision tree, Logistic regression, Random
forest
1. INTRODUCTION
Breast cancer (BC) is one of the most common cancers,
accounting for a large share of new cancer cases and
cancer-related deaths according to global statistics, which
makes it a significant public health problem in today’s society
[1]. Early diagnosis of BC can significantly improve the
prognosis and chance of survival, as it allows patients to
receive timely clinical treatment. Accurate classification of
benign tumours can also spare patients unnecessary treatment.
The diagnosis of BC and the classification of patients into
malignant or benign groups is therefore a matter of real
concern. Because of its advantages in detecting critical
features in complex BC datasets, machine learning (ML) is widely
recognized as the methodology of choice for BC pattern
classification and forecast modelling, especially in the medical
field, where these methods are widely used in diagnosis and
analysis to support decisions. There are various risk factors
for BC, such as age, family history, genetic factors, and
menstrual or childbearing history. The most important screening
test is the mammogram, an X-ray of the breast. It can detect
signs of cancer up to two years before a tumour can be felt by a
doctor or the patient, and women aged 40-45 should have one once
a year. Each of the algorithms considered here has its own
strengths and weaknesses, depending on the type of input data
and the tools used to implement it. Various tools are available
for implementing machine learning. Scikit-learn is a powerful
Python library that features various classification, regression
and clustering algorithms, including support vector machines,
random forests, gradient boosting and k-means.
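The kind of accuracy comparison described above can be sketched
with scikit-learn as follows. This is only an illustrative
sketch, not the authors' code. It assumes the Wisconsin
diagnostic breast cancer dataset bundled with scikit-learn, a
75/25 stratified split, feature standardization, a fixed random
seed and default hyperparameters; none of these choices is
stated in the text. The function name compare_classifiers is
hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(test_size=0.25, seed=0):
    """Return {classifier name: test accuracy} for six classifiers.

    Assumption: uses scikit-learn's bundled breast cancer dataset,
    where target 0 means malignant and target 1 means benign.
    """
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Default hyperparameters throughout (an assumption, not tuned).
    models = {
        "Naive Bayes": GaussianNB(),
        "K Nearest Neighbour": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
    }

    results = {}
    for name, model in models.items():
        # Standardize features so distance- and margin-based models
        # are not dominated by features with large numeric ranges.
        clf = make_pipeline(StandardScaler(), model)
        clf.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, clf.predict(X_test))
    return results


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: {acc:.3f}")
```

A single train/test split gives only a rough estimate of
accuracy. Repeating the comparison with cross-validation would
make the ranking of the classifiers less sensitive to the
particular split.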
2. CLASSIFICATION ALGORITHMS
2.1 Naive Bayes
Naive Bayes is a classification technique based on Bayes'
theorem from probability theory, with the assumption that the
predictors are independent of one another. A Naive Bayes
classifier assumes that the presence of a specific feature in a
class is unrelated to the presence of any other feature [2]. It
is a probabilistic (statistical) approach that falls under
supervised learning. In effect, the classifier makes an informed
guess, much as a doctor does when making a diagnosis. Bayes'
theorem gives the probability of an event based on prior
knowledge of conditions that might be related to that event.
Mathematically, $P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)}$,
where $P(E \mid F)$ is the probability of E occurring given that
F has already occurred. Bayes' theorem offers a principled way
to update probabilities from evidence, which is why it is widely
used in machine learning. The model is easy to build and is
useful when a large dataset is available. A dependency graph is
constructed first, and the Naive Bayes model is then implemented
from it. For example, see Fig. 1.
Fig. 1: Example to show a dependency graph
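To make Bayes' theorem and the independence assumption concrete,
the following minimal sketch computes a posterior for a single
piece of evidence and then for two features combined naively.
All probabilities, feature names and function names here are
hypothetical values chosen for illustration. They are not taken
from the paper or from any dataset.

```python
from math import prod


def bayes_posterior(prior_e, likelihood_f_given_e, evidence_f):
    """Bayes' theorem: P(E|F) = P(F|E) * P(E) / P(F)."""
    if evidence_f <= 0:
        raise ValueError("P(F) must be positive")
    return likelihood_f_given_e * prior_e / evidence_f


def naive_bayes_posteriors(priors, likelihoods, observed):
    """Posterior over classes given the observed features.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    observed:    features that are present in the sample

    The naive assumption is that features are independent given
    the class, so their likelihoods are simply multiplied.
    """
    scores = {
        c: priors[c] * prod(likelihoods[c][f] for f in observed)
        for c in priors
    }
    total = sum(scores.values())  # plays the role of P(F)
    return {c: s / total for c, s in scores.items()}


# Hypothetical numbers, for illustration only.
priors = {"malignant": 0.3, "benign": 0.7}
likelihoods = {
    "malignant": {"irregular_shape": 0.8, "large_radius": 0.7},
    "benign": {"irregular_shape": 0.1, "large_radius": 0.2},
}

# Single feature: P(F) by total probability = 0.8*0.3 + 0.1*0.7 = 0.31
p_f = 0.8 * 0.3 + 0.1 * 0.7
p_malignant = bayes_posterior(0.3, 0.8, p_f)  # about 0.77

# Two features combined under the independence assumption.
posteriors = naive_bayes_posteriors(
    priors, likelihoods, ["irregular_shape", "large_radius"]
)  # malignant is about 0.92
```

The classifier predicts the class with the largest posterior,
here "malignant". Observing the second feature raises the
posterior for that class because both features are more likely
under it.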