Predicting the Survivability of Breast Cancer
Patients using Ensemble Approach
Neha Rathore
M.Tech, Software Engineering
Indian Institute of Information
Technology, Allahabad, India
rathore.knmiet@gmail.com
Divya Tomar
Research Scholar,
Indian Institute of Information
Technology, Allahabad, India
divyatomar26@gmail.com
Sonali Agarwal
Assistant Professor,
Indian Institute of Information
Technology, Allahabad, India
sonali@iiita.ac.in
Abstract-Data mining in healthcare is currently one of the most
active research fields. In healthcare, data arrive from
different sources and are continuously stored in data repositories.
Healthcare organizations generate vast amounts of data that
contain useful information. Data mining is used to uncover
valuable information from medical data, which in turn helps
in making important decisions regarding a patient's health. This
paper uses breast cancer data from SEER (Surveillance,
Epidemiology, and End Results), which is provided by the National
Cancer Institute. The dataset contains data on various types of
cancer, such as breast, lung, and oral cancer. The proposed
research work first analyzes the breast cancer dataset and then
applies a data mining approach to evaluate the results. Data
mining is used to obtain patterns of the disease that can
be effectively utilized by medical practitioners. An ensemble
classification approach for predicting the survivability of breast
cancer patients is presented in this paper.
Keywords-Data Mining, Healthcare, Ensemble Classifier, Breast
Cancer, SEER.
1. INTRODUCTION
In the present scenario, Information Technology (IT) is
being applied in every field of everyday life, such as
Education, Finance, E-Governance and, most
significantly, Healthcare. Data mining is now a well-established
way of supporting efficient decision making, in which various
approaches can generate valuable patterns from data. Data
mining can be used efficiently in healthcare because healthcare
industries generate huge amounts of data but lack
intelligent decision tools for correct, timely and effective
decision making. The scope of healthcare services is very
broad, and they involve various categories of ever-increasing
data, so there is an urgent demand for efficient storage,
quick processing and other data handling techniques. These
services collect huge and complex data, which need to be analyzed
to identify interesting patterns and hidden information in them.
This, in turn, requires new tools and techniques for
extracting useful information.
Cancer is one of the most perilous diseases. Figure 1 shows
the cancer scenario in India over the span of 6 years
from 2004 to 2010 [1]. As indicated in Figure 1, there is a
continuous growth in cancer cases from 2004 to 2010, and the
same has been predicted for the coming years, i.e., 2015 as well as
2020.
978-1-4799-2900-9/14/$31.00 ©2014 IEEE
Fig. 1. Yearly distribution of cancer cases in India [1]
Figure 2 presents a comparative analysis of the cancer
scenario in India and the USA. It shows that the incidence rate of
breast cancer is higher in the USA than in India, while the
incidence rate of oral cancer is higher in India [2].
Fig. 2. Comparison of different cancers in India and USA [2]