Predicting the Survivability of Breast Cancer Patients using Ensemble Approach

Neha Rathore
M.Tech, Software Engineering, Indian Institute of Information Technology, Allahabad, India
rathore.knmiet@gmail.com

Divya
Research Scholar, Indian Institute of Information Technology, Allahabad, India
divyatomar26@gmail.com

Sonali Agarwal
Assistant Professor, Indian Institute of Information Technology, Allahabad, India
sonali@iiita.ac.in

Abstract-Data mining in healthcare is currently one of the most active research fields. In healthcare, data arrive from many different sources and are continuously stored in data repositories, so healthcare organizations generate vast amounts of data containing useful information. Data mining is used to uncover valuable information in medical data, which in turn helps in making important decisions about a patient's health. This paper uses breast cancer data from SEER (Surveillance, Epidemiology, and End Results), which is provided by the National Cancer Institute. The dataset contains data on various types of cancer, such as breast, lung, and oral cancer. This work first analyzes the breast cancer dataset and then applies a data mining approach to evaluate the results. Data mining is used to obtain patterns of the disease that can be effectively utilized by medical practitioners. An ensemble classification approach is presented in this paper for predicting the survivability of breast cancer patients.

Keywords-Data Mining, Healthcare, Ensemble Classifier, Breast Cancer, SEER.

1. INTRODUCTION

Information Technology (IT) is now applied in every field of everyday life, such as Education, Finance, and E-Governance, and most significantly in Healthcare. Data mining is a well-established means of efficient decision making, in which various approaches can generate valuable patterns from data.
Data mining can be used efficiently in healthcare because healthcare industries generate huge amounts of data but lack intelligent decision tools for correct, timely, and effective decision making. The scope of healthcare services is very broad, and these services involve various categories of ever-increasing data, so there is an urgent demand for efficient storage, quick processing, and other data handling techniques. Healthcare services accumulate large and complex data that need to be analyzed to identify interesting patterns and hidden information. This creates a requirement for new tools and techniques for extracting useful information.

Cancer is among the most perilous diseases. Figure 1 represents the cancer scenario in India over the span of 6 years from 2004 to 2010 [1]. As indicated in Figure 1, there is continuous growth in cancer cases from 2004 to 2010, and the same has been predicted for the coming years, i.e. 2015 as well as 2020.

Fig. 1. Yearly distribution of cancer cases in India [1]

Figure 2 presents a comparative analysis of the cancer scenario in India and the USA. It shows that the incidence rate of breast cancer is higher in the USA than in India, while the incidence rate of oral cancer is higher in India [2].

Fig. 2. Comparison of different cancers in India and USA [2]

978-1-4799-2900-9/14/$31.00 ©2014 IEEE