International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 5, Issue 1, January 2015)

Survey on Data Mining Algorithm and Its Application in Healthcare Sector Using Hadoop Platform

K. Sharmila 1, Dr. S. A. Vethamanickam 2
1 Asst Prof. & Research Scholar, Dept. of Computer Science, Vels University, Chennai, India.
2 Research Advisor, Chennai, India.

Abstract- In this survey paper, we examine the benefits of Hadoop in the healthcare sector, where data mining must cope with massive volumes of data. In developing countries with large populations, such as India, healthcare faces various problems: the expenses borne by economically underprivileged people, access to hospitals, and medical research on Big Data. Apache Hadoop has been adopted worldwide and has put parallel processing of Big Data in the hands of the average programmer. It has become imperative to migrate existing data mining algorithms onto the Hadoop platform for increased parallel processing efficiency. In this paper, we survey the progress made in data mining techniques, their recent adoption on the Hadoop platform for Big Data, and the algorithms used on such platforms, and we list the open challenges in applying such algorithms to Indian medical data sets.

Keywords: Hadoop, Data mining, Healthcare, Big data.

I. INTRODUCTION

In this era of Big Data, organizations and the health industry face the problems of the three V’s, namely Volume, Velocity and Variety. Migrating data over the network for transformation or analysis has become unrealistic. Moving terabytes of data from one system to another is often infeasible for the network administrator, and the process is slow and limited by SAN (Storage Area Network) bandwidth.
Apache Hadoop is a framework whose computing model enables the distributed processing of huge data sets across clusters of systems. The framework was designed to scale from a single server to thousands of systems for computation and storage. This powerful feature attracts a variety of companies and organizations to use Hadoop for both research and production.

Healthcare is one of the most important areas for both developing and developed countries, since it cares for their priceless human resources. Commonwealth Governments have identified various health issues, such as diabetes, as significant and growing global public health problems. Estimates show that 40 million Indians suffer from diabetes, and the crisis seems to be growing at a shocking rate. By 2020, the number is expected to double, even though half of the diabetics in India remain undiagnosed due to the massive volume of data. Since the healthcare industry is now flooded with massive amounts of data, that data needs validation and accurate analysis. Big Data Analytics and Hadoop can play a major role in processing and analyzing healthcare data in a variety of forms to deliver suitable applications, which in turn reduces the cost of services to the common man in the country. However, some open challenges remain, which are explicitly stated in this survey paper.

II. DATA MINING

Data mining is the core step that results in the discovery of hidden but useful knowledge from massive databases. “It is the non-trivial extraction of previously unknown and useful information about data”. It can also be defined as “the science of extracting useful information from large databases”. The two primary goals of data mining are prediction and description. Prediction uses some variables or fields in the data set to predict unknown or future values of other variables of interest. Description focuses on finding patterns describing the data that can be interpreted by humans.
Categorization of Data Mining Techniques:

2.1 Data mining algorithms fall under four classes

a. Association rule learning: This category of algorithms searches for relationships between variables. It is used for applications such as identifying frequently visited items.
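
As an illustration of association rule learning, the following minimal Python sketch counts how often pairs of items occur together and derives the support and confidence of each rule. The visit records and the function name `pair_rules` are hypothetical and are not taken from any surveyed system; the sketch uses plain pair counting rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the departments one patient visited.
visits = [
    {"cardiology", "pharmacy"},
    {"cardiology", "pharmacy", "lab"},
    {"lab", "pharmacy"},
    {"cardiology", "lab"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    # Count how many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # Count how many transactions contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # Confidence of "a -> b" is count(a and b) / count(a), and vice versa.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


rules = pair_rules(visits)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how frequently the items appear together across all records, while confidence measures how often the consequent appears among the records that contain the antecedent. On a Hadoop cluster the same counting step would typically be split across mappers and combined in reducers, which this single-machine sketch does not attempt.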