Healthcare Analysis in Smart Big Data Analytics: Reviews, Challenges and Recommendations

Ahmed Ismail 1, Abdulaziz Shehab 2, I. M. El-Henawy 3
1 Information Systems Department, Faculty of Computers and Information, Mansoura University, Egypt.
2 Information Systems Department, Faculty of Computers and Information, Mansoura University, Egypt.
3 Faculty of Computer Science and Information Systems, Zagazig University, Egypt.
* Scientific Research Group in Egypt (SRGE)

ABSTRACT
The rising demand for and cost of healthcare is a challenge, because populations are large and the available doctors cannot cover all patients. Processing and managing healthcare data has also become a challenge because of problems with the data itself, such as irregularity, high dimensionality, and sparsity. A number of researchers have worked on these problems and provided efficient and scalable healthcare solutions. We present the algorithms and systems for healthcare analytics and applications, along with some related solutions. The solution we propose adds a new layer as middleware between the heterogeneous data sources and the Hadoop MapReduce cluster. This solution effectively solves the common problems of dealing with heterogeneous data.

Keywords: Healthcare System, Hadoop MapReduce, Data Analytics, Big Data, Machine Learning, IoT

1 Introduction

A high volume of heterogeneous medical data has become available from various healthcare organizations and sensors (e.g., wearable devices). The Electronic Health Record (EHR) is any record that supports medical practice or other aspects of healthcare. The benefits of analyzing such data may include earlier disease detection, more accurate prognosis, faster advances in clinical research, and better patient management. The main obstacles to extracting value from big data are complexity, heterogeneity, timeliness, noise, and incompleteness. Big healthcare analytics is, in general, no different.
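To make the heterogeneity and incompleteness problems concrete, the following minimal Python sketch maps two differently shaped sources (a hospital EMR row and a wearable sensor reading) onto one common record type and measures how many values are missing. It is only an illustration under stated assumptions, not the middleware proposed in this paper: the `Observation` schema, the field names (`pid`, `visit_date`, `test`, `result`, `user`, `ts`, `metric`, `val`), and the sample values are all hypothetical, and no Hadoop component is involved.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass
class Observation:
    """Hypothetical common schema that every source is mapped onto."""
    patient_id: str
    timestamp: datetime
    feature: str
    value: Optional[float]  # None marks a missing or unparsable value
    source: str


def _to_float(raw: Any) -> Optional[float]:
    """Convert a raw value to float, returning None when that is not possible."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def from_emr(row: dict) -> Observation:
    """Map a (hypothetical) EMR row with a date string to the common schema."""
    visit = datetime.strptime(row["visit_date"], "%Y-%m-%d")
    return Observation(
        patient_id=str(row["pid"]),
        timestamp=visit.replace(tzinfo=timezone.utc),
        feature=row["test"],
        value=_to_float(row.get("result")),
        source="emr",
    )


def from_sensor(reading: dict) -> Observation:
    """Map a (hypothetical) sensor reading with a Unix timestamp to the schema."""
    return Observation(
        patient_id=str(reading["user"]),
        timestamp=datetime.fromtimestamp(reading["ts"], tz=timezone.utc),
        feature=reading["metric"],
        value=_to_float(reading.get("val")),
        source="sensor",
    )


def normalize(emr_rows: List[dict], sensor_readings: List[dict]) -> List[Observation]:
    """Merge both sources into one list ordered by patient and time."""
    observations = [from_emr(r) for r in emr_rows]
    observations += [from_sensor(s) for s in sensor_readings]
    observations.sort(key=lambda o: (o.patient_id, o.timestamp))
    return observations


def missing_ratio(observations: List[Observation]) -> float:
    """Fraction of observations whose value is missing."""
    if not observations:
        return 0.0
    return sum(o.value is None for o in observations) / len(observations)


# Made-up sample data: one EMR result is empty, i.e. incomplete.
emr_rows = [
    {"pid": 7, "visit_date": "2020-01-15", "test": "glucose", "result": "5.4"},
    {"pid": 7, "visit_date": "2020-03-02", "test": "glucose", "result": ""},
]
sensor_readings = [
    {"user": "7", "ts": 1580000000, "metric": "heart_rate", "val": 72},
]

observations = normalize(emr_rows, sensor_readings)
for obs in observations:
    print(obs.source, obs.timestamp.date(), obs.feature, obs.value)
print("missing ratio:", missing_ratio(observations))
```

Mapping every source onto one record type before analysis keeps the later stages independent of where a record came from, and representing missing values explicitly as `None` lets incompleteness be measured rather than silently dropped.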
Several steps must be performed on EHR information in an optimal manner: collection, integration, cleaning, storage, analysis, and interpretation. The whole process forms a data analysis pipeline in which different algorithms or systems focus on different specific targets and are coupled together to deliver an end-to-end solution. The pipeline can be viewed as a software stack in which each phase offers multiple solutions and the actual choice depends on the data type [1]. EHR data is of one of two types: sensor data or electronic medical records (EMR). Accordingly, one of two directions can be chosen. The first direction is to understand the basic EMR data from hospitals. The second direction is to use sensor technologies, such as wearable devices and smartphones, to obtain more medically related data sources [2].

EMR data is usually collected from hospitals and then analyzed to give valuable information. It is time-stamped patient data made up of heterogeneous features from patients' medical histories, such as diagnoses, medications, lab tests, unstructured text (e.g., doctors' notes), and images (e.g., magnetic resonance imaging (MRI) data). EMR data can be a useful tool for disease classification, disease progression modelling, and phenotyping [3]. Although EMR data is a useful support for healthcare applications, it poses several major challenges. First, EMR data is high-dimensional because it contains a large number of medical features. Second, EMR data is often dirty or incomplete because it is collected over a long period and at disconnected times [4]. Third, EMR data is irregular because patients usually visit hospitals only when necessary. The sensor data is very important