Critical Study of Hadoop Implementation and Performance Issues

Madhavi Vaidya, Asst. Professor, Dept of Computer Sc., Vivekanand College, Mumbai, India
Dr. Shriniwas Deshpande, Associate Professor, Head of PG Dept of Computer Science & Technology, DCPE, HVPM, Amravati, India

Abstract

The MapReduce model has become an important parallel processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely applied to support cluster computing jobs requiring low response time. This paper discusses the principal issues of Hadoop and then surveys the solutions proposed for them in the papers studied by the authors. Hadoop is not an easy environment to manage. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous, and recent Hadoop research has ignored the network delays caused by data movement at run time. Unfortunately, both the homogeneity and data locality assumptions in Hadoop are optimistic at best and unachievable at worst, which introduces performance problems in virtualized data centers. The paper analyzes the single points of failure (SPOF) at critical nodes of Hadoop and discusses a metadata-replication-based solution that enables Hadoop high availability. Heterogeneity can be addressed by a data placement scheme that distributes and stores data across multiple heterogeneous nodes according to their computing capacities. Analysts have noted that using the technology to aggregate and store data from multiple sources can create a whole slew of problems related to access control and ownership, and applications analyzing merged data in a Hadoop environment can create new datasets that may also need to be protected.
Keywords: Fault, Distributed, HDFS, NameNode

Introduction

The phenomenal growth of internet-based applications and web services in the last decade has brought a change in the mindset of researchers. Traditional techniques for storing and analyzing voluminous data have been improved, and organizations are ready to acquire highly reliable solutions. [1] The behavior of web users is concealed in the web log. Web log mining can discover the characteristics and rules of users' visiting behavior and thereby improve the quality of service offered to users. Clustering is one of the data mining techniques applied in web log mining. Applying clustering to the analysis of users' visiting behavior groups users according to their interests, which in turn helps improve the web site's structure. [2]

Several system architectures have been implemented for data-intensive computing and large-scale data analysis, including parallel and distributed relational database management systems. As a platform for computing and storage, the availability of Hadoop is the foundation of the availability of the applications running on it, so it is necessary to keep the platform fully available in production environments. Hadoop employs some methods to enhance the availability of the applications running on it, e.g. maintaining multiple replicas of application data and redeploying application tasks on failure, but it does not provide high availability for itself. The Hadoop architecture contains a single point of failure (SPOF): the failure of a critical node, of which only a single copy is kept, renders the whole system inoperative. [1,2]

MapReduce, proposed by Google, is a programming model and an associated implementation for large-scale data processing on distributed clusters. In the first stage a Map function is applied in parallel to each partition of the input data, performing the
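The two-stage flow described above can be illustrated with a minimal in-memory word-count sketch. This is an illustrative analogue of the model only, not Hadoop's actual Java API: the function names (map_fn, shuffle, reduce_fn, mapreduce) are hypothetical, and each input string stands in for one partition of the input data.

```python
from collections import defaultdict
from itertools import chain

# Map stage: applied independently to each input partition,
# emitting intermediate (key, value) pairs.
def map_fn(partition):
    return [(word, 1) for word in partition.split()]

# Shuffle: group intermediate pairs by key before reduction.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce stage: applied per key to merge the grouped values.
def reduce_fn(key, values):
    return key, sum(values)

def mapreduce(partitions):
    intermediate = chain.from_iterable(map_fn(p) for p in partitions)
    return dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())

print(mapreduce(["hadoop stores data", "hadoop processes data"]))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In Hadoop the same roles are distributed: map tasks run on the nodes holding the input splits, the framework shuffles intermediate pairs across the network, and reduce tasks merge the grouped values.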