Bahman Rashidi et al./ Elixir Comp. Sci. & Engg. 53 (2012) 12059-12064

Introduction

The amount of data in our world has been exploding, and analysing large data sets, so-called big data, is becoming a key basis of much research. Data is being collected and stored at unprecedented rates. The challenge is not only to store and manage this vast volume of data, but also to analyse it and extract meaningful value from it. There are several approaches to collecting, storing, processing, and analysing big data.

MapReduce is one existing mechanism for big data processing. MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. The MapReduce distributed data analysis model, introduced by Google, provides an easy-to-use programming model that features fault tolerance, automatic parallelization, scalability and data locality-based optimizations. Owing to their excellent fault tolerance, MapReduce frameworks are well suited to the execution of large distributed jobs in brittle environments such as commodity clusters and cloud infrastructures [5][12]. Hadoop MapReduce provides a mechanism for programmers to leverage distributed systems for processing data sets. MapReduce can be divided into two distinct phases:

Map phase: divides the workload into smaller sub-workloads and assigns tasks to Mappers, each of which processes a unit block of data. The output of a Mapper is a sorted list of (key, value) pairs. This list is passed (in a step also called shuffling) to the next phase.

Reduce phase: analyses and merges the input to produce the final output. The final output is written to HDFS in the cluster.

Cloud computing is a new paradigm for the provision of computing infrastructure. This paradigm shifts the location of the infrastructure to the network in order to reduce the costs associated with the management of hardware and software resources.
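The Map and Reduce phases above can be illustrated with a minimal word-count sketch. Plain Python stands in for a real Hadoop job here, and the function names are illustrative, not part of any MapReduce API; in Hadoop the blocks would be splits of an HDFS file and the phases would run in parallel across the cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(block):
    """Mapper: emit a (key, value) pair for every word in a unit block of data."""
    return [(word, 1) for word in block.split()]

def shuffle(pairs):
    """Shuffle: sort the mappers' output by key so a reducer sees all of one key's values together."""
    return sorted(pairs, key=itemgetter(0))

def reduce_phase(sorted_pairs):
    """Reducer: merge the values for each key into the final output."""
    return {key: sum(v for _, v in group)
            for key, group in groupby(sorted_pairs, key=itemgetter(0))}

# Each block stands in for a split of a large input file.
blocks = ["big data big clusters", "data locality"]
mapped = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'clusters': 1, 'data': 2, 'locality': 1}
```

In a real deployment the shuffle also partitions keys across many reducers; collapsing it to a single sort keeps the sketch focused on the (key, value) data flow between the two phases.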
Hence, businesses and users are able to access application services from anywhere in the world [11]. Characteristics of cloud services such as on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service allow MapReduce to take advantage of cloud infrastructure services, making the cloud a good platform for implementing MapReduce [3][11]. In this paper we present a complete comparison of two different implementations of the MapReduce programming model built on top of cloud computing.

The rest of the paper is organized as follows. Cloud computing and the cloud service models are briefly explained, followed by MapReduce, its architecture, and the characteristics of MapReduce implementations in the cloud environment. We then discuss and compare two models of cloud MapReduce, and concluding remarks are presented.

Cloud Computing

The concept of cloud computing addresses the next evolutionary step of distributed computing. The goal of this computing model is to make better use of distributed resources, pooling them in order to achieve higher throughput and to tackle large-scale computation problems. Cloud computing is not a completely new concept for the development and operation of web applications. It allows for the most cost-effective development of scalable web portals on highly available and fail-safe infrastructure [1]. Cloud computing deals with virtualization, scalability, interoperability, quality of service and the delivery models of the cloud, namely private, public and hybrid.
A more structured definition is given by Buyya et al. [2], who define a cloud as a "type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers."

A Comparison of Amazon Elastic Mapreduce and Azure Mapreduce

Bahman Rashidi 1, Esmail Asyabi 1 and Talie Jafari 2
1 Iran University of Science and Technology (IUST). 2 Amirkabir University of Technology.
E-mail addresses: b_rashidi@comp.iust.ac.ir

ABSTRACT
In the last two decades, the continuous increase of computational power and recent advances in web technology have produced large amounts of data, which require large-scale data processing mechanisms. MapReduce is a programming model for large-scale distributed data processing in an efficient and transparent way, valued for its excellent fault tolerance, scalability and ease of use. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one's own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. Cloud computing has recently emerged as a new paradigm that provides computing infrastructure and large-scale data processing mechanisms in the network. Because the cloud is on-demand, scalable and highly available, implementing MapReduce on top of cloud services yields a faster, more scalable and more highly available MapReduce framework for large-scale data processing. In this paper we explain how to implement MapReduce in the cloud, and conclude with a comparison of MapReduce implementations on the Azure cloud, the Amazon cloud and Hadoop.

ARTICLE INFO
Article history: Received: 18 October 2012; Received in revised form: 7 December 2012; Accepted: 14 December 2012.

Keywords: Cloud computing, Mapreduce, Cloud mapreduce, Azure mapreduce, Amazon elastic mapreduce.

© 2012 Elixir All rights reserved.
Available online at www.elixirpublishers.com (Elixir International Journal)