International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 02 Issue: 03 | June-2015 www.irjet.net p-ISSN: 2395-0072
© 2015, IRJET.NET-All Rights Reserved Page 1623
AN INTRODUCTION TO MAP REDUCE APPROACH
TO DISTRIBUTE WORK USING NEW SET OF TOOLS
Mr. Narahari Narasimhaiah¹, Dr. R. Praveen Sam²
¹Research Scholar, Bharathiar University, Tamilnadu, India.
²Professor, Dept. of CSE, Andhra Pradesh, India.
---------------------------------------------------------------------------------------------------------------------------------------------------------
ABSTRACT- In the traditional relational database[3] world, all processing happens after the information has been loaded into the store, using a specialized query language on highly structured and optimized data structures. The Google approach, adopted by many web companies, is instead to create a pipeline that reads and writes arbitrary file formats, with the computation spread across many machines and intermediate results passed between stages as files. Typically based on the MapReduce[1] approach to distributing work, this approach requires a whole new set of tools, which I'll describe below.
Key Words: Relational, Distributing, MapReduce
1. INTRODUCTION
Fig -1: Mappers and reducers functionality.
Figure 1 shows the map and reduce combinators from functional programming languages such as Lisp, on which the MapReduce model is based. In Lisp, a map takes a function and a sequence of values as input, and applies the function to each value in the sequence. A reduce combines all the elements of a sequence using a binary operation. In 2004 Google introduced the MapReduce framework to support distributed processing of large data sets over clusters of computers. It is currently an integral[3] part of the Hadoop ecosystem, and it has since been implemented by many software platforms.
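The Lisp combinators described above have direct counterparts in most languages. A minimal sketch in Python, using the built-in `map` and `functools.reduce` (the values and functions are illustrative only):

```python
from functools import reduce

# map applies a function to each value in a sequence.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # [1, 4, 9, 16]

# reduce combines all elements of a sequence using a binary operation,
# here addition: ((1 + 4) + 9) + 16 = 30.
total = reduce(lambda a, b: a + b, squares)
```

MapReduce generalizes exactly these two steps: the map phase transforms each input record independently, and the reduce phase combines the transformed values.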
MapReduce was specifically designed to run on commodity hardware and to solve large-scale data-processing problems. Following divide-and-conquer[4] principles, the input data sets are split into independent chunks, which are processed by the mappers in parallel.
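The split-then-map step can be simulated on a single machine. The following sketch is an assumption-laden illustration, not the framework's actual splitting logic: it divides the input into independent chunks and runs a word-counting mapper over them concurrently (here with threads standing in for cluster nodes; the `split` and `mapper` helpers are hypothetical names).

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n):
    """Divide the input into n roughly equal, independent chunks."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def mapper(chunk):
    """Each mapper processes its chunk independently (here: count words)."""
    return sum(len(line.split()) for line in chunk)

lines = ["the quick brown fox", "jumps over", "the lazy dog", "again"]
chunks = split(lines, 2)

# Mappers run in parallel over independent chunks; no chunk depends
# on another, which is what makes the model scale out.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(mapper, chunks))

total_words = sum(partials)   # 6 + 4 = 10
```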
Fig -2: MapReduce process.
Based on the user-supplied code, the MapReduce framework is responsible for the overall coordination of execution. This includes choosing appropriate machines (nodes) for running mappers; choosing appropriate locations for the reducers' execution; starting and monitoring[2] the mappers' execution; sorting[3] and shuffling the mappers' output and delivering it to the reducer nodes; and starting and monitoring the reducers' execution.
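The map, shuffle/sort, and reduce phases coordinated above can be sketched end to end with the classic word-count example. This is a single-process simulation under assumed helper names (`map_phase`, `shuffle`, `reduce_phase`), not the framework's real distributed implementation:

```python
from collections import defaultdict

def map_phase(record):
    """Mapper: emit (word, 1) pairs for each word in the record."""
    return [(word, 1) for word in record.split()]

def shuffle(mapped_pairs):
    """Shuffle/sort: group values by key, sorted, before delivery
    to the reducers."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reducer: combine all values for a key (here: sum the counts)."""
    return key, sum(values)

records = ["to be or not", "to be"]
mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped))
# counts == {"be": 2, "not": 1, "or": 1, "to": 2}
```

In a real cluster, the framework moves the grouped pairs over the network to reducer nodes; the program logic, however, is just these three phases.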
2. FUNCTIONAL PROGRAMMING CONCEPTS
MapReduce programs are designed to process large volumes of data in a parallel fashion. This requires dividing the workload across a large number of machines. If the components[1] were allowed to share data arbitrarily, this
model would not scale to large clusters (hundreds or