Tuesday, February 19, 2013

What is MapReduce

MapReduce is a method for distributing a task across multiple nodes. It was published by Google in 2004. A MapReduce job consists of two phases, Map and then Reduce, with an intermediate stage between them known as the shuffle and sort. Map tasks work on relatively small portions of data, typically a single HDFS block. Its features include automatic parallelization and distribution. Hadoop can run MapReduce programs written in various languages such as Java, Ruby, Python, and C++.
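
To make the two phases concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all; the names `map_fn`, `reduce_fn`, and `run_job` are made up for this illustration, and a real job would spread the map and reduce calls across many nodes.

```python
from collections import defaultdict


def map_fn(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce phase: combine all the values emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    """Run map, then group and sort by key, then reduce (all in one process)."""
    grouped = defaultdict(list)
    # Map: every line is handled independently of the others, which is
    # what lets a real framework run this step in parallel.
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    # Shuffle and sort stand-in: values are grouped by key and the keys
    # are visited in sorted order. Reduce is called once per key.
    return [reduce_fn(key, grouped[key]) for key in sorted(grouped)]


if __name__ == "__main__":
    for word, count in run_job(["the quick brown fox", "the lazy dog", "the fox"]):
        print(word, count)
```

The map function never sees more than one line and the reduce function never sees more than one key, so neither needs to know how the work is distributed.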

Shuffle and Sort
MapReduce guarantees that the input to every reducer is sorted by key. The process by which the framework performs this sort and transfers the map outputs to the reducers as inputs is known as the shuffle.
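
The following is a rough in-memory sketch of that step, not Hadoop's actual implementation. It assumes a simple hash partitioner (here `zlib.crc32`, chosen only because it is deterministic) and hypothetical helper names `partition` and `shuffle_and_sort`.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Choose a reducer for a key. crc32 is only a stand-in hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(map_outputs, num_reducers):
    """Route (key, value) pairs to reducers and sort each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in map_outputs:
        buckets[partition(key, num_reducers)].append((key, value))

    reducer_inputs = []
    for bucket in buckets:
        # Sort by key only, so each reducer sees its keys in order.
        bucket.sort(key=itemgetter(0))
        # Collect the values of each key into one list per key.
        grouped = [
            (key, [value for _, value in pairs])
            for key, pairs in groupby(bucket, key=itemgetter(0))
        ]
        reducer_inputs.append(grouped)
    return reducer_inputs


if __name__ == "__main__":
    pairs = [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
    for i, reducer_input in enumerate(shuffle_and_sort(pairs, 2)):
        print("reducer", i, reducer_input)
```

Because the partition depends only on the key, all values for a given key end up at the same reducer.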
Counters
Counters are a useful channel for gathering statistics about the job, either for quality control or for application-level statistics.
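
As an illustration of the quality-control use, the sketch below imitates counters with a plain `collections.Counter` inside a map function. The record format (`station,temperature`) and the counter names are invented for this example, and real Hadoop counters are reported through the framework rather than through a local object like this.

```python
from collections import Counter


def map_with_counters(lines, counters):
    """Parse 'station,temperature' records, counting good and bad inputs."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2:
            counters["MALFORMED"] += 1  # wrong number of fields
            continue
        station, raw_temperature = parts
        try:
            temperature = int(raw_temperature)
        except ValueError:
            counters["BAD_TEMPERATURE"] += 1  # field is not an integer
            continue
        counters["VALID"] += 1
        yield station, temperature


if __name__ == "__main__":
    counters = Counter()
    records = list(map_with_counters(["s1,20", "broken", "s2,abc", "s1,25"], counters))
    print(records)
    print(dict(counters))
```

Bad records are skipped instead of failing the job, and the totals show afterwards how much of the input was affected.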
Secondary Sort
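The MapReduce framework sorts the records by key before they reach the reducers. For any particular key, however, the values are not sorted. A secondary sort is a technique for also controlling the order of the values within each key.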
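
One common way to order the values within a key is to sort on a composite key (the original key plus the value) while still grouping on the original key alone. The sketch below shows that idea in plain Python on made-up `(year, temperature)` records; the function name `secondary_sort` is hypothetical, and in Hadoop the same effect needs a custom partitioner and comparators rather than a single `sorted` call.

```python
from itertools import groupby


def secondary_sort(records):
    """Yield (year, temperatures) with temperatures in descending order per year."""
    # Sort on the composite key: year ascending, then temperature descending.
    ordered = sorted(records, key=lambda record: (record[0], -record[1]))
    # Group on the natural key (year) only, so each group keeps the
    # value order established by the composite sort.
    for year, group in groupby(ordered, key=lambda record: record[0]):
        yield year, [temperature for _, temperature in group]


if __name__ == "__main__":
    data = [(1901, 10), (1900, 35), (1901, 40), (1900, 12)]
    for year, temperatures in secondary_sort(data):
        print(year, temperatures)
```

With the values arriving in descending order, the first value in each group is that year's maximum, so a reducer could read it without scanning the rest.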
