The Future of Big Data: the Answer is Not “42”



Which sorting algorithm is used in Hadoop MapReduce?


Accepted Solutions

Hadoop MapReduce sort algorithm: answer

MapReduce generates intermediate key-value pairs, which are automatically sorted by key. Programs that need sorting at some stage can take advantage of this feature. Sorting also helps the reducer tell when a new reduce task should start, which saves the reducer time: it starts a new task when the next key in the sorted input differs from the previous one. Each reduce task takes a key and the list of values for that key.
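The grouping described above can be illustrated with a minimal plain-Python sketch. It is not Hadoop code: `reduce_sorted`, the sample pairs, and the word-count reducer are hypothetical names chosen for illustration. It only shows how a change of key in sorted input marks the start of the next group.

```python
from itertools import groupby
from operator import itemgetter


def reduce_sorted(pairs, reduce_fn):
    """Call reduce_fn once per key over (key, value) pairs sorted by key."""
    results = []
    # groupby opens a new group whenever the key differs from the previous
    # one. Because the input is sorted, that gives exactly one group per key.
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append((key, reduce_fn(key, values)))
    return results


# Hypothetical intermediate pairs, as a word-count mapper might emit them.
intermediate = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
sorted_pairs = sorted(intermediate, key=itemgetter(0))

word_counts = reduce_sorted(sorted_pairs, lambda key, values: sum(values))
print(word_counts)  # [('a', 2), ('b', 2), ('c', 1)]
```

Without the sort step, the same key could show up in several separate groups, and the reducer would have to buffer everything before it could emit a result.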

Sort algorithm

Mapper: Quick sort is used on the map side. The org.apache.hadoop.util.QuickSort class is used for sorting keys.

 

Reducer: Merge sort is used on the reduce side. Merge sort is the default behavior of MapReduce here, and it is not normally something you change. The reason is that the data arrives at a single point from different nodes, each sending output that is already sorted, so the best algorithm to use is a merge.
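To make the two stages concrete, here is a rough plain-Python sketch, assuming each mapper's output fits in memory. It does not reproduce Hadoop's QuickSort class or its merge code: `quicksort_by_key` and the sample data are hypothetical, and `heapq.merge` from the standard library stands in for the reduce-side merge.

```python
import heapq
from operator import itemgetter


def quicksort_by_key(records, lo=0, hi=None):
    """Sort a list of (key, value) records in place by key (Lomuto partition)."""
    if hi is None:
        hi = len(records) - 1
    if lo >= hi:
        return records
    pivot = records[hi][0]
    i = lo
    for j in range(lo, hi):
        if records[j][0] < pivot:
            records[i], records[j] = records[j], records[i]
            i += 1
    # Put the pivot between the smaller keys and the rest.
    records[i], records[hi] = records[hi], records[i]
    quicksort_by_key(records, lo, i - 1)
    quicksort_by_key(records, i + 1, hi)
    return records


# Hypothetical outputs of two mappers running on different nodes.
mapper_outputs = [
    [("pear", 1), ("apple", 1), ("fig", 1)],
    [("banana", 1), ("apple", 1)],
]

# Map side: each mapper sorts its own output.
sorted_runs = [quicksort_by_key(list(run)) for run in mapper_outputs]

# Reduce side: the runs are already sorted, so one merge pass is enough.
merged = list(heapq.merge(*sorted_runs, key=itemgetter(0)))
merged_keys = [key for key, _ in merged]
print(merged_keys)  # ['apple', 'apple', 'banana', 'fig', 'pear']
```

The merge never re-sorts anything: it only compares the current head of each run, which is why it suits data arriving pre-sorted from many nodes.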

Teradata Employee

Re: The Future of Big Data: the Answer is Not “42”

There's a thorough explanation here: http://community.teradata.com/forums/topic/hadoop-mapreduce-which-sort-algorithm-is-used

 

But it looks like Quick Sort is used in the Map step and Merge Sort in the Reduce step.

 

Re: The Future of Big Data: the Answer is Not “42”

The data arriving at the reducer node is sorted by key. The framework relies on the WritableComparator class to sort the key-value pairs generated by the mappers. WritableComparator implements Hadoop's RawComparator interface, which extends Java's Comparator. Its compare method is responsible for ordering the key-value pairs, and the comparison is performed on the bytes of the key.
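The idea of comparing keys as bytes can be sketched in plain Python. This is only an illustration, not the WritableComparator API: `serialize_key` and `compare_raw` are hypothetical helpers, and the sketch assumes keys are non-negative integers stored as 4 big-endian bytes, an encoding in which byte order and numeric order agree.

```python
import struct
from functools import cmp_to_key


def serialize_key(key):
    """Encode a non-negative integer key as 4 big-endian bytes."""
    return struct.pack(">I", key)


def compare_raw(a, b):
    """Compare two serialized keys byte by byte, without deserializing them."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared bytes are equal, so the shorter buffer sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


keys = [300, 7, 65536, 255]
serialized = [serialize_key(k) for k in keys]

# Sort the raw byte strings; no integer is decoded during the sort.
serialized.sort(key=cmp_to_key(compare_raw))

# Decode only afterwards, to check the resulting order.
decoded = [struct.unpack(">I", raw)[0] for raw in serialized]
print(decoded)  # [7, 255, 300, 65536]
```

Skipping deserialization during the sort avoids creating an object per comparison, which matters when millions of keys are compared.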

The merge sort algorithm is used by default in MapReduce. We can't change the sorting algorithm, as the data comes from different nodes to a single point.
