Does the map task push map output to reduce task or reduce task pull it from map task

Jeff Zhang
Hi all,

I'd like to know whether the map task pushes its output to the reduce task, or
the reduce task pulls it from the map task. Which approach does Hadoop actually use?

Thank you very much.


Jeff zhang

Re: Does the map task push map output to reduce task or reduce task pull it from map task

Prabhu Hari Dhanapal
Well, I'm not sure, but I think it might be a pull. Physically, the mappers
and the reducers run on the same nodes, so if the mappers had to push, it
could happen that all nodes are busy mapping and there are no reducers to
accept the output. Maybe for this reason the reducers might not start
reducing anything at all until all of the mapper tasks have finished.

There is also the sort/shuffle layer between mapping and reducing, which
clearly demarcates the phases. That seems to suggest a pull rather than a
push.

You might think of this as a performance bottleneck, but in practice it
seems it isn't.

By the way, wait for an expert to answer; I'm a beginner too!

--
Hari

Re: Does the map task push map output to reduce task or reduce task pull it from map task

dave bayer
In reply to this post by Jeff Zhang

On Oct 26, 2009, at 6:05 PM, Jeff Zhang wrote:

> I'd like to know does the map task push map output to reduce task or reduce
> task pull it from map task ? Which way is real in hadoop ?

In 0.19, it appears to be a pull. Look at the run() method in
mapred/org/apache/hadoop/mapred/ReduceTask.java. I don't know what the
equivalent would be in the mapreduce package in 0.20.x.

dave bayer

Re: Does the map task push map output to reduce task or reduce task pull it from map task

Amogh Vasekar
In reply to this post by Jeff Zhang
Hi,
Each reduce task looks to the map tasks for the partition it requires and pulls it (the number of parallel copies is controlled by mapred.reduce.parallel.copies). As partitions are taken in by the reduce task, it performs a merge sort; this forms your sort and shuffle (S&S) phase. Typically your mappers and reducers are O(n) while S&S is O(n log n), so if the amount of intermediate data is huge you will see a relative drop in performance.
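
To make the pull model concrete, here is a minimal, self-contained Java sketch of the idea described above. It is a toy model, not Hadoop's code: the class PullShuffleSketch, the MapOutput type, and the methods fetchPartition, pull, and merge are all hypothetical names made up for illustration, and map outputs are assumed to be in-memory lists that are already sorted per partition. The parallelCopies argument plays the role of the parallel-copies setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of a pull-based shuffle. Illustrative only; not Hadoop's implementation. */
final class PullShuffleSketch {

    /** Hypothetical stand-in for a finished map task: one sorted run per partition. */
    static final class MapOutput {
        private final List<List<String>> sortedPartitions;

        MapOutput(List<List<String>> sortedPartitions) {
            this.sortedPartitions = sortedPartitions;
        }

        /** Called by the reducer; the map side never initiates a transfer. */
        List<String> fetchPartition(int partition) {
            return new ArrayList<>(sortedPartitions.get(partition));
        }
    }

    /** Pulls one partition from every map output, at most parallelCopies at a time. */
    static List<List<String>> pull(List<MapOutput> maps, int partition, int parallelCopies)
            throws InterruptedException, ExecutionException {
        ExecutorService copiers = Executors.newFixedThreadPool(parallelCopies);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (MapOutput m : maps) {
                pending.add(copiers.submit(() -> m.fetchPartition(partition)));
            }
            List<List<String>> runs = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                runs.add(f.get()); // wait for each copy to finish
            }
            return runs;
        } finally {
            copiers.shutdown();
        }
    }

    /** k-way merge of already sorted runs using a heap of {runIndex, offset} cursors. */
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance this run's cursor
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Two map outputs, each with two partitions (partition 0 and partition 1).
        List<MapOutput> maps = Arrays.asList(
                new MapOutput(Arrays.asList(Arrays.asList("apple", "cat"), Arrays.asList("zebra"))),
                new MapOutput(Arrays.asList(Arrays.asList("bee", "dog"), Arrays.asList("yak"))));
        // Reducer 0 pulls partition 0 from both maps, then merges the sorted runs.
        System.out.println(merge(pull(maps, 0, 5))); // prints [apple, bee, cat, dog]
    }
}
```

The point of the sketch is only the direction of the transfer: the reducer side owns the thread pool and asks each map output for its partition, and the merge step is where the log factor in the S&S cost comes from.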

Amogh




Re: Does the map task push map output to reduce task or reduce task pull it from map task

Jothi Padmanabhan
In reply to this post by dave bayer

> Don't know what the equivalent would be in the mapreduce package
> in 0.20.x.

The framework code that fetches map outputs is the same for both the
mapred-based and the mapreduce-based reducers.