job stops progressing because tasktracker stops taking tasks

Runping Qi
I was testing a job on a single-node Hadoop cluster running Hadoop 0.19.
The single tasktracker has 2 reduce slots.
After finishing 8 reduce tasks out of 17 total reduce tasks, the tasktracker
stopped taking any new tasks.
The job made no more progress then.

Anybody encountered similar situations?

Thanks,

Runping

Re: job stops progressing because tasktracker stops taking tasks

Scott Carey
Which version of 0.19?

Some bugs in 0.19 could cause problems like that; they are
fixed in the latest release, 0.19.2.


On 10/25/09 10:29 PM, "Runping Qi" <[hidden email]> wrote:

> I was testing a job on a single-node Hadoop cluster running Hadoop 0.19.
> The single tasktracker has 2 reduce slots.
> After finishing 8 reduce tasks out of 17 total reduce tasks, the tasktracker
> stopped taking any new tasks.
> The job made no more progress then.
>
> Anybody encountered similar situations?
>
> Thanks,
>
> Runping
>


Re: job stops progressing because tasktracker stops taking tasks

dave bayer
In reply to this post by Runping Qi

On Oct 25, 2009, at 10:29 PM, Runping Qi wrote:

> I was testing a job on a single-node Hadoop cluster running Hadoop 0.19.
> The single tasktracker has 2 reduce slots.
> After finishing 8 reduce tasks out of 17 total reduce tasks, the tasktracker
> stopped taking any new tasks.
> The job made no more progress then.
>
> Anybody encountered similar situations?

I had something similar happen over the weekend. Not sure if this is
exactly what you were seeing:

On a ~20 node cluster running 0.19.2, 90 map slots, 45 reduce slots:

The jobtracker stopped scheduling jobs, and the web UI showed no jobs running
or in the completed/failed lists. I didn't check the queue lists (I have 3
queues: one for ad hoc jobs, one for nightly production jobs, and one for data
load jobs). This was with the default JobQueueTaskScheduler (I had tried the
Capacity Scheduler but found that it ran into deadlocks from threads obtaining
monitors and then calling routines through the reflection API that would
attempt to lock the same monitor).

The jobtracker would accept new jobs, issue IDs, and even report the map and
reduce status (which never proceeded beyond 0%), but it did not show these
jobs in the web UI, and I do not believe they appeared in the hadoop job -list
output, which, if memory serves, was empty.

Nothing in the logs pointed to problems. jstack didn't show deadlocks, or any
thread really even doing much of anything. I didn't think to attach a
remote debugger to the process until I had restarted it.
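
For anyone who wants to script the deadlock check that jstack performs, here
is a minimal sketch using the standard java.lang.management API. The class
name DeadlockCheck and the method describeDeadlocks are hypothetical names for
illustration only. It inspects only the JVM it runs in; pointing it at a
live jobtracker would need a remote JMX connection, which is not shown. It
only finds lock cycles, so a stall with no deadlock (as described above) would
still come back empty.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    /**
     * Returns one line per deadlocked thread in the current JVM,
     * or an empty string when no deadlock is detected.
     */
    public static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when there is no cycle
        // among threads waiting on monitors or ownable synchronizers.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            sb.append(info.getThreadName())
              .append(" waits for ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```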

I didn't find anything in JIRA that might relate to this. I don't have
reproduction steps because everything seemed to be 'working' but no progress
was ever made.

dave bayer