Hadoop YARN Capacity Scheduler elasticity


Hadoop YARN Capacity Scheduler elasticity

Frank Fan

Hello,

 

We are confused about how, in the CapacityScheduler, a queue can use idle resources beyond its configured capacity on YARN, and we were hoping for an explanation. We are running Hadoop 3.0.0-alpha2 with the CapacityScheduler, submitting Tez 0.9.1 jobs. The cluster consists of 40 nodes, each with 15 GB of memory and 1 vCore; for now every container uses an entire node (the minimum and maximum allocation are both set to 15 GB / 1 vCore).
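
For reference, the allocation settings described above would look roughly like this in yarn-site.xml (a sketch only; the attached yarn-site.xml is authoritative):

  <!-- Each container spans a whole node: min and max allocation both 15 GB / 1 vCore -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>15000</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>15000</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>1</value>
  </property>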

 

The documentation for the YARN Capacity Scheduler states that “Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.” This seems to imply that the scheduler will allocate free cluster capacity to a queue that has reached 100% of its configured capacity, up to the queue’s configured maximum capacity. However, that is not what we observe: the queue is only able to go one minimum allocation (15000 MB, 1 vCore) above its configured capacity (5% of the cluster, i.e. 2 nodes), even though each application requests 10 containers (each 15000 MB, 1 vCore), most of which end up pending due to capacity limits. The queue’s maximum capacity is set to 100% and there are clearly 37 free nodes in the cluster (as shown in the screenshots below).

 

 

Preemption is disabled across the whole system. The Capacity Scheduler and YARN configuration files are attached.

 

Best,

Weihang (Frank) Fan

Carnegie Mellon University




Attachments: capacity-scheduler.xml (8K), yarn-site.xml (10K)

Re: Hadoop YARN Capacity Scheduler elasticity

Damien Claveau
Hello Frank,

If all the apps are submitted as the user wfan, you are probably hitting the per-user limit, which by default caps a SINGLE user's resources at 100% of the queue's guaranteed (minimum) capacity.
Have a look at the documentation for a full explanation of the yarn.scheduler.capacity.root.default.user-limit-factor setting,
and consider increasing it to 20 (i.e. 20 x 5% = 100%) if you want a single user in a single queue to be able to use 100% of the cluster.
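
For example, a minimal sketch of that change in capacity-scheduler.xml (assuming the default queue keeps its 5% capacity; your attached file is authoritative):

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>20</value>
    <!-- one user may use up to 20 x the queue's 5% capacity = 100% of the cluster,
         still bounded by the queue's maximum-capacity -->
  </property>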

Regards,
Damien


--
Damien Claveau
Mobile: 06 60 31 47 84
E-mail: [hidden email]


Re: Hadoop YARN Capacity Scheduler elasticity

Jasson Chenwei
Hi Weihang,

Resource elasticity comes into play in a multi-queue setting. For example, say you have two queues, each configured with 50% capacity and 100% maximum capacity. When queue1 is not fully utilized, the idle resources left over can be used by the other queue, up to 100% of the cluster. In a single-queue setup like the one in your experiment, elasticity is probably not the problem. As Damien suggested, check yarn.scheduler.capacity.root.default.user-limit-factor as well as yarn.scheduler.capacity.maximum-applications / yarn.scheduler.capacity.<queue-path>.maximum-applications.
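
As an illustration of the two-queue case (a sketch with hypothetical queue names q1 and q2, not taken from your configuration):

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>q1,q2</value>
  </property>
  <!-- each queue is guaranteed 50% but may elastically grow to 100% when the other is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.q1.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.q1.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.q2.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.q2.maximum-capacity</name>
    <value>100</value>
  </property>

With preemption turned off, q1 only gives the borrowed share back to q2 as its own containers finish.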

Regards,

Wei 
