We are confused about how a queue in the CapacityScheduler can use idle resources beyond its configured capacity on YARN, and were hoping to get an explanation. We are running Hadoop 3.0.0-alpha2 with the CapacityScheduler and submitting Tez 0.9.1 jobs. The cluster consists of 40 nodes, each with 15GB of memory and 1 vCore. For now, every container uses an entire node (the minimum and maximum allocations are both set to 15GB/1 vCore).
The documentation for the YARN Capacity Scheduler indicates that “Applications in the queue may consume more resources than the queue’s capacity if there are free resources, providing elasticity.” This seems to imply that when a queue is at 100% of its configured capacity, the scheduler will allocate any free capacity to it, up to its configured maximum capacity. However, this is not what we see. The queue is only able to go 1 minimum allocation (15000MB, 1 vCore) above its configured capacity (5% of the cluster, 2 nodes), even though each application requests 10 containers (each with 15000MB of memory and 1 vCore), most of which end up pending due to capacity limits. The maximum capacity is set to 100%, and there are clearly 37 free nodes in the cluster (as shown in the screenshots below).
Preemption is set to be off across the system. Attached are the configuration files for Capacity Scheduler and YARN.
Weihang (Frank) Fan
Carnegie Mellon University
If all the apps are submitted by the user wfan, you are probably hitting the per-user resource limit, which by default restricts a SINGLE user to 100% of the queue's guaranteed capacity.
Have a look at the documentation for a full explanation of the "yarn.scheduler.capacity.root.default.user-limit-factor" setting,
and consider increasing it to 20 (e.g. 20 x 5% = 100%) if you hope to achieve 100% cluster usage with that single user in a single queue.
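To make the arithmetic concrete, here is a rough sketch of how the per-user ceiling scales with the user limit factor. It is a simplified model, not the scheduler's actual code: the function name and parameters are made up for illustration, and it ignores details such as the minimum user limit percent and the fact that the scheduler can overshoot a limit by one container (which may be the extra allocation you observed).

```python
def approx_user_cap_nodes(cluster_nodes, queue_capacity_pct,
                          user_limit_factor, queue_max_capacity_pct=100.0):
    """Approximate number of nodes a single user can hold in one queue.

    Hypothetical helper: models the cap as capacity * user-limit-factor,
    bounded by the queue's maximum capacity. The real CapacityScheduler
    logic has more inputs than this.
    """
    cap_pct = min(queue_capacity_pct * user_limit_factor,
                  queue_max_capacity_pct)
    return cluster_nodes * cap_pct / 100.0


# 40-node cluster, queue capacity 5%, default factor of 1
print(approx_user_cap_nodes(40, 5.0, 1.0))   # 2.0

# Same queue with the factor raised to 20
print(approx_user_cap_nodes(40, 5.0, 20.0))  # 40.0
```

With a factor of 1 the single user is held to roughly the 2 nodes the queue is guaranteed, which matches the pending containers in your setup.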
2018-05-31 23:42 GMT+02:00 Frank Fan <[hidden email]>:
Hi Weihang,
Resource elasticity applies in a multi-queue setting. For example, say you have two queues, each configured with 50% capacity and 100% maximum capacity. When queue1 is not fully utilized, its leftover idle resources can be used by queue2, up to queue2's maximum of 100%. In a single-queue setting such as your experiment, resource elasticity is probably not the issue. As suggested by Damien, you can check the yarn.scheduler.capacity.root.
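The two-queue example can be sketched as a small calculation. This is only an illustrative model with a hypothetical function name; it treats everything as percentages of the cluster and ignores user limits, preemption, and container granularity.

```python
def elastic_headroom_pct(own_used_pct, own_max_pct, other_queues_used_pct):
    """Extra cluster share (in percent) a queue could still take.

    Hypothetical helper: the queue may grow until it reaches its own
    maximum capacity or the cluster has no idle resources left,
    whichever comes first.
    """
    idle_pct = 100.0 - own_used_pct - other_queues_used_pct
    return max(0.0, min(own_max_pct - own_used_pct, idle_pct))


# queue2 is at its 50% capacity, queue1 uses only 20% of the cluster
print(elastic_headroom_pct(50.0, 100.0, 20.0))  # 30.0

# queue1 is completely idle, so queue2 can grow to its 100% maximum
print(elastic_headroom_pct(50.0, 100.0, 0.0))   # 50.0
```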
On Fri, Jun 1, 2018 at 1:04 PM, Damien Claveau <[hidden email]> wrote: