What if the configured node memory in yarn-site.xml is more than node's physical memory?


wuchang
My YARN cluster uses the FairScheduler for my 4 queues; below is my queue configuration:
<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>250000 mb, 100 vcores</maxResources>
  </queue>
  <queue name="default">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>100000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="ep">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>300000 mb, 100 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <queue name="vip">
    <minResources>30000 mb, 20 vcores</minResources>
    <maxResources>60000 mb, 50 vcores</maxResources>
    <maxAMShare>-1.0f</maxAMShare>
  </queue>
  <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
</allocations>


Obviously, I hadn't configured any preemption. Everything at least ran OK; the only problem was that the total resource utilization of my cluster was quite low.

So I decided to turn on preemption and modified fair-scheduler.xml as below:

<allocations>
  <queue name="highPriority">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>300000 mb, 100 vcores</maxResources>
    <weight>0.35</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>18</maxRunningApps>
  </queue>
  <queue name="default">
    <minResources>50000 mb, 20 vcores</minResources>
    <maxResources>140000 mb, 70 vcores</maxResources>
    <weight>0.14</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <queue name="ep">
    <minResources>100000 mb, 30 vcores</minResources>
    <maxResources>600000 mb, 100 vcores</maxResources>
    <weight>0.42</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <queue name="vip">
    <minResources>6000 mb, 20 vcores</minResources>
    <maxResources>120000 mb, 30 vcores</maxResources>
    <weight>0.09</weight>
    <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
    <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
    <maxAMShare>0.3f</maxAMShare>
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>
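(For reference: the per-queue preemption timeouts and thresholds above only take effect once preemption is enabled globally. Assuming a Hadoop 2.x FairScheduler, that is done with the following property in yarn-site.xml.)

<property>
  <!-- Must be true for the FairScheduler to preempt at all; the
       per-queue timeouts/thresholds above then control when it fires. -->
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>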

Indeed, after preemption was turned on, the total resource utilization of my cluster went up to 90%+. But after one night (midnight is the busiest time for my YARN cluster), I found that many applications were delayed.

After a long time of troubleshooting, I found that in my 9-machine cluster, 5 machines have 128 GB of physical memory and the remaining 4 have 64 GB, but in all of their yarn-site.xml files, yarn.nodemanager.resource.memory-mb is configured as 97280. That is to say, on those 4 machines the configured yarn.nodemanager.resource.memory-mb is actually more than the physical memory. So I suspect this is what causes the phenomenon that, even though the total cluster utilization has improved, each application takes more time to execute and is seriously delayed.
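To make the overcommitment concrete: 97280 MB is 95 GiB, so on the 64 GiB machines the scheduler can hand out roughly 95 - 64 = 31 GiB more container memory than physically exists, and that is before reserving anything for the OS and the Hadoop daemons.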


Any suggestions?
Re: What if the configured node memory in yarn-site.xml is more than node's physical memory?

Eric Payne
Hi Wu. If yarn.nodemanager.resource.memory-mb is greater than the amount of memory on a specific node, the scheduler will assign more containers to that node than probably should be running there. They will still run, but it will cause a lot of disk swapping, which will slow down each task running on that node.
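As a sketch of the usual fix (the exact headroom is an assumption on my part, not something from your mail): size yarn.nodemanager.resource.memory-mb per node type, leaving a few GB for the OS, the NodeManager, and the DataNode. On the 64 GB machines that might look like:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- 56 GiB for containers on a 64 GiB node; the ~8 GiB of headroom
       for the OS and Hadoop daemons is an assumed rule of thumb. -->
  <value>57344</value>
</property>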

I don't know much about the FairScheduler's preemption, but if preemption is aggressive, it could potentially kill more containers than are necessary, which causes the app to lose work that has to be redone.

