We recently started using cgroups with the LinuxContainerExecutor, running Apache Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a YARN container fails with a message like the following:
WARN privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 35. Privileged Execution Operation Stderr:
Could not create container dirsCould not create local files and directories
Looking at the container-executor source, this is traceable to the error handling here: https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604
And ultimately to https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672
The root failure appears to be in the underlying mkdir call, but its exit code / errno is swallowed, so we don't have more detail. We tend to see this when many containers for the same application start at the same time on a host, so we suspect a race condition around the directories those containers share.
Has anyone seen similar failures in using the LinuxContainerExecutor?
This issue is compounded because the LinuxContainerExecutor marks the node unhealthy in these scenarios: https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L566
Under some circumstances that seems appropriate, but since this is a transient failure (none of these machines were near capacity for disk space, inodes, etc.), it shouldn't take the NodeManager down. The blacklisting behavior came in as part of https://issues.apache.org/jira/browse/YARN-6302, which seems perfectly valid, but perhaps it should be made configurable so certain users can opt out?
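If such an opt-out were added, it might take the shape of a yarn-site.xml property along these lines. The property name below is purely hypothetical; no such config exists in Hadoop at the time of this thread:

```xml
<!-- HYPOTHETICAL property, sketched here only to illustrate the proposal;
     it does not exist in any released Hadoop version. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.mark-unhealthy-on-dir-failure</name>
  <value>false</value>
</property>
```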
Have you opened up a YARN JIRA with your findings? If not, that would be the next step in debugging the issue and coding up a fix. This certainly sounds like a bug and something that we should get to the bottom of.
As far as NodeManagers becoming unhealthy goes, a config could be added to prevent this. But if you're only seeing one failure out of millions of tasks, it seems like that would mask more problems than it fixes. One container failing is bad, but a node going bad and failing every container that runs on it until it is shut down is much, much worse. However, if you think you have a use case that would benefit from making this behavior optional, that is something we could also look into. That would be a separate YARN JIRA as well.
On Mon, Sep 17, 2018 at 12:37 PM, Jonathan Bender <[hidden email]> wrote:
I would also just suggest moving up to 3.1.1 and trying again. Barring that, maybe you can take the error message at its word. My experience with running Hadoop 3.x jobs is a little limited, but I know that jobs can paint a lot of data into /tmp/hadoop-yarn, and if your nodes can't absorb a lot of expansion in that directory, things will error out, albeit softly.

Noting the way the terasort example behaves in that regard, I set up my worker nodes to make /tmp/hadoop-yarn a mount point for its own disk volume whose size I can preset, and I can also optionally enable transparent compression via btrfs. A lot of the time I would expect I could give that volume some token small size, but in trying to make a 1/5-scale (i.e., 200GB) terasort run, 128GiB with compression enabled across five workers wasn't enough. At 1/10 scale I could manage, but at 1/5 it would fill up one node's /tmp/hadoop-yarn, then the next, then the next, etc. It makes me think that terasort tries to write the whole dang thing out to the local (non-HDFS) file system before making an output file in HDFS.
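The setup described above could be expressed as an /etc/fstab entry roughly like the following. This is a sketch only; the device path is an example, and the compression option assumes a kernel with btrfs zstd support:

```
# Dedicated btrfs volume for YARN scratch space under /tmp/hadoop-yarn,
# with transparent compression (device path is an example)
/dev/sdb1  /tmp/hadoop-yarn  btrfs  compress=zstd,noatime  0 2
```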
On 9/17/18 1:55 PM, Eric Badger wrote:
YARN-8751 takes care of the issue that marks the NM unhealthy under these conditions. If you can open a JIRA with details on the swallowed error, that would be appreciated. As noted, 3.1.1 has a number of fixes to the YARN containerization features, so it would be great if you can see if the issue still occurs with that release.
On Mon, Sep 17, 2018 at 1:05 PM Jeff Hubbs <[hidden email]> wrote:
Thanks for the responses all!
@Shane - that's great, we planned to move to 3.1.x soon anyway, all the more reason to do that.
@Eric - I opened a JIRA here with my findings: https://issues.apache.org/jira/browse/YARN-8786
On Mon, Sep 17, 2018 at 12:23 PM, Shane Kumpf <[hidden email]> wrote: