YARN on non-hdfs Error


YARN on non-hdfs Error

Ascot Moss
Hi,

I found that YARN can be used on non-HDFS file systems, such as AWS S3 or GFS.

I am trying YARN on a local FS. I use the "yarn" user to start YARN on the local FS (which is non-HDFS) and have other users submit their own jobs, but I get an error on the yarn.app.mapreduce.am.staging-dir permission:

/tmp/hadoop-yarn/staging/user_a/.staging/job-xxxxxxxx permission denied.
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl (Native Method)
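The failure above is a POSIX permission error on the local staging path. A commonly suggested workaround (an assumption to verify against your setup, not something stated in this thread) is to give the staging parent the same semantics as /tmp:

```shell
# Sketch of a commonly suggested workaround (verify before applying):
# make the staging parent world-writable with the sticky bit, like /tmp,
# so each submitting user can create a private .staging subdirectory.
# On a real cluster this would be the parent of the failing directory,
# e.g. /tmp/hadoop-yarn/staging; a demo path is used here.
STAGING_PARENT="${STAGING_PARENT:-/tmp/hadoop-yarn-demo/staging}"

mkdir -p "$STAGING_PARENT"
chmod 1777 "$STAGING_PARENT"     # rwxrwxrwt: anyone may create, only the owner may delete

stat -c '%a' "$STAGING_PARENT"   # prints 1777
```

With the sticky bit set, user_a's 700-mode .staging subdirectory can coexist with other users' directories without any user being able to delete another's files.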

Please help.
Regards



Re: YARN on non-hdfs Error

Ascot Moss
FYI, the local FS here is a shared disk mounted in CentOS.

I have also tried another mount point, formatted as ext4, and get a similar error: /tmp/hadoop-yarn/staging/user_a/.staging/ is changed to mode 700 and owned by user_a, so YARN cannot read it.

Any idea?


On Thu, Jun 28, 2018 at 5:39 PM, Ascot Moss <[hidden email]> wrote:




Re: YARN on non-hdfs Error

Ascot Moss
Hi,

Can "yarn.app.mapreduce.am.staging-dir" be set to use a Linux shared mount point? If yes, is the following correct?

mapred-site.xml

<property>
   <name>yarn.app.mapreduce.am.staging-dir</name>
   <value>file:/share_mnt/tmp</value>
</property>

where "share_mnt" is the shared folder that can be accessed by all nodes.

Please help!
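Independent of which URI scheme is right (the file: value above is what the thread proposes), a quick POSIX-level sanity check can confirm the mount is usable by a submitting user. This is an illustrative sketch; the path is an assumption:

```shell
# Illustrative check (path is an assumption): confirm the shared staging
# root is writable by the current user and allows creating the per-user
# .staging layout that the MapReduce client expects to build.
STAGING_ROOT="${STAGING_ROOT:-/tmp/share_mnt_check}"

mkdir -p "$STAGING_ROOT"
USER_DIR="$STAGING_ROOT/$(id -un)/.staging"
mkdir -p "$USER_DIR"
touch "$USER_DIR/probe" && echo "writable"   # prints "writable" on success
```

Running this as each submitting user on each node would surface the same permission problem seen in the original error, without involving YARN at all.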





Re: YARN on non-hdfs Error

Jeff Hubbs
I'm not sure what you're going for here, but my impression of what this parameter does is that if multiple machines write to it, things will explode. That would not be the case, however, if the developers coded MapReduce to make each machine name directories or files according to the specific machine (i.e., incorporating the node name or the node's MAC address).

Something you could do, in the event that the contents of ${yarn.app.mapreduce.am.staging-dir} are not named in a node-specific way, would be to incorporate that yourself, like so:

<property>
   <name>yarn.app.mapreduce.am.staging-dir</name>
   <value>file:/share_mnt/tmp/${HOSTNAME}</value>
</property>
Then you'll wind up with each node's stuff on that share, side by side.
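The per-node layout described above can be sketched at the shell level. Whether Hadoop itself expands ${HOSTNAME} in a configuration value depends on the version, so treat that substitution as an assumption to test; the paths below are illustrative:

```shell
# Sketch of the per-node layout described above (paths are illustrative):
# each node writes under its own hostname-named subdirectory of the
# shared root, so concurrent writers never collide.
SHARE_ROOT="${SHARE_ROOT:-/tmp/share_mnt_demo}"
NODE="$(hostname)"

mkdir -p "$SHARE_ROOT/$NODE"
ls "$SHARE_ROOT"                 # one entry per node, side by side
```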

I'd ask, though: are you sure you want to do this? What mechanism uses this staging directory? Is it every worker's NodeManager daemon? If it is, do you really want every node writing to the same network share during a job?



On 6/28/18 10:36 PM, Ascot Moss wrote: