What are HDFS NFS “access times”?

Reed Villanueva

I'm having a problem with HDFS NFS. It was addressed on another site, which recommends adding the following to hdfs-site.xml:

<property>  
<name>dfs.namenode.accesstime.precision</name>  
<value>3600000</value>  
<description>
The access time for HDFS file is precise upto this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS.  
</description> 
</property>

I'm confused about what exactly "access times for HDFS" means. I looked at the Hadoop docs but still could not work it out. Could someone explain what this setting does? Also, where is the nfs3 daemon log file?


Re: What are HDFS NFS “access times”?

Matt Foley-2
Hi Reed,
I think access time refers to the POSIX atime attribute for files, the “time of last access”, as described here for instance [1]. HDFS keeps a correct modification time (mtime), which is important, easy, and cheap to maintain. It keeps only a very low-resolution last access time, which is less important and expensive to monitor and record, as described here [2] and here [3]. It doesn’t even expose this low-resolution atime value in the `hadoop fs -stat` command; you need to use Java if you want to read it through the HDFS APIs.
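To make the atime/mtime distinction concrete, here is a minimal sketch in Python that runs against a local filesystem, not HDFS. The helper name `file_times` is made up for illustration, and the two timestamps are arbitrary values set with `os.utime` so that the attributes differ visibly.

```python
import os
import tempfile


def file_times(path):
    """Return (atime, mtime) for a local file, in seconds since the epoch."""
    st = os.stat(path)
    return st.st_atime, st.st_mtime


if __name__ == "__main__":
    # Create a scratch file to inspect.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello")
        demo_path = f.name

    # Set atime and mtime explicitly (arbitrary illustrative values).
    os.utime(demo_path, (2_000_000_000, 1_000_000_000))

    atime, mtime = file_times(demo_path)
    print("atime:", atime)
    print("mtime:", mtime)

    os.remove(demo_path)
```

An NFS client asks the server for these same two attributes, which is why the HDFS NFS gateway has to supply some value for atime.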

However, a conforming NFS API must present atime, and so the HDFS NFS implementation does. But first you have to enable it in the configuration. The documentation says that the default value is 3,600,000 milliseconds (1 hour), but many sites have been advised to turn it off entirely by setting it to zero, to improve overall HDFS performance. See for example [4], section “Don’t let Reads become Writes”. So if your site has turned off atime in HDFS, you will need to turn it back on to fully enable NFS. Alternatively, you can maintain optimum efficiency by mounting NFS with the “noatime” option, as described in the document you reference.
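As a rough illustration of what the precision value means, here is a simplified Python model of my understanding of the behavior: a read only causes the stored atime to be rewritten when it falls more than `dfs.namenode.accesstime.precision` milliseconds after the stored value, and a precision of 0 disables updates. This is not the actual Hadoop code, and the function names `should_update_atime` and `simulate_reads` are hypothetical.

```python
def should_update_atime(stored_atime_ms, now_ms, precision_ms):
    """Decide whether a read at now_ms should rewrite the stored atime.

    Simplified model (an assumption, not Hadoop source): precision 0 means
    access times are disabled; otherwise update only when the read is more
    than precision_ms later than the stored atime.
    """
    if precision_ms == 0:
        return False
    return now_ms > stored_atime_ms + precision_ms


def simulate_reads(read_times_ms, precision_ms, initial_atime_ms=0):
    """Replay a list of read times; return (final_atime_ms, metadata_writes)."""
    atime = initial_atime_ms
    writes = 0
    for now in read_times_ms:
        if should_update_atime(atime, now, precision_ms):
            atime = now
            writes += 1  # each update is a metadata write triggered by a read
    return atime, writes


if __name__ == "__main__":
    one_hour = 3_600_000
    # Reads at 10, 59, 61 and about 61.7 minutes after the stored atime of 0.
    reads = [600_000, 3_540_000, 3_660_000, 3_700_000]
    print(simulate_reads(reads, one_hour))
    print(simulate_reads(reads, 0))
```

In this model, a larger precision means fewer reads turn into metadata writes, at the cost of a coarser atime. That trade-off is the reason some sites set the value to zero.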

I don’t know where the nfs3 daemon log file is, but it is almost certainly on the server node where you’ve configured the NFS service to run. Log into it and check under /var/log, e.g. with `find /var/log -name '*nfs3*' -print`

Hope this helps,
—Matt
—————————
Open Source Technologies @ Siri
`This is not a contribution`


