v2.8.0: Setting PID dir EnvVars: libexec/thing-config.sh or etc/hadoop/thing-env.sh ?


Kevin Buckley
I've noted in

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

that it says

----8<------------8<------------8<------------8<------------8<----
See etc/hadoop/hadoop-env.sh for other examples.

Other useful configuration parameters that you can customize include:

    HADOOP_PID_DIR - The directory where the daemons’ process id files
                     are stored.

    HADOOP_LOG_DIR - The directory where the daemons’ log files are
                     stored. Log files are automatically created if
                     they don’t exist.

    HADOOP_HEAPSIZE / YARN_HEAPSIZE - The maximum amount of heapsize
                     to use, in MB e.g. if the variable is set to 1000
                     the heap will be set to 1000MB. This is used to
                     configure the heap size for the daemon. By
                     default, the value is 1000. If you want to
                     configure the values separately for each daemon
                     you can use.

In most cases, you should specify the HADOOP_PID_DIR and HADOOP_LOG_DIR
directories such that they can only be written to by the users that
are going to run the hadoop daemons. Otherwise there is the potential
for a symlink attack.
----8<------------8<------------8<------------8<------------8<----

and I have recently had a need to move my PID-files from their default,
/tmp, location, so as to avoid an over-aggressive /tmp cleanup.
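As an aside on the permissions advice quoted above, a minimal sketch of creating a PID directory that only the owning user can write to. The location is purely an illustration: it is placed under a scratch directory so the sketch can be run without root, whereas a real deployment would pick a persistent path owned by the user running the daemons.

```shell
# Hypothetical location, under a scratch directory so this runs without root.
# A real deployment would use a persistent path owned by the daemon user.
PID_DIR="${PID_DIR:-$(mktemp -d)/hadoop-pids}"

# Create the directory and restrict it to its owner (rwx------), so that no
# other user can plant files or symlinks in it.
mkdir -p "${PID_DIR}"
chmod 700 "${PID_DIR}"

# Show the resulting mode and owner for a quick visual check.
ls -ld "${PID_DIR}"
```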


So, if I am in

 /path/to/hadoop-2.8.0/etc/hadoop

I can see

$ grep PID *
hadoop-env.cmd:set HADOOP_PID_DIR=%HADOOP_PID_DIR%
hadoop-env.cmd:set HADOOP_SECURE_DN_PID_DIR=%HADOOP_PID_DIR%
hadoop-env.sh:export HADOOP_PID_DIR=${HADOOP_PID_DIR}
hadoop-env.sh:export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
mapred-env.sh:#export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
$

so clearly PID_DIR values could be overridden there, although I note that the

 yarn-env.sh

file, unlike its "hadoop" and "mapred" brethren, doesn't contain
any PID_DIR default lines, nor even commented suggestion lines for

 YARN_PID_DIR

which is used in, for example,

 hadoop-2.8.0/sbin/yarn-daemon.sh
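
To make the idea concrete, a sketch of the kind of override lines I have in mind for the etc/hadoop/*-env.sh files. The variable names are the ones mentioned above; PID_BASE and the /var/run/hadoop path are my own illustration, not Hadoop defaults, and this assumes the directory already exists and is writable by the user running the daemons.

```shell
# PID_BASE is a hypothetical helper variable, not something Hadoop defines.
# /var/run/hadoop is only an example path outside of /tmp.
PID_BASE="${PID_BASE:-/var/run/hadoop}"

# Would go in etc/hadoop/hadoop-env.sh
export HADOOP_PID_DIR="${PID_BASE}"
export HADOOP_SECURE_DN_PID_DIR="${HADOOP_PID_DIR}"

# Would go in etc/hadoop/mapred-env.sh
export HADOOP_MAPRED_PID_DIR="${PID_BASE}"

# Would go in etc/hadoop/yarn-env.sh, which has no such line as shipped
export YARN_PID_DIR="${PID_BASE}"
```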



However, I've also noted that many of the Hadoop sbin scripts
will source a related file from

 hadoop-2.8.0/libexec/

so for example, a desired value for the env var

 YARN_PID_DIR

might thus be set in either (or both ?) of

 hadoop-2.8.0/etc/hadoop/yarn-env.sh
 hadoop-2.8.0/libexec/yarn-config.sh

and similarly,

  HADOOP_PID_DIR

might be set in either (or both ?) of

 hadoop-2.8.0/libexec/hadoop-config.sh
 hadoop-2.8.0/etc/hadoop/hadoop-env.sh
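
Part of why the choice of file matters: if a script only falls back to a default when the variable is empty, then whichever file sets the value before that point wins. A minimal sketch of that "default only if unset" idiom follows. The function resolve_pid_dir is hypothetical and is my assumption about the general pattern, not a copy of the actual Hadoop scripts.

```shell
# resolve_pid_dir is a hypothetical stand-in for a script's fallback logic.
# $1: the PID dir value inherited from the environment (may be empty).
resolve_pid_dir() {
  pid_dir="$1"
  # Fall back to /tmp only when nothing was set earlier.
  if [ -z "${pid_dir}" ]; then
    pid_dir=/tmp
  fi
  printf '%s\n' "${pid_dir}"
}

resolve_pid_dir ""                 # prints /tmp
resolve_pid_dir "/var/run/hadoop"  # prints /var/run/hadoop
```

Under that assumption, a value exported from an env file would survive, and the /tmp default would only apply when no file had set anything.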


So,

is there a "to be preferred" file choice, between those two,
within which to set certain classes of EnvVars ?


Any info/pointers welcome,
Kevin

---
Kevin M. Buckley

eScience Consultant
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand
