Namenode last checkpoint alert

Lian Jiang
Hi,

The primary NameNode of my HA cluster (HDP 2.6) goes into safe mode daily due to "Namenode last checkpoint". On my cluster the default dfs.namenode.checkpoint.period is 6 hours, while https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml says the default is 1 hour. Manually changing this property to 1 hour does not make the alert go away. Any ideas are highly appreciated. Thanks.

Re: Namenode last checkpoint alert

Zian Chen
Hi Lian,

"Namenode last checkpoint" basically means that one of these two scenarios has been triggered:
1. The last NameNode checkpoint was performed too long ago.
2. The number of uncommitted transactions is beyond a certain threshold.

You can try these to see if they solve your problem:
1. Manually trigger a NameNode checkpoint.
2. Review the threshold for uncommitted transactions.

Thanks
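
The two trigger conditions listed above can be sketched roughly as follows. This is a simplified model for illustration, not Hadoop's or the alert's actual code. The function name checkpoint_due is hypothetical, and the default values (3600 seconds for dfs.namenode.checkpoint.period, 1,000,000 for dfs.namenode.checkpoint.txns) are assumed from the stock hdfs-default.xml; your distribution may ship different ones.

```python
# Simplified model of the two checkpoint triggers; NOT Hadoop's actual code.
# The defaults are assumptions based on the stock hdfs-default.xml.
DEFAULT_PERIOD_SECONDS = 3600      # dfs.namenode.checkpoint.period
DEFAULT_TXN_THRESHOLD = 1_000_000  # dfs.namenode.checkpoint.txns


def checkpoint_due(seconds_since_last, uncommitted_txns,
                   period_seconds=DEFAULT_PERIOD_SECONDS,
                   txn_threshold=DEFAULT_TXN_THRESHOLD):
    """Return the reasons a checkpoint is due (empty list if none apply)."""
    reasons = []
    # Scenario 1: the last checkpoint was too long ago.
    if seconds_since_last >= period_seconds:
        reasons.append("period elapsed")
    # Scenario 2: too many uncommitted transactions have accumulated.
    if uncommitted_txns >= txn_threshold:
        reasons.append("transaction threshold reached")
    return reasons


if __name__ == "__main__":
    # Two hours since the last checkpoint, few transactions.
    print(checkpoint_due(7200, 10))
    # One minute since the last checkpoint, many transactions.
    print(checkpoint_due(60, 2_000_000))
```

Either condition on its own is enough, so it may be worth checking both the elapsed time and the transaction count when the alert fires.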


Re: Namenode last checkpoint alert

Lian Jiang
Thanks, Zian. I can manually trigger a checkpoint to get rid of the alert. But do you have any idea why the NameNode does not checkpoint automatically every hour, as specified by dfs.namenode.checkpoint.period? I appreciate your help!


Re: Namenode last checkpoint alert

Zian Chen
Hi Lian,

What value did you configure for dfs.namenode.checkpoint.period? A friendly reminder: the value is in seconds, not milliseconds. :)

Thanks
Zian
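
To illustrate why the unit matters, here is a small sketch that interprets a configured value as seconds and reports it in hours. The helper name describe_period is hypothetical, and it assumes the value is a plain integer number of seconds, as stated above.

```python
def describe_period(raw_value):
    """Interpret a dfs.namenode.checkpoint.period value as whole seconds.

    Hypothetical helper for illustration; assumes a plain integer string.
    """
    seconds = int(raw_value)
    hours = seconds / 3600
    return f"{seconds} s = {hours:g} h"


if __name__ == "__main__":
    # One hour expressed correctly in seconds.
    print(describe_period("3600"))     # 3600 s = 1 h
    # One hour mistakenly expressed in milliseconds.
    print(describe_period("3600000"))  # 3600000 s = 1000 h
```

A value of 3600000 would therefore mean a checkpoint roughly every 1000 hours rather than every hour.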
