How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?


Huang Meilong

Hi,


I'm building a system to monitor my Hadoop cluster. I can get metrics about the cluster via Hadoop metrics (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html?spm=5176.2020520111.111.1.278ad103oLtdlm#NodeManagerMetrics):


ClusterMetrics

ClusterMetrics shows the metrics of the YARN cluster from the ResourceManager’s perspective. Each metrics record contains Hostname tag as additional information along with metrics.

Name                  Description
NumActiveNMs          Current number of active NodeManagers
NumDecommissionedNMs  Current number of decommissioned NodeManagers
NumLostNMs            Current number of lost NodeManagers for not sending heartbeats
NumUnhealthyNMs       Current number of unhealthy NodeManagers
NumRebootedNMs        Current number of rebooted NodeManagers


How can I find out which nodemanagers are unhealthy and which are lost? Ideally this could be done by calling the JMX REST API or a Hadoop command.


Any suggestions are appreciated, thank you.



HUANG





Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?

Harsh J-3
The JMX servlet query for 'RMNMInfo' done via
/jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo returns a
LiveNodeManagers bean whose value is a JSON-parseable string of all
currently-tracked NodeManagers and their actual states (UNHEALTHY,
RUNNING, etc.).
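
A minimal Python sketch of that query, for illustration only. The ResourceManager address (rm-host:8088) is a placeholder, and the per-node key names (HostName, State) are assumptions about the bean's JSON that should be checked against your Hadoop version. Because LiveNodeManagers is itself a JSON string, it is decoded a second time after the servlet response is parsed.

```python
import json
from urllib.request import urlopen

# Placeholder address: substitute your ResourceManager's web UI host and port.
RM_JMX_URL = ("http://rm-host:8088/jmx"
              "?qry=Hadoop:service=ResourceManager,name=RMNMInfo")


def parse_live_node_managers(jmx_payload):
    """Return the node dicts stored in the LiveNodeManagers attribute."""
    nodes = []
    for bean in jmx_payload.get("beans", []):
        raw = bean.get("LiveNodeManagers")
        if raw:
            # The attribute value is a JSON document encoded as a string.
            nodes.extend(json.loads(raw))
    return nodes


def hosts_in_state(nodes, state):
    """Return host names of nodes whose State equals `state`.

    The key names "HostName" and "State" are assumptions about the bean's
    JSON layout; verify them against the output of your cluster.
    """
    return [n.get("HostName") for n in nodes if n.get("State") == state]


def fetch_live_node_managers(url=RM_JMX_URL, timeout=10):
    """Query the ResourceManager JMX servlet and parse the result."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_live_node_managers(json.load(resp))
```

With a reachable cluster, hosts_in_state(fetch_live_node_managers(), "UNHEALTHY") would give the hosts currently reported as unhealthy.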

You can also use the 'yarn node -list' command to retrieve similar
information from a CLI.
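
By default 'yarn node -list' shows only RUNNING nodes; the -all and -states options widen or filter the listing. Below is a sketch of calling it from Python. It assumes 'yarn' is on the PATH and that each node row starts with a host:port Node-Id followed by the state, as in the usual tabular output. The parsing helper is illustrative and may need adjusting for your version.

```python
import subprocess


def parse_node_list(output):
    """Map Node-Id to Node-State from `yarn node -list` text output.

    Assumes data rows look like "<host>:<port> <STATE> <http-address> <n>";
    header lines are skipped because their first token does not end
    in ":<port>".
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        host, sep, port = fields[0].rpartition(":")
        if sep and host and port.isdigit():
            states[fields[0]] = fields[1]
    return states


def list_nodes(states="LOST,UNHEALTHY"):
    """Run `yarn node -list -states ...` and parse what it prints."""
    result = subprocess.run(
        ["yarn", "node", "-list", "-states", states],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_node_list(result.stdout)
```

Calling list_nodes() on a machine with a configured Hadoop client would return a dict such as Node-Id to state for the lost and unhealthy nodes only.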

On Mon, Oct 15, 2018 at 8:48 AM Huang Meilong <[hidden email]> wrote:
> How can I find out which nodemanagers are unhealthy and which are lost? Ideally this could be done by calling the JMX REST API or a Hadoop command.


--
Harsh J

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?

Huang Meilong

Thank you Harsh,


What are the possible values for the state in the LiveNodeManagers bean? Will LOST, ACTIVE, REBOOTED and DECOMMISSIONED show up in the state field?


From: Harsh J <[hidden email]>
Sent: October 15, 2018 12:46:49
To: [hidden email]
Cc: <[hidden email]>
Subject: Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?
 
The JMX servlet query for 'RMNMInfo' done via
/jmx?qry=Hadoop:service=ResourceManager,name=RMNMInfo returns a
LiveNodeManagers bean whose value is a JSON-parseable string of all
currently-tracked NodeManagers and their actual states (UNHEALTHY,
RUNNING, etc.).

You can also use the 'yarn node -list' command to retrieve similar
information from a CLI.

Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?

Harsh J-3
I don't think the LiveNodeManagers bean includes entirely inactive
nodes. Use the CLI or use the RM REST API directly:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Nodes_API
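
As a rough sketch, the Cluster Nodes API can be asked for specific states with the 'states' query parameter. The address rm-host:8088 is a placeholder, and the response keys used below (nodes, node, nodeHostName, state) follow the linked documentation but should be confirmed for your release.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder address: substitute your ResourceManager's web UI host and port.
RM_NODES_URL = "http://rm-host:8088/ws/v1/cluster/nodes"


def group_hosts_by_state(payload):
    """Group node host names by state from a Cluster Nodes API response."""
    # "nodes" may be null or absent when no node matches the filter.
    node_list = (payload.get("nodes") or {}).get("node") or []
    grouped = {}
    for node in node_list:
        grouped.setdefault(node.get("state"), []).append(node.get("nodeHostName"))
    return grouped


def fetch_nodes(states=("LOST", "UNHEALTHY"), url=RM_NODES_URL, timeout=10):
    """GET the node list filtered by state and group the hosts."""
    query = urlencode({"states": ",".join(states)})
    req = Request(url + "?" + query, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return group_hosts_by_state(json.load(resp))
```

The node states defined by YARN's NodeState enum include NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST and REBOOTED, so the same call can be reused for the decommissioned and rebooted counts by passing a different 'states' tuple.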

On Mon, Oct 15, 2018 at 12:20 PM Huang Meilong <[hidden email]> wrote:
>
> Thank you Harsh,
>
> What are the possible values for the state in the LiveNodeManagers bean? Will LOST, ACTIVE, REBOOTED and DECOMMISSIONED show up in the state field?



--
Harsh J



Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?

Huang Meilong

Thank you so much, I will try that!


From: Harsh J <[hidden email]>
Sent: October 16, 2018 17:27:07
To: [hidden email]
Cc: <[hidden email]>
Subject: Re: How can I find out which nodemanagers are unhealthy and which nodemanagers are lost?
 
I don't think it includes entirely inactive nodes. Use the CLI or use
the RM REST API directly:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Nodes_API