This article describes how to recover a YARN ResourceManager that is stuck in the standby state so that it can become active.
Symptom
In some scenarios, both ResourceManagers in a YARN cluster with High Availability (HA) enabled are in the standby state and fail to become active. Neither ResourceManager can transition to and remain in the active state because the ResourceManager state stored in ZooKeeper is corrupted.
The ResourceManager logs show the following errors:
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(599)) - Failed to load/recover state
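To confirm that a cluster matches this symptom, you can search the ResourceManager log for the two messages quoted above. The following is a minimal sketch in Python; the function names are hypothetical, and the log file location is an assumption that depends on your installation, so it is passed in by the caller.

```python
from pathlib import Path
from typing import Iterable, List

# Signatures copied from the error messages quoted in this article.
SIGNATURES = (
    "RM could not transition to Active",
    "Failed to load/recover state",
)


def find_recovery_failures(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain any of the known failure signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(sig in line for sig in SIGNATURES)
    ]


def scan_log(path: str) -> List[str]:
    """Scan one ResourceManager log file (hypothetical helper; the path is
    cluster-specific and must be supplied by the caller)."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return find_recovery_failures(fh)
```

For example, `scan_log("/path/to/resourcemanager.log")` returns every matching line; an empty list means neither message was found in that file.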
Resolution
Clear the ResourceManager state in ZooKeeper by using the following steps:
1. As the 'yarn' user, run the following command:
yarn resourcemanager -format-state-store
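The Hadoop YARN command reference notes that -format-state-store should be run only while the ResourceManager is not running. Once the ResourceManagers are running again, you can check their HA state with `yarn rmadmin -getServiceState <serviceId>`, which prints `active` or `standby`. The sketch below wraps that command in Python. It assumes the HA IDs are `rm1` and `rm2` (check `yarn.resourcemanager.ha.rm-ids` on your cluster), and the helper names are hypothetical. The command runner is injectable so that the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Dict, Sequence

# Assumption: these match yarn.resourcemanager.ha.rm-ids on your cluster.
RM_IDS = ("rm1", "rm2")


def _run_yarn(args: Sequence[str]) -> str:
    """Run a yarn CLI command and return its stripped standard output."""
    result = subprocess.run(
        ["yarn", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def get_rm_states(
    rm_ids: Sequence[str] = RM_IDS,
    run: Callable[[Sequence[str]], str] = _run_yarn,
) -> Dict[str, str]:
    """Query each ResourceManager with `yarn rmadmin -getServiceState <id>`."""
    return {
        rm_id: run(["rmadmin", "-getServiceState", rm_id]).strip().lower()
        for rm_id in rm_ids
    }


def has_active_rm(states: Dict[str, str]) -> bool:
    """True if at least one ResourceManager reports the 'active' state."""
    return any(state == "active" for state in states.values())
```

On the cluster, `has_active_rm(get_rm_states())` calls the real `yarn` binary. A result of `False`, with both IDs reporting `standby`, corresponds to the symptom described in this article.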