Suddenly, the NAC/Mgmt server has completely stopped responding.
Note:
This article describes a specific, rare scenario in which none of the product or Tomcat logs are updated for several minutes. If the product seems very slow but is still logging information, that is more of a performance problem and this article does not apply.
Also, this was observed in an environment with two NAC servers online: one primary and one secondary (active/passive, a supported HA configuration). It is unclear whether this played a role in the described behavior. Even in NAC HA setups, log messages are generated on the secondary NAC at least once every 30 seconds. There is no known explanation for the nolio_dm_all.log missing minutes' worth of information other than:
This is rare and should be easy to spot on an actively impacted system: look at the last few messages of an active log file (such as nolio_dm_all.log on NACs). If the last write to the log was a few minutes ago, then this article applies.
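The check described above can be sketched as a small shell helper that reports whether a log file has gone quiet. This is a minimal sketch, not a product tool: the function names `log_age_seconds` and `log_is_stale` are hypothetical, the 120-second threshold is an assumed stand-in for "a few minutes", the log path depends on your installation, and GNU `stat` is assumed (BSD systems use `stat -f %m` instead).

```shell
#!/bin/sh
# Print the number of seconds since file $1 was last modified.
# Assumes GNU stat; on BSD/macOS use `stat -f %m` instead of `stat -c %Y`.
log_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 2
    echo $((now - mtime))
}

# Succeed (exit 0) if file $1 has not been written to for at least $2 seconds.
log_is_stale() {
    age=$(log_age_seconds "$1") || return 2
    [ "$age" -ge "$2" ]
}

# Example (the path is a placeholder; the threshold is an assumption):
# log_is_stale /path/to/logs/nolio_dm_all.log 120 && echo "log has gone quiet"
```

If the helper reports a stale log while the server is unresponsive, that matches the scenario in this article. A log that is still being written points to a performance problem instead.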
The root cause of this problem needs to be investigated. The "Additional Information" section describes what information would be needed for root cause analysis. The section will also have links to other KB Articles that may seem similar to this.
The "Resolution" section describes steps that have been used to recover Nolio RA.
Release : 6.6
Component : CA RELEASE AUTOMATION CORE
To recover from this, please:
If this problem has occurred and a root cause is needed, the following must be done before attempting to recover:
> jstack -l <pid from step1> >> nolio_server.log
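As an illustration of the thread dump step, the sketch below wraps `jstack -l` in a helper that writes the dump to a separate timestamped file. This is an assumption-laden variant, not the documented procedure: the function name `capture_thread_dump` and the output file naming are hypothetical, and the step above appends to nolio_server.log instead. The PID must still come from the earlier step.

```shell
#!/bin/sh
# Hypothetical helper: save one `jstack -l` thread dump for PID $1
# into directory $2 (defaults to the current directory).
capture_thread_dump() {
    pid=$1
    out_dir=${2:-.}
    out_file="$out_dir/threaddump_${pid}_$(date +%Y%m%d_%H%M%S).txt"
    # jstack ships with the JDK; -l adds lock information to the dump.
    jstack -l "$pid" > "$out_file" || return 1
    # Print the path so the caller knows which file to attach.
    echo "$out_file"
}

# Example: capture_thread_dump <pid from step1> /tmp
```

Writing to a dedicated file keeps the dump separate from regular log output, which can make it easier to attach to a support case. Whether that suits your environment is a judgment call.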
Related Articles: