After a recent cluster restart, 12 agents that previously connected to Collector A were being denied by the MOM. No changes were made before the issue appeared other than the restart itself.
These agents are showing up as Denied Agents in the APM Status Console. In the MOM's IntroscopeEnterpriseManager.log, these agents were shown as connected in disallowed mode:
[INFO] [IntroscopeAgent.ConnectionThread] Connected to <EM_IP>:5001,com.wily.isengard.postofficehub.link.net.DefaultSocketFactory in disallowed mode.
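To check how many connections are affected, the log can be scanned for this message. The following is a minimal Python sketch, not a supported tool: it assumes the message has exactly the format quoted above, and the function name `find_disallowed` and the log path passed on the command line are hypothetical.

```python
import re
import sys

# Matches the "disallowed mode" connection message quoted above.
# Assumption: the host, port, and trailing phrase appear exactly in this layout.
DISALLOWED = re.compile(
    r"Connected to (?P<host>\S+?):(?P<port>\d+),.* in disallowed mode"
)


def find_disallowed(lines):
    """Return a (host, port) tuple for every disallowed-mode connection line."""
    hits = []
    for line in lines:
        match = DISALLOWED.search(line)
        if match:
            hits.append((match.group("host"), int(match.group("port"))))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_disallowed.py IntroscopeEnterpriseManager.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for host, port in find_disallowed(log_file):
            print(f"disallowed connection to {host}:{port}")
```

Run against a copy of the log rather than the live file if the log is large or being rotated.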
During our troubleshooting process, we observed in the APM Status Console that the historical metrics threshold for Collector A had been reached. We also noticed that in the MOM's loadbalancing.xml file, these agents were configured to only be load-balanced to Collector A.
Typically, when an Enterprise Manager (EM) hits its historical metric clamp, agents that are already connected stay connected, and the EM continues to accept data for existing metrics. The EM only stops registering new metrics from the existing agents and rejects connection requests from new agents.
In this case, the historical metric limit had already been breached before the cluster restart, and the existing agents remained connected. The restart disconnected the agents, and when they tried to reconnect to the Collectors through the MOM's load balancing, the MOM determined that Collector A had hit the historical metric clamp. Because these agents were not allowed to connect to any other Collector, the MOM denied their connection requests.
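The behavior described above can be illustrated with a simplified model. This is only a sketch of the reasoning, not the MOM's actual load-balancing code: the `Collector` class, the `assign_collector` function, the live metric count of 1,300,000, and the assumption that the clamp triggers when the count reaches the limit are all hypothetical. Only the 1.2 million and 5 million limits come from this case.

```python
from dataclasses import dataclass


@dataclass
class Collector:
    name: str
    historical_metrics: int  # hypothetical current historical metric count
    historical_limit: int    # historical metric clamp threshold

    @property
    def clamped(self) -> bool:
        # Assumption: the clamp is hit once the count reaches the limit.
        return self.historical_metrics >= self.historical_limit


def assign_collector(allowed_names, collectors):
    """Return the first allowed, unclamped Collector name, or None if denied."""
    for name in allowed_names:
        collector = collectors.get(name)
        if collector is not None and not collector.clamped:
            return name
    return None  # no eligible Collector: the agent is denied


# The agents are restricted to Collector A only, as in loadbalancing.xml here.
collectors = {
    "Collector A": Collector("Collector A", 1_300_000, 1_200_000),
    "Collector B": Collector("Collector B", 100_000, 1_200_000),
}
before = assign_collector(["Collector A"], collectors)

# Raising the limit on Collector A releases the clamp.
collectors["Collector A"].historical_limit = 5_000_000
after = assign_collector(["Collector A"], collectors)

print(before, after)
```

In this model the agent is denied before the change even though Collector B has capacity, because Collector B is not in the agent's allowed list.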
Increasing the historical metric threshold for Collector A from 1.2 million to 5 million released the clamp and allowed the Collector to accept the agent connections again.
1. Go to the apm-events-thresholds-config.xml file on Collector A
2. Increase the value of the following properties to 5000000:
3. Observe that the clamp is released and that the agents reconnect to the Collector.