We had an issue with a RESTMon agent last night. It stopped receiving and sending alarms and did not recover until we restarted the deployment (this is the GKE containerized version of the agent). All other agents in the same environment worked fine; this is the only one that had issues. The following appears in the logs around the time the issue seems to have started:
[pool-2-thread-1] INFO SendToTAS:796 - HTTP request https://apmgw.dxi-na1.saas.broadcom.com:443/tas/graph/store response status: HTTP/1.1 503 Service Unavailable
[pool-2-thread-1] ERROR SendToTAS:800 - Failed : HTTP error code : HTTP/1.1 503 Service Unavailable
Shortly after that, the following errors were logged every 30 seconds:
[restmon-19] ERROR ReadinessChecker:74 - Oi Endpoint API is out of service
[restmon-19] ERROR CustomHealthChecker:186 - The application's readiness state is disabled with event OI_API_CHECK
This continued for about 13 hours. After the restart this morning, everything returned to normal, with no changes made on the OI end. Please investigate. Again, all other RESTMon instances in the same environment were unaffected; only this one had the problem.
Release: SAAS