Occasionally we get the following errors from the AIOps server, or connection resets, while registering or pushing metrics. We want to know the root cause of these.
DX SAAS 25.11.1
For the /nass/metricValue/store apmservice call errors, the logs captured exception call stacks similar to the one below:
2025-12-16T04:33:01.880Z ERROR 1 --- [nass] [xxxx-xx] c.c.a.c.rest.ServiceExceptionHandler : https://xxxxx/nass/metricValue/store, 500,0,6fa5ef35d56aeb24: GENERIC_SERVICE_ERROR, java.util.concurrent.CancellationException
com.ca.apm.common.api.ServicesException: 500,0,6fa5ef35d56aeb24: GENERIC_SERVICE_ERROR, java.util.concurrent.CancellationException
Such errors may indicate that the nass pod was overwhelmed and cancelled concurrent tasks. The health and performance metrics for this particular nass instance showed no abnormal patterns or critical performance issues around the time of the error. The error was therefore likely a momentary hiccup, and the nass pod appears to have recovered afterwards.
As for the metadata/registerMetric apmservice call errors, the logs captured exception call stacks similar to the one below:
2025-12-10T23:08:10.591Z ERROR 1 --- [metadata] [xxxxx] c.c.a.c.rest.ServiceExceptionHandler : https://xxxxxxx/metadata/registerMetric, 500,0,879133300c541546: GENERIC_SERVICE_ERROR, reactor.netty.channel.AbortedException: Connection has been closed
com.ca.apm.common.api.ServicesException: 500,0,879133300c541546: GENERIC_SERVICE_ERROR, reactor.netty.channel.AbortedException: Connection has been closed
This error may indicate that the metadata pod took too long to process the registerMetric calls, so the connections were closed, or likely timed out, before the responses completed. Again, the error was likely a momentary hiccup, and the particular metadata pod appears to have recovered as well.
If these errors occur only sporadically, consider improving your web service implementation to handle these apmservice API call exceptions, for example by limiting the number of concurrent calls/submissions and/or retrying the same calls/submissions later when such errors occur.
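As an illustration of the suggestion above, here is a minimal client-side sketch in Python that combines a concurrency limit with retries using exponential backoff. It is not part of the product: the names TransientServiceError, call_with_retry and BoundedCaller are hypothetical, and send stands for whatever function in your own code performs the registerMetric or metricValue/store call and raises TransientServiceError on an HTTP 500 or a connection reset.

```python
import random
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServiceError(Exception):
    """Hypothetical marker for a retryable failure (e.g. HTTP 500 or a connection reset)."""


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on TransientServiceError, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientServiceError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, with jitter so that
            # many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")


class BoundedCaller:
    """Limits how many calls are in flight at once across threads."""

    def __init__(self, max_concurrent: int = 4) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, send: Callable[[], T], **retry_options) -> T:
        with self._slots:  # blocks while max_concurrent calls are in flight
            return call_with_retry(send, **retry_options)
```

A single BoundedCaller instance would be shared by all worker threads, for example caller.call(lambda: push_metrics(batch)), where push_metrics is your own function. The default values for attempts, delays and concurrency are placeholders and should be tuned to your submission volume.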