The KB article "Aria Operations stops collecting data due to large number of distinct NSX BGP Neighbor Instanced Metric metric keys" was applied; however, the "No data receiving" state recurred within 1-2 days.
The following logs were observed in /storage/log/vcops/log/collector.log:
YYYY-MM-DDTHH:MM:SS WARN [DataForwarder] com.vmware.vcops.platform.common.DataForwarder.sendData - Failed to send forward data through channel : IRawDataForwarder.NotEnoughBufferSpace: org.apache.geode.cache.CacheWriterException: FORWARD_DATA_REGION region exceed the maximum number (80000) of entries.
...
YYYY-MM-DDTHH:MM:SS FATAL [DataForwarder] com.vmware.vcops.platform.common.DataForwarder.sendData - Failed to send data: forward data queue is backing up, data is being lost!
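To confirm whether a node shows the same symptoms, the two log signatures above can be counted with a short script. This is a minimal sketch, not a supported tool: the function names `count_signatures` and `scan_log` are hypothetical, and the default path is simply the collector.log location quoted in this article.

```python
from collections import Counter

# Substrings copied from the log excerpts in this article.
SIGNATURES = {
    "region_full": "FORWARD_DATA_REGION region exceed the maximum number",
    "data_loss": "forward data queue is backing up, data is being lost!",
}


def count_signatures(lines):
    """Count how many lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for name, text in SIGNATURES.items():
            if text in line:
                counts[name] += 1
    return counts


def scan_log(path="/storage/log/vcops/log/collector.log"):
    """Scan a collector log file (hypothetical helper; path taken from this article)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_signatures(handle)
```

Calling `scan_log()` on the affected node returns a counter keyed by `region_full` and `data_loss`; non-zero counts after the earlier KB was applied suggest the issue described here has recurred.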
Aria Operations 8.18.x
The NSX-T Management Pack can generate a very large number of distinct BGP Neighbor Instanced Metric metric keys. These metrics cause a few objects to have very large FSDB data files, which appears to slow the saving of these metrics and leads to further performance issues, such as collections not completing normally.
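One way to check for the unusually large data files described above is to list the largest files under the directory that holds the FSDB data. The sketch below is illustrative only: `largest_files` is a hypothetical helper, and the directory must be supplied by the caller because this article does not state the FSDB path.

```python
import heapq
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) tuples for the largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            try:
                sizes.append((os.path.getsize(full_path), full_path))
            except OSError:
                # The file may have been removed or be unreadable; skip it.
                continue
    # Largest first.
    return heapq.nlargest(top_n, sizes)
```

A handful of files that are far larger than the rest would be consistent with the cause described here, although the script alone does not identify which objects they belong to.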
1. Take the Aria Operations cluster offline and then bring it back online to restore data collection.
2. Delete all objects under the Router Service in Inventory Management (Object Types -> Router Service). If any objects fail to delete, place them in maintenance mode for 24 hours to ensure they are eventually removed.
The relevant actions in Inventory Management are Select All, Delete Object, and Start Maintenance.