Provisioning fails with failures to publish to EBS event topics and memory alarms on RabbitMQ


Article ID: 314719


Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  •    Virtual Machine provisioning may fail with errors similar to:
Failed to publish event to topic: Deployment completed
  •    Viewing the RabbitMQ logs with the command kubectl logs -f -n prelude rabbitmq-ha-0 shows memory alarm warnings similar to:
2023-06-12 08:19:39.915759+00:00 [warning] <0.1237.0> *** Publishers will be blocked until this alarm clears ***
2023-06-12 08:19:39.915759+00:00 [warning] <0.1237.0> **********************************************************
2023-06-12 08:19:39.915759+00:00 [warning] <0.1237.0>
2023-06-12 08:19:50.926577+00:00 [info] <0.1239.0> vm_memory_high_watermark clear. Memory used:1023873024 allowed:1024000000
2023-06-12 08:19:50.926803+00:00 [warning] <0.1237.0> memory resource limit alarm cleared on node 'rabbit@<fqdn>'

2023-05-29 04:00:41.980482+00:00 [info] <0.747.0> queue 'com.vmware.automation.broker.broadcast.command-qq-909ba5e1-0597-48c5-90d9-b36efae52fa1' in vhost '/': leader call - leader not known. Command will be forwarded once leader is known.
  •    The Event Broker Service logs located at /services-logs/prelude/ebs-app/file-logs/ebs-app.log contain memory alarm errors:

Memory alarm on node rabbit@<fqdn>
Memory alarm on node rabbit@<fqdn>
Memory alarm on node rabbit@<fqdn>

  •    Provisioning operations that depend on the RabbitMQ and EBS components may take progressively longer over time; a reboot temporarily restores performance. The commands after this list can help confirm the memory pressure and queue growth.
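
To help confirm these symptoms, the number of broadcast queues and the node's memory state can be checked directly against the RabbitMQ pod. This is a minimal sketch that assumes the pod name rabbitmq-ha-0 and namespace prelude shown above; adjust them for your environment:

# Count the Event Broker Service broadcast queues (leaked queues accumulate under this prefix)
kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmqctl list_queues name | grep -c 'broker.broadcast'

# Review node memory usage and any active resource alarms
kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmqctl status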


Environment

VMware Aria Automation 8.12.x

Cause

For broadcast subscriptions, the Event Broker Service creates a dedicated queue for each subscriber pod. Because each subscriber pod has a unique identifier, queues leak over time as subscriber pods restart: the Event Broker Service does not track which three pods of a particular Aria Automation service are currently active, so it never deletes the queues left behind by replaced pods.
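
Because the pods that created the old queues no longer exist, the leaked broadcast queues typically have no consumers. The following sketch, which assumes that zero-consumer broadcast queues are the leaked ones, can help illustrate the build-up (pod and namespace names as in the commands above):

# List broadcast queues with their consumer counts; entries with 0 consumers are leak candidates
kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmqctl list_queues name consumers | grep 'broker.broadcast' | awk '$NF == 0'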

Resolution

The issue is resolved in the Aria Automation 8.13.1 release.

Workaround:
To work around the issue, follow the steps in the workaround section of Knowledge Base article 81146 to reset the RabbitMQ cluster, which resets the queues.
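
After resetting the cluster, it may help to confirm that the memory alarm has cleared and that the broadcast queue count has dropped. A minimal check, assuming the same pod and namespace names used above and that the bundled RabbitMQ (3.8 or later) includes rabbitmq-diagnostics:

# Exits non-zero if any local resource alarms (such as the memory alarm) are still in effect
kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmq-diagnostics check_local_alarms

# Re-count the broadcast queues; the number should no longer grow after the reset
kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmqctl list_queues name | grep -c 'broker.broadcast'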

Additional Information

Impact/Risks:
Over time, the leaked queues cause performance issues for the RabbitMQ component, which can ultimately impact the product's functionality.