rabbitmq-server crashes and restarts repeatedly in the environment of ESXi v8.0.2
search cancel

rabbitmq-server crashes and restarts repeatedly in the environment of ESXi v8.0.2

book

Article ID: 384375

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

Rabbitmq-server job on BOSH-deployed RabbitMQ service instance VM is reported being restarted repeatedly. 

In rabbit@<NODE>.rabbitmq-server.services.service-instance-<GUID>.bosh.log, there are no any errors before the restart. According to monit log, monit daemon detected rabbitmq-server process is not running frequently. 

$ grep "process is not running" monit.log
...
[UTC Nov 19 11:47:06] error    : 'rabbitmq-server' process is not running
[UTC Nov 19 11:52:39] error    : 'rabbitmq-server' process is not running
[UTC Nov 19 12:03:03] error    : 'rabbitmq-server' process is not running
[UTC Nov 20 04:11:24] error    : 'rabbitmq-server' process is not running
[UTC Nov 23 11:22:57] error    : 'rabbitmq-server' process is not running
[UTC Nov 23 21:11:17] error    : 'rabbitmq-server' process is not running
[UTC NOv 23 27:28:40] error    : 'rabbitmq-server' process is not running

In the meanwhile of each restart,  startup_stderr.log records a CPU segmentation fault. 

[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] memory supervisor port (memsup): Erlang has closed
Segmentation fault

 

Environment

  • Tanzu RabbitMQ on Cloud Foundry
  • ESXi v8.0.2

Resolution

There is an issue with ESXi v8.0.2 and it is fixed in v8.0.3, please refer the github issue. It can be triggered when Erlang process invokes AVX-512 CPU instructions. To resolve the issue, please upgrade ESXi from v8.0.2 to higher versions.