During vIDM boot, RabbitMQ does not start, and the vIDM dashboard shows the error "There was a problem Messaging service Error retrieving RabbitMQ status".
VMware Identity Manager 3.3.x
The following message appears in the horizon logs, identifying the cause: "Messaging Connection: Messaging connection test failed".
As messages pile up in RabbitMQ, they can consume all of the disk space available to it, at which point the RabbitMQ connection is blocked and reported as unhealthy. See the log below:
root@idm [ ~ ]# rabbitmqctl stop_app
Stopping rabbit application on node rabbitmq@vm-idm ...
Error: unable to perform an operation on node 'rabbitmq@idm'. Please see diagnostics information and suggestions below.
root@idm [ ~ ]# rabbitmqctl force_reset
Error: unable to perform an operation on node 'rabbitmq@idm'. Please see diagnostics information and suggestions below.
root@idm [ ~ ]# rabbitmqctl start_app
Starting node rabbitmq@vm-idm ...
Error: unable to perform an operation on node 'rabbitmq@idm'. Please see diagnostics information and suggestions below.
1. Take a snapshot and run the following commands:
rabbitmqctl status
rabbitmqctl list_queues | grep analytics
service horizon-workspace stop
rabbitmqctl reset (this did not work, so the following steps were taken instead)
rabbitmqctl stop_app
rabbitmqctl force_reset
rabbitmqctl start_app
service horizon-workspace start (run on each node; wait for the workspace to be fully up before moving on to the next node so there is no risk of downtime for users)
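The "wait for the workspace to be fully up" part of step 1 can be scripted with a small retry helper. The sketch below is an assumption, not a command from this article; the retry count, interval, and the health URL in the usage comment are all illustrative values to adjust for your environment.

```shell
# wait_for CMD...: retry CMD until it succeeds, up to 30 attempts 10s apart
# (both values are assumptions; tune for your environment).
wait_for() {
    attempts=30
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0          # command succeeded: workspace is up
        attempts=$((attempts - 1))
        sleep 10
    done
    return 1                      # gave up: workspace never came up
}

# Per-node sequence from step 1 (run on one node at a time):
#   service horizon-workspace stop
#   rabbitmqctl stop_app && rabbitmqctl force_reset && rabbitmqctl start_app
#   service horizon-workspace start
#   wait_for curl -skf https://localhost/SAAS/API/1.0/REST/system/health
```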
2. Check the space on /db with "df". If there is plenty of free space after clearing out RabbitMQ and Elasticsearch (usage should be under 10%), the filesystems are fine; otherwise, increase the size of the /db filesystem.
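The step-2 check can be scripted rather than read off "df" by eye. The helper below is a sketch (the function name and its implementation are assumptions) that prints a filesystem's used percentage using GNU df's --output option:

```shell
# usage_pct MOUNTPOINT: print the used-space percentage (digits only) of the
# filesystem holding MOUNTPOINT, e.g. "7" for 7% used.
usage_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Step-2 check on the appliance (/db is the mount named in this article):
#   if [ "$(usage_pct /db)" -lt 10 ]; then echo "/db OK"; else echo "grow /db"; fi
```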
Related issues: the vIDM appliance may have no space left on device /db for audit data, and in vIDM 3.3.x the vPostgres OAuth2RefreshToken table can consume most of the space on the appliance, leading to service outages.
3. If this still does not fix the messaging connection, run the following:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
rabbitmq-server -detached
4. If the above commands do not resolve the issue, run the commands below; the RabbitMQ service should come back to a working state.
systemctl stop rabbitmq-server.service
rm -rf /db/rabbitmq/data
service horizon-workspace restart
systemctl start rabbitmq-server.service
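Because "rm -rf /db/rabbitmq/data" is destructive, a slightly safer variant of step 4 is to move the data directory aside so it can be restored if the rebuild fails. The helper below is a sketch of that idea, not a command from this article:

```shell
# archive_dir DIR: move DIR to a timestamped backup alongside it instead of
# deleting it; returns non-zero if DIR does not exist.
archive_dir() {
    [ -d "$1" ] || return 1
    mv "$1" "$1.bak.$(date +%s)"
}

# Step-4 sequence with the safer variant:
#   systemctl stop rabbitmq-server.service
#   archive_dir /db/rabbitmq/data
#   service horizon-workspace restart
#   systemctl start rabbitmq-server.service
```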
5. If the above commands do not resolve the RabbitMQ issue, try the following steps:
Edit /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service and remove " -detached &" from the ExecStart command/path.
Save the changes, run the command below, then reboot:
chown -R rabbitmq:rabbitmq /db/RabbitMQ
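The unit-file edit in step 5 can also be done non-interactively. The helper below is a sketch: it assumes the literal text " -detached &" appears on the ExecStart line, and it backs the file up before editing. (Running "systemctl daemon-reload" after editing a unit file is standard systemd practice; it is not stated in this article.)

```shell
# strip_detached FILE: back up FILE, then remove the literal " -detached &"
# from its ExecStart line in place.
strip_detached() {
    cp "$1" "$1.bak" &&
    sed -i '/^ExecStart=/s/ -detached &//' "$1"
}

# Step-5 sequence (run as root, then reboot):
#   strip_detached /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service
#   systemctl daemon-reload
#   chown -R rabbitmq:rabbitmq /db/RabbitMQ
#   reboot
```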
6. If horizon.log shows multiple RabbitMQ scheduler messages:
Running the following command showed that only 2 nodes were in the OpenSearch cluster:
curl http://localhost:9200/_cluster/state/nodes,master_node?pretty
To resolve this, perform the following on all 3 appliances:
/etc/init.d/opensearch stop
systemctl stop rabbitmq-server.service
systemctl start rabbitmq-server.service
rabbitmqctl list_queues | grep -i analytics
rabbitmq-server -detached &
/etc/init.d/opensearch start
Run the following again:
curl http://localhost:9200/_cluster/state/nodes,master_node?pretty
(All 3 appliances and the master node should now be listed, confirming OpenSearch is healthy.)
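Rather than eyeballing the cluster-state JSON, the node count can be extracted programmatically. The helper below is a sketch; it assumes python3 is available on the appliance and that the response nests the node entries under a top-level "nodes" key:

```shell
# count_nodes: read OpenSearch cluster-state JSON on stdin and print how many
# node entries it contains; a healthy 3-node vIDM cluster should print 3.
count_nodes() {
    python3 -c 'import json, sys; print(len(json.load(sys.stdin)["nodes"]))'
}

# Usage against the local OpenSearch instance:
#   curl -s 'http://localhost:9200/_cluster/state/nodes,master_node?pretty' | count_nodes
```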
Then run:
/usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
service horizon-workspace restart