EDR: 'Unable to connect to cb-rabbitmq on the head node' Error When Starting Cluster

Article ID: 287373

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

  • Cluster start fails on the RabbitMQ service on a Minion Node.
  • The following is seen in the /var/log/messages file:
Dec 11 14:21:11 edr-server cb-enterprise: Starting cb-redis:                                         [  OK  ]
Dec 11 14:21:13 edr-server cb-enterprise: Starting cb-rabbitmq:                                      [  OK  ]
Dec 11 14:22:25 edr-server cb-enterprise: Unable to connect to cb-rabbitmq on the head node:
Dec 11 14:22:25 edr-server cb-enterprise: b"Clustering node rabbit@CB-SERVER-CLUSTER-MINION-NODE-5 with rabbit@CB-SERVER-CLUSTER-HEAD-NODE\nError:\n{:inconsistent_cluster, 'Node \\'rabbit@CB-SERVER-CLUSTER-HEAD-NODE\\' thinks it\\'s clustered with node \\'rabbit@CB-SERVER-CLUSTER-MINION-NODE-5\\', but \\'rabbit@CB-SERVER-CLUSTER-MINION-NODE-5\\' disagrees'}\n"
Dec 11 14:22:25 edr-server cb-enterprise: Unable to connect to cb-rabbitmq on the head node:
Dec 11 14:22:25 edr-server cb-enterprise: b"Clustering node rabbit@CB-SERVER-CLUSTER-MINION-NODE-5 with rabbit@CB-SERVER-CLUSTER-HEAD-NODE\nError:\n{:inconsistent_cluster, 'Node \\'rabbit@CB-SERVER-CLUSTER-HEAD-NODE\\' thinks it\\'s clustered with node \\'rabbit@CB-SERVER-CLUSTER-MINION-NODE-5\\', but \\'rabbit@CB-SERVER-CLUSTER-MINION-NODE-5\\' disagrees'}\n"
Dec 11 14:22:25 edr-server systemd: Started SYSV: VMware Carbon Black EDR is a surveillance camera for your computer -- always recording so you know precisely what happened and where. This component provides an internal interface to the primary datastore..
  • The following is shown in the /var/log/cb/rabbitmq/startup_err log:
rabbit@CB-SERVER-CLUSTER-MINION-NODE-5:
  * connected to epmd (port 4369) on CB-SERVER-CLUSTER-MINION-NODE-5
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on CB-SERVER-CLUSTER-MINION-NODE-5
  * suggestion: start the node

Current node details:
 * node name: 'rabbitmqcli-13696-rabbit@CB-SERVER-CLUSTER-MINION-NODE-5'
 * effective user's home directory: /var/cb
 * Erlang cookie hash: L168luP9GPZtSCT7AM4HJw==
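To check whether a node shows the symptoms above, a grep over the two log files named in this article is enough. The sketch below wraps that in a hypothetical helper, check_rabbitmq_symptoms (it is not an EDR tool); the default paths are the ones listed above, and the matched strings are copied from the log excerpts above. Pass copies of the files as arguments to check logs collected from another node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDR): report whether the symptoms
# described in this article appear in the given log files.
check_rabbitmq_symptoms() {
    local messages="${1:-/var/log/messages}"
    local startup_err="${2:-/var/log/cb/rabbitmq/startup_err}"
    local found=0

    # The Minion could not join the head node's RabbitMQ cluster
    if grep -qF 'inconsistent_cluster' "$messages" 2>/dev/null; then
        echo "$messages: inconsistent_cluster error found"
        found=1
    fi

    # The local RabbitMQ node never came up
    if grep -qF "node 'rabbit' not running at all" "$startup_err" 2>/dev/null; then
        echo "$startup_err: local rabbit node not running"
        found=1
    fi

    # Exit status 0 when at least one symptom is present, 1 otherwise
    (( found == 1 ))
}
```

Example: `check_rabbitmq_symptoms && echo "symptoms match this article"` on the failing Minion Node.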

Environment

  • EDR Server: 7.5.x

Cause

  • The most common reason for this is a networking or DNS issue with the Instance/Cluster.
  • Defect CB-36929: the datagrid service can take over 2 minutes to complete its startup sequence, and Redis cannot start until it does. In a cluster where one node fails quickly during RabbitMQ service startup, RabbitMQ on the Minion may be starting before RabbitMQ on the Master has started. To confirm this, compare the startup timestamps on the Master and Minion Nodes in either /var/log/messages or /var/log/cb/supervisord/supervisord.log:
    • Minion RabbitMQ Failure (note the timestamps):
Dec 11 14:21:13 edr-server cb-enterprise: Starting cb-rabbitmq:                                      [  OK  ]
Dec 11 14:22:25 edr-server cb-enterprise: Unable to connect to cb-rabbitmq on the head node:
    • On the Master, RabbitMQ has not yet started by the time the Minion fails:
Dec 11 14:21:56 edr-server cb-enterprise: Starting cb-coreservices:                                  [  OK  ]
Dec 11 14:22:02 edr-server cb-enterprise: Starting cb-sensorservices:                                [  OK  ]
Dec 11 14:22:15 edr-server cb-enterprise: Starting cb-datastore:                                     [  OK  ]
Dec 11 14:22:17 edr-server cb-enterprise: Starting cb-liveresponse:                                  [  OK  ]
Dec 11 14:22:21 edr-server cb-enterprise: Starting cb-allianceclient:                                [  OK  ]
Dec 11 14:22:23 edr-server cb-enterprise: Starting cb-enterprised:                                   [  OK  ]
Dec 11 14:22:25 edr-server cb-enterprise: Starting cb-nginx:                                         [  OK  ]
Dec 11 14:22:30 edr-server systemd: Started SYSV: VMware Carbon Black EDR is a surveillance camera for your computer -- always recording so you know precisely what happened and where. This component provides an internal interface to the primary datastore..
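The timestamp comparison for defect CB-36929 can be scripted. The sketch below is a minimal illustration, assuming copies of /var/log/messages from a Minion and from the Master, syslog-style timestamps ("Dec 11 14:22:25 ..."), both events on the same day, and that the most recent matching line in each file belongs to the startup being investigated. syslog_secs and check_startup_race are hypothetical helper names, not EDR tools; the matched strings come from the log excerpts above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of EDR) for spotting the CB-36929 startup race.
# Assumes syslog timestamps and that both events happened on the same day.

# Convert the HH:MM:SS field of a syslog line to seconds since midnight
syslog_secs() {
    local t h m s
    t=$(awk '{ print $3 }' <<< "$1")
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so values such as "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Compare the Minion's last RabbitMQ connection failure with the
# Master's last cb-rabbitmq start; returns 1 when a race looks likely
check_startup_race() {
    local minion_log="$1" head_log="$2" fail start
    fail=$(grep -F 'Unable to connect to cb-rabbitmq on the head node' "$minion_log" | tail -n 1)
    start=$(grep -F 'Starting cb-rabbitmq' "$head_log" | tail -n 1)

    if [[ -z "$fail" ]]; then
        echo "no RabbitMQ connection failure found in $minion_log"
        return 0
    fi
    if [[ -z "$start" ]]; then
        echo "no cb-rabbitmq start found in $head_log: possible startup race"
        return 1
    fi
    if (( $(syslog_secs "$start") >= $(syslog_secs "$fail") )); then
        echo "Master cb-rabbitmq started at or after the Minion failure: possible startup race"
        return 1
    fi
    echo "Master cb-rabbitmq started before the Minion failure"
}
```

Example: `check_startup_race ./minion-messages ./master-messages`, after copying each node's /var/log/messages to the machine running the check.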

Resolution

  • Follow the KB article 'EDR: Rabbitmq failed to start' to confirm the networking and DNS configuration on the instance.
  • If that does not resolve the issue and the RabbitMQ service fails quickly on one of the Nodes, the issue may be related to defect CB-36929:
    • On all Nodes, edit the /etc/init.d/cb-enterprise script to add 'sleep 5' as the first line of the wait_for_datagrid_init() function. This adds a 5-second delay before the datagrid readiness check, giving the services more time to start up.
    • As an example:
wait_for_datagrid_init() {
    sleep 5    # added line: wait before polling cbdatagrid readiness
    for i in {1..120}; do
        /usr/share/cb/cbdatagrid cluster --ready >/dev/null 2>&1
        ready_retval=$?

        if [[ $ready_retval -eq 0 ]]; then
            break
        fi