EDR: Cluster Does Not Start, Datagrid Errors

Article ID: 291554

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

The cluster starts the cb-datagrid service on the master node but fails to start the cb-datagrid service(s) on the minions.
The following error appears in /var/log/cb/datagrid/debug.log:

2020-12-13 19:48:18,807 - [WARN] - from com.hazelcast.nio.tcp.TcpIpConnection in hz._hzInstance_1_dev.IO.thread-in-2
[10.220.40.5]:5701 [dev] [3.9.4] Connection[id=57, /127.0.0.1:5701->/127.0.0.1:40394, endpoint=[127.0.0.1]:40394, alive=false, type=PYTHON_CLIENT] closed. Reason: Exception in Connection[id=57, /127.0.0.1:5701->/127.0.0.1:40394, endpoint=[127.0.0.1]:40394, alive=true, type=PYTHON_CLIENT], thread=hz._hzInstance_1_dev.IO.thread-in-2
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at com.hazelcast.internal.networking.AbstractChannel.read(AbstractChannel.java:94)
        at com.hazelcast.internal.networking.nio.NioChannelReader.handle(NioChannelReader.java:127)
        at com.hazelcast.internal.networking.nio.NioThread.handleSelectionKey(NioThread.java:401)
        at com.hazelcast.internal.networking.nio.NioThread.handleSelectionKeys(NioThread.java:386)
        at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:293)
        at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:248)
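
To confirm that a node is logging this error, the debug log can be searched for the exception line. The following is a minimal sketch, not a vendor-supplied tool: the function name count_datagrid_resets is hypothetical, and the default path is the one given above.

```shell
# Count "Connection reset by peer" exceptions in a datagrid debug log.
# count_datagrid_resets is a hypothetical helper name (an assumption).
# Usage: count_datagrid_resets [logfile]
count_datagrid_resets() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status clean.
    grep -c 'java.io.IOException: Connection reset by peer' \
        "${1:-/var/log/cb/datagrid/debug.log}" || true
}

# Usage on a node:
#   count_datagrid_resets
#   count_datagrid_resets /var/log/cb/datagrid/debug.log
```

A non-zero count only shows that the symptom is present. It does not by itself confirm the cause described in this article.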

Environment

EDR: All Supported Versions
RHEL/CentOS: 7.x and later

Cause

The systemd unit for the cb-enterprise service has entered a bad state and must be corrected on each minion.

Resolution

Go to each minion and check the status of the cb-enterprise service using systemctl:

/bin/systemctl status cb-enterprise.service

If all services on the minion are stopped (this can be checked with the service command shown later in this section) but systemctl reports an Active state of anything other than inactive (dead), run the command below to see whether the status changes to inactive (dead):

/bin/systemctl stop cb-enterprise.service
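
The status check and stop attempt above can be scripted. The following is a rough sketch. It assumes the status output contains a line beginning with "Active:", as systemctl status normally prints, and the helper name unit_is_dead is hypothetical.

```shell
# Succeeds (exit 0) only when the systemctl status text read from stdin
# contains a line starting with "Active: inactive (dead)".
# unit_is_dead is a hypothetical helper name (an assumption).
unit_is_dead() {
    grep -Eq '^[[:space:]]*Active:[[:space:]]+inactive \(dead\)'
}

# Usage on a minion (run as root):
#   /bin/systemctl status cb-enterprise.service | unit_is_dead \
#       || /bin/systemctl stop cb-enterprise.service
#   /bin/systemctl status cb-enterprise.service | unit_is_dead \
#       && echo "cb-enterprise unit is inactive (dead)"
```

The function reads text from stdin rather than calling systemctl itself, so it can be checked against saved status output before being used on a live minion.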

If the unit is in a bad state, the systemctl stop command will not stop services that are still running. Instead, use the commands below to identify those services and stop them individually:

service cb-enterprise status
service <cb-service-name> stop
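
When several services are still running, the stop command above has to be repeated once per service. The following sketch wraps that repetition in a loop. The function name stop_cb_services and the DRY_RUN variable are illustrative assumptions, and the service names must be taken from the output of service cb-enterprise status on that minion.

```shell
# Stop each named service with the "service" command.
# Set DRY_RUN=1 to print the commands instead of running them.
# stop_cb_services and DRY_RUN are hypothetical names (assumptions).
stop_cb_services() {
    for name in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $name stop"
        else
            service "$name" stop
        fi
    done
}

# Usage (run as root; replace the placeholders with the running service
# names reported by "service cb-enterprise status"):
#   DRY_RUN=1 stop_cb_services <cb-service-name> <cb-service-name>
#   stop_cb_services <cb-service-name> <cb-service-name>
```

Running with DRY_RUN=1 first makes it possible to review the exact commands before anything is stopped.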

Alternatively, rebooting each minion also corrects this issue.
