EDR: Cluster Does Not Start, Datagrid Errors

Article ID: 291554

Products

Carbon Black EDR (formerly Cb Response)

Issue/Introduction

  • The cluster starts the cb-datagrid service on the master node but fails to start the cb-datagrid service on the minions.
  • The following error appears in /var/log/cb/datagrid/debug.log:
2020-12-13 19:48:18,807 - [WARN] - from com.hazelcast.nio.tcp.TcpIpConnection in hz._hzInstance_1_dev.IO.thread-in-2
[10.220.40.5]:5701 [dev] [3.9.4] Connection[id=57, /127.0.0.1:5701->/127.0.0.1:40394, endpoint=[127.0.0.1]:40394, alive=false, type=PYTHON_CLIENT] closed. Reason: Exception in Connection[id=57, /127.0.0.1:5701->/127.0.0.1:40394, endpoint=[127.0.0.1]:40394, alive=true, type=PYTHON_CLIENT], thread=hz._hzInstance_1_dev.IO.thread-in-2
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at com.hazelcast.internal.networking.AbstractChannel.read(AbstractChannel.java:94)
        at com.hazelcast.internal.networking.nio.NioChannelReader.handle(NioChannelReader.java:127)
        at com.hazelcast.internal.networking.nio.NioThread.handleSelectionKey(NioThread.java:401)
        at com.hazelcast.internal.networking.nio.NioThread.handleSelectionKeys(NioThread.java:386)
        at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:293)
        at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:248)
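
To confirm that a node is logging this error, the log can be searched for the exception text. The following is a minimal sketch, not an official EDR tool: the function name count_connection_resets is hypothetical, and the default path is the debug.log location given above.

```shell
# Hypothetical helper (not part of EDR): count occurrences of the
# "Connection reset by peer" exception in a datagrid debug log.
# $1 is the log file; it defaults to the path named in this article.
count_connection_resets() {
    # -F matches the text literally, -c prints the number of matching lines.
    grep -cF 'java.io.IOException: Connection reset by peer' \
        "${1:-/var/log/cb/datagrid/debug.log}"
}
```

Calling count_connection_resets with no argument reads the default log. A non-zero count only shows that the exception was logged; it does not by itself confirm the cause described in this article.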

Environment

  • EDR: All Supported Versions
  • RHEL/CentOS: 7.x and later

Cause

The systemd unit/service has entered a bad state on the minions and must be corrected on each one.

Resolution

  • On each minion, check the status of the cb-enterprise service using systemctl:
/bin/systemctl status cb-enterprise.service
  • If all services on the system are stopped (this can be checked using the service command, see below) but systemctl reports anything other than Active: inactive (dead), perform the steps below:
    • Run the command below to see whether the status changes to Active: inactive (dead):
/bin/systemctl stop cb-enterprise.service
  • The systemd stop command will not stop running services when the unit is in a bad state. Instead, use the commands below to identify and stop those running services:
service cb-enterprise status
service <cb-service-name> stop
  • Restarting the system on each minion also corrects this issue.
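
The status check in the steps above can be scripted when there are many minions. The following is a minimal sketch under stated assumptions: the helper names active_state and needs_manual_stop are hypothetical, and the parsing assumes the usual "Active: <state> (<sub-state>)" line printed by systemctl status.

```shell
# Hypothetical helpers (not part of EDR) for reading `systemctl status` output.

# Read `systemctl status` output on stdin and print the state from the
# first "Active:" line, for example "inactive (dead)" or "active (exited)".
active_state() {
    sed -n 's/^[[:space:]]*Active:[[:space:]]*\([a-z-]* ([^)]*)\).*/\1/p' | head -n 1
}

# Succeed (exit 0) when the reported state is anything other than
# "inactive (dead)". Input with no "Active:" line also counts as not inactive.
needs_manual_stop() {
    [ "$(active_state)" != "inactive (dead)" ]
}

# Example, to be run on each minion:
#   if /bin/systemctl status cb-enterprise.service | needs_manual_stop; then
#       /bin/systemctl stop cb-enterprise.service
#   fi
```

The sketch covers only the systemctl check. If the state still does not change after the stop command, the service commands listed above are still needed to stop the individual services.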