Controller service is going down on one or more NSX-T appliances

Article ID: 324394

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • NSX Manager logs (syslog) display messages similar to:
<179>1 2019-10-10T06:09:27.117Z Manager-2 NSX 8861 - [nsx@6876 comp="heartbeatservice-server" errorCode="HBS153" level="ERROR" subcomp="ServiceMonitor"] One or more services are down
  • NSX Manager logs (cbm) display messages similar to:
2019-10-10T06:09:27.117Z INFO HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor - - [nsx@6876 comp="heartbeatservice-server" level="INFO" subcomp="ServiceMonitor"] New entity status: [Epoch: 24]SEARCH:UP,CLUSTER_MANAGER:UP,PROTON:UP,HTTP:UP,CONTROLLER:DOWN,POLICY:UP
2019-10-10T06:09:27.117Z ERROR HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor - - [nsx@6876 comp="heartbeatservice-server" errorCode="HBS153" level="ERROR" subcomp="ServiceMonitor"] One or more services are down
  • The controller logs on the NSX-T Manager (/var/log/cloudnet/) show, during the same time period, an exception related to a CCP-Corfu-Monitor-Ping-0 failure:
  1. MismatchException:

2019-10-10T06:09:27.294Z WARN CCP-Corfu-Monitor-Pool KvStoreConnectionMonitor - - [nsx@6876 comp="nsx-controller" subcomp="kvstore-connection-monitor"] Exception while ping backend kv-store, java.util.concurrent.ExecutionException: com.vmware.nsx.platform.kvstore.adapter.MismatchException: Version mismatch(MonotonicLong{version=1219689} != MonotonicLong{version=1219688}), store(StoreId(namespace=ccp, name=pingStore)), key(ImmutableKey{key='<binary> hash(-1828200220)'})
 
2019-10-10T06:09:27.283Z INFO CCP-Corfu-Monitor-Notifier-0 CorfuDbConnector - - [nsx@6876 comp="nsx-controller" subcomp="corfudb-connector"] CorfuDB Server is disconnected, invoke registered callbacks.
2019-09-10T13:26:15.283Z INFO CCP-Corfu-Connector-Notifier-0 CorfuDbCommonService - - [nsx@6876 comp="nsx-controller" subcomp="corfudb-common-service"] CorfuDB is disconnected, set Cluster Status Down
  2. TransactionAbortedException:

2019-10-10T06:28:38.045Z  WARN CCP-Corfu-Monitor-Pool KvStoreConnectionMonitor - - [nsx@6876 comp="nsx-controller" level="WARN" subcomp="kvstore-connection-monitor"] Exception while ping backend kv-store
2019-10-10T06:29:08.057Z  WARN CCP-Corfu-Monitor-Ping-0 CorfuDbAdapter - Transaction failed: {} org.corfudb.runtime.exceptions.TransactionAbortedException
2019-10-10T06:29:08.058Z  WARN CCP-Corfu-Monitor-Pool KvStoreConnectionMonitor - - [nsx@6876 comp="nsx-controller" level="WARN" subcomp="kvstore-connection-monitor"] Exception while ping backend kv-store, java.util.concurrent.ExecutionException:

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
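
To check a collected log for this symptom, the heartbeat "New entity status" line can be parsed to list which services are reported DOWN. The following is a minimal sketch, not an NSX tool: it assumes the line format matches the excerpt above, and the function names (parse_entity_status, down_services) are hypothetical.

```python
import re

# Matches the service list in a heartbeat "New entity status" line, e.g.
# "New entity status: [Epoch: 24]SEARCH:UP,...,CONTROLLER:DOWN,POLICY:UP".
# Assumption: the format is the one shown in the log excerpt in this article.
STATUS_RE = re.compile(r"New entity status: \[Epoch: (\d+)\](\S+)")


def parse_entity_status(line):
    """Return (epoch, {service: state}), or None if the line has no status list."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    epoch = int(match.group(1))
    # Split "NAME:STATE,NAME:STATE,..." into a dictionary.
    states = dict(item.split(":", 1) for item in match.group(2).split(","))
    return epoch, states


def down_services(line):
    """Return the names of services reported DOWN in a status line."""
    parsed = parse_entity_status(line)
    if parsed is None:
        return []
    return [name for name, state in parsed[1].items() if state == "DOWN"]


if __name__ == "__main__":
    sample = (
        "New entity status: [Epoch: 24]SEARCH:UP,CLUSTER_MANAGER:UP,"
        "PROTON:UP,HTTP:UP,CONTROLLER:DOWN,POLICY:UP"
    )
    print(down_services(sample))  # prints ['CONTROLLER']
```

Applying down_services to each line of a saved log and keeping the non-empty results gives a quick view of when CONTROLLER was reported DOWN, which can then be compared against the timestamps of the controller exceptions.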
 


Environment

VMware NSX-T Data Center 2.x
VMware NSX-T Data Center

Resolution

This issue is fixed in NSX-T 2.5.1 and NSX-T 3.0.


Workaround:
Currently, there is no workaround.

Additional Information

Impact/Risks:

There is no datapath impact because the affected controller comes back UP immediately and the other two controllers (the majority) remain UP.