ERROR HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor 84198 - [nsx@6876 comp="nsx-manager" errorCode="HBS153" level="ERROR" s2comp="service-monitor" subcomp="cbm"] One or more services are down: [Epoch: 26]SEARCH:DOWN,AR:DOWN,PROTON:DOWN,CLUSTER_MANAGER:DOWN,MONITORING:UP,CM_INV:DOWN,CONTROLLER:DOWN,IDPS_REPORTING:DOWN,MESSAGING_MANAGER:DOWN,SM:DOWN,HTTP:DOWN
INFO WrapperStartStopAppMain AbstractView 3904309 layoutHelper: Retried 44 times, SystemDownHandlerTriggerLimit = 60
INFO WrapperStartStopAppMain AbstractView 3904309 layoutHelper: Retried 45 times, SystemDownHandlerTriggerLimit = 60
INFO WrapperStartStopAppMain AbstractView 3904309 layoutHelper: Retried 46 times, SystemDownHandlerTriggerLimit = 60
...
WARN org.corfudb.runtime.collections.streaming.StreamPollingScheduler-worker-3 DataStoreDisconnectHandler 85708 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Disconnected from the database, restarting the service
INFO DistributedLockMonitorThread AbstractView 84198 layoutHelper: Retried 3 times, SystemDownHandlerTriggerLimit = 90
INFO DistributedLockMonitorThread AbstractView 84198 layoutHelper: Retried 4 times, SystemDownHandlerTriggerLimit = 90
…
INFO DistributedLockMonitorThread AbstractView 84198 layoutHelper: Retried 61 times, SystemDownHandlerTriggerLimit = 90
INFO DistributedLockMonitorThread AbstractView 84198 layoutHelper: Retried 62 times, SystemDownHandlerTriggerLimit = 90
WARN | client-1 | o.c.r.c.ClientResponseHandler | Server threw exception for SERVER_ERROR with request_id: 1714247
ERROR | Cmpt-9000-chkpter | compactor-leader | Exception in runOrchestrator():
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at org.corfudb.util.CFUtils.getUninterruptibly(CFUtils.java:71)
at org.corfudb.util.CFUtils.getUninterruptibly(CFUtils.java:105)
INFO | jvm 1037 | 2024/08/27 00:51:02 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper | 2024/08/27 00:51:02 | The JVM has run out of memory. Requesting thread dump.
STATUS | wrapper | 2024/08/27 00:51:02 | Dumping JVM state.
STATUS | wrapper | 2024/08/27 00:51:02 | The JVM has run out of memory. Restarting JVM.
VMware NSX
If internal corfu-runtime threads are unable to communicate with the corfu-server and those threads are not cleaned up properly, the Corfu database can go down and the condition can eventually lead to out-of-memory errors. Because all NSX Manager services (for example, Proton and CBM) depend on Corfu database connectivity, they eventually go down as well.
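To check whether a set of collected log lines shows the same pattern, the symptoms above can be summarized with a short script. The following is a minimal sketch, not a supported tool: the function name `summarize` and the regular expressions are assumptions derived only from the log excerpts in this article, and other NSX versions may format these lines differently.

```python
import re

# Patterns inferred from the log excerpts in this article (assumption:
# the message formats are stable across the affected versions).
SERVICE_LIST = re.compile(r"One or more services are down: \[Epoch: (\d+)\](\S+)")
RETRY = re.compile(r"layoutHelper: Retried (\d+) times, SystemDownHandlerTriggerLimit = (\d+)")
OOM = "java.lang.OutOfMemoryError: Java heap space"


def summarize(lines):
    """Collect service states, the highest retry count seen, and any heap OOM."""
    summary = {"down": [], "up": [], "max_retry": None, "retry_limit": None, "oom": False}
    for line in lines:
        match = SERVICE_LIST.search(line)
        if match:
            # The list looks like "SEARCH:DOWN,AR:DOWN,...,MONITORING:UP".
            pairs = [item.split(":", 1) for item in match.group(2).split(",") if ":" in item]
            summary["down"] = [name for name, state in pairs if state == "DOWN"]
            summary["up"] = [name for name, state in pairs if state == "UP"]
        match = RETRY.search(line)
        if match:
            count, limit = int(match.group(1)), int(match.group(2))
            # Keep the retry line with the highest count and its own limit.
            if summary["max_retry"] is None or count > summary["max_retry"]:
                summary["max_retry"], summary["retry_limit"] = count, limit
        if OOM in line:
            summary["oom"] = True
    return summary
```

Passing the lines of the relevant log files to `summarize` returns a dictionary listing which services were reported down, how close the layoutHelper retries came to `SystemDownHandlerTriggerLimit`, and whether a Java heap space error was logged.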
This issue is resolved in VMware NSX 4.2, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Workaround:
/etc/init.d/corfu-server restart