/var/log/syslog shows the following when the Manager nodes try to fetch the layout:
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 3487 - - Tried to get layout from <nsx_manager_1_ip>:9000 but failed by timeout
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 6236 - - Tried to get layout from <nsx_manager_2_ip>:9000 but failed by timeout
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 3487 - - Tried to get layout from <nsx_manager_3_ip>:9000 but failed by timeout
layoutHelper reports the following lines in /var/log/syslog:
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 3487 - - layoutHelper: System seems unavailable
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 6236 - - layoutHelper: System seems unavailable
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 3826 - - layoutHelper: System seems unavailable
YYYY-MM-DDTHH:MM:SS.SSSZ <nsx_manager_fqdn> NSX 3826 - - message repeated 3 times: [layoutHelper: System seems unavailable]
YYYY-MM-DDTHH:MM:SS.SSSZ | ERROR | failAfter-0 | o.c.i.LocalMonitoringService | Error requesting sequencer metrics:
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
    at org.corfudb.util.CFUtils.lambda$failAfter$0(CFUtils.java:118)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.TimeoutException: null
    at org.corfudb.util.CFUtils.<clinit>(CFUtils.java:36)
    at org.corfudb.runtime.clients.NettyClientRouter.sendRequestAndGetCompletable(NettyClientRouter.java:498)
    at org.corfudb.runtime.clients.AbstractClient.sendRequestWithFuture(AbstractClient.java:43)
    at org.corfudb.runtime.clients.LayoutClient.getLayout(LayoutClient.java:38)
    at org.corfudb.runtime.CorfuRuntime.lambda$fetchLayout$6(CorfuRuntime.java:1295)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
    ... 3 common frames omitted
YYYY-MM-DDTHH:MM:SS.SSSZ INFO HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor 3826 - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="service-monitor" subcomp="cbm"] New entity status: [Epoch: 20]SEARCH:DOWN,PROTON:DOWN,HTTP:DOWN,CM_INV:DOWN,IDPS_REPORTING:DOWN,SM:DOWN,MESSAGING_MANAGER:DOWN,CONTROLLER:DOWN,AR:DOWN,CLUSTER_MANAGER:DOWN,MONITORING:DOWN
top - HH:MM:SS up 7 days, 6:01, 0 users, load average: 0.95, 1.17, 1.15
top - HH:MM:SS up 7 days, 6:02, 0 users, load average: 5.08, 2.00, 1.42
top - HH:MM:SS up 7 days, 6:03, 0 users, load average: 55.98, 16.68, 6.48
top - HH:MM:SS up 7 days, 6:04, 0 users, load average: 141.78, 49.64, 18.61
top - HH:MM:SS up 7 days, 6:05, 0 users, load average: 200.18, 80.05, 30.60
top - HH:MM:SS up 7 days, 6:06, 0 users, load average: 265.76, 121.19, 47.86
VMware NSX
VMware vSphere ESXi
A load average of more than 100 is critically high for a Guest Operating System. Applications cannot function reliably under such extreme contention. Negative available CPU capacity on the ESXi host indicates that the NSX Manager was completely starved of CPU cycles, leading to the services going down.
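To locate the moment the load crossed the critical level, the top summary lines can be scanned programmatically. The following is a minimal Python sketch, not an official NSX tool: the function names and the CRITICAL_LOAD constant are hypothetical, and the threshold of 100 is simply the figure quoted above.

```python
import re

# Matches the three load-average values in a `top` summary line.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

# Assumption: 100 is the "critically high" level named in this article.
CRITICAL_LOAD = 100.0


def parse_load_averages(text):
    """Return a list of (1min, 5min, 15min) tuples found in `text`."""
    return [tuple(float(v) for v in m.groups()) for m in LOAD_RE.finditer(text)]


def first_critical_sample(samples, threshold=CRITICAL_LOAD):
    """Return the index of the first sample whose 1-minute load
    exceeds `threshold`, or None if no sample does."""
    for index, (one_min, _five_min, _fifteen_min) in enumerate(samples):
        if one_min > threshold:
            return index
    return None
```

Applied to the six top samples shown earlier, the fourth sample (1-minute load 141.78) is the first to exceed the threshold, about three minutes after the load was still below 1.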
The Virtual Machine CPU usage alarm triggered in the vSphere Client for the NSX Manager Virtual Machine is a trailing symptom of the host's resource exhaustion.
NSX Manager services go down because the ESXi host is unable to fulfill the resource demands of the NSX Manager Virtual Machine.
While the issue is occurring, collect the following information, then open a Broadcom Support case and select the product VMware vSphere ESXi:
esxtop -b -a -d 2 -n 150 | gzip -9c > /vmfs/volumes/datastore_name/esxtop-Hostname.csv.gz
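In the command above, -b runs esxtop in batch mode, -a exports all statistics, -d sets the delay between samples in seconds, and -n sets the number of iterations, so the capture covers roughly 2 x 150 = 300 seconds. The Python sketch below only illustrates that arithmetic and how the placeholders fit together; build_esxtop_command is a hypothetical helper, and the datastore and host names are the placeholders from the command itself.

```python
def build_esxtop_command(datastore, hostname, delay=2, iterations=150):
    """Return (command, capture_seconds) for an esxtop batch capture.

    delay      -> value passed to -d (seconds between samples)
    iterations -> value passed to -n (number of samples)
    """
    output = f"/vmfs/volumes/{datastore}/esxtop-{hostname}.csv.gz"
    command = f"esxtop -b -a -d {delay} -n {iterations} | gzip -9c > {output}"
    # Approximate capture window: one delay interval per iteration.
    return command, delay * iterations


command, seconds = build_esxtop_command("datastore_name", "Hostname")
print(command)
print(f"approximate capture window: {seconds} seconds")
```

Because the window is only about five minutes, the capture is most useful when started while the load is still climbing.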
To bring the NSX Manager Virtual Machine back to a functional state, vMotion it to another ESXi host that has sufficient free resources, then reboot the VM.