vsfwd.log
, on the ESXi host, you will notice that the host is connecting to rabbitMQ process on NSX Manager repeatedly due to 'failed to log in
':2019-08-11T19:12:14Z vsfwd: [INFO] Removed App of AppType VNET and AppId host-#####
2019-08-11T19:12:15Z vsfwd: [ERROR] Failed to log on to broker 10.236.18.15:5671: Logging in: Input/output error
2019-08-11T19:12:19Z vsfwd: [INFO] Ending NET CLIENT Listen Thread
2019-08-11T19:12:19Z vsfwd: [INFO] Exiting NET bulk data thread...
2019-08-11T19:12:25Z vsfwd: [INFO] Find new password, !!EXITING!!
--
2019-08-11T19:54:17Z vsfwd: [INFO] Removed App of AppType VNET and AppId host-#####
2019-08-11T19:54:19Z vsfwd: [ERROR] Failed to log on to broker 10.236.18.15:5671: Logging in: Input/output error
2019-08-11T19:54:22Z vsfwd: [INFO] Ending NET CLIENT Listen Thread
2019-08-11T19:54:22Z vsfwd: [INFO] Exiting NET bulk data thread...
2019-08-11T19:54:22Z vsfwd: [INFO] Find new password, !!EXITING!!
grep
command as shown below. This will show a list of time stamps where the host is discovering that its been issued a new password by NSX Manager. The NSX Manager provides the host with a new password when it misses certain number of heartbeat responses from host.$ grep "EXITING" vsfwd.log
2019-08-11T19:12:25Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:18:35Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:23:45Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:29:55Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:42:01Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:48:12Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T19:54:22Z vsfwd: [INFO] Find new password, !!EXITING!!
2019-08-11T20:00:32Z vsfwd: [INFO] Find new password, !!EXITING!!
vsm.log
on NSX Manager, the following log lines indicate that there are heartbeat messages getting missed by the NSX Manager rabbitmq process: 2019-08-11 11:41:43.302 CDT ERROR HeartbeatManagerHeartbeatTimer HeartbeatManager$HeartbeatTask:295 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] Client has not responded to the heartbeat for longer than the alert threshold. Peer name = 'com.vmware.vshield.userworld', client token = 'host-#####', client id = '33977938-####-####-####-488e6b274bd8', last heartbeat response = '306', last published heartbeat = '308'
2019-08-11 11:53:43.438 CDT ERROR HeartbeatManagerHeartbeatTimer HeartbeatManager$HeartbeatTask:295 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] Client has not responded to the heartbeat for longer than the alert threshold. Peer name = 'com.vmware.vshield.userworld', client token = 'host-#####', client id = '33977938-####-####-####-488e6b274bd8', last heartbeat response = '308', last published heartbeat = '312'
The issue occurs because every host is trying to register to NSX manager. During this process every publish event than happens for a VNIC generates a ACK message back to NSX manager. The NSX Manager may get overwhelmed by the volume of messages generated by hosts, which may lead to the NSX Manager not receiving or processing the heartbeat messages from other hosts and respond back in time, thus causing a loop.
Note: Logs for ACK received by NSX Manager from every VNIC in the infrastructure can only be seen when debug logs are enabled.
This is a known issue affecting VMware NSX for vSphere 6.4.x.
Currently, there is no resolution.
Steps to track issue (event code - 301033)
POST
https://<nsx-manager>/api/1.0/services/debug/loglevel/com.vmware.vshield.firewall.messaging.response.service.ResponseHandler?level=DEBUG
Processing VSFWEvent
” in vsm.log
and get the count in a given time interval to see if there are abnormal number of events related to status code - 30103POST https://<nsx-manager>/api/1.0/services/debug/loglevel/com.vmware.vshield.firewall.messaging.response.service.ResponseHandler?level=INFO