The NSX Manager waits up to 30 seconds for a response from the MUX process on the ESXi host after a configuration update has been applied.
The configuration update is applied to the namespace database, which each VM has had since NSX 6.4.1. These updates are slow: it takes around 45 seconds or more to update all the VMs on that host.
Since this time is more than 30 seconds, the NSX Manager will reschedule the same update.
The MUX process on the ESXi host has no way of knowing that this is the same update, so it pushes the update to all the VMs once again.
This second push again takes longer than the 30-second NSX Manager time limit, so it times out too, and the sequence repeats in a loop.
This issue depends on the load and activity on the hosts, so it does not occur consistently.
While this loop is running, the MUX process is constantly busy and cannot process events from the VMs, which is why we are seeing the random disconnects.
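
The timing behind this loop can be illustrated with a small toy model. This is only a sketch of the arithmetic described above, not NSX or MUX code: the function name `simulate`, the `max_attempts` cap, and the assumption that the MUX completes each full push before handling the rescheduled one are all hypothetical. Only the 30-second limit and the 45-second figure come from the description.

```python
# Toy model of the timeout loop. All names and the retry cap are
# illustrative assumptions, not NSX Manager or MUX internals.

MANAGER_TIMEOUT_S = 30  # NSX Manager wait time, as described above


def simulate(update_duration_s, max_attempts=5):
    """Return (attempts, succeeded, mux_busy_s) for one configuration update.

    update_duration_s: time the MUX needs to push the update to every VM.
    max_attempts: hypothetical cap so the simulation terminates; the
    behaviour described above has no such cap.
    """
    mux_busy_s = 0
    for attempt in range(1, max_attempts + 1):
        # The MUX cannot tell a rescheduled update from a new one,
        # so every attempt costs the full push time again.
        mux_busy_s += update_duration_s
        if update_duration_s <= MANAGER_TIMEOUT_S:
            return attempt, True, mux_busy_s
        # Timed out: the manager reschedules the same update.
    return max_attempts, False, mux_busy_s


if __name__ == "__main__":
    print(simulate(20))  # host where the push fits inside the timeout
    print(simulate(45))  # host where the push takes about 45 seconds
```

With a 20-second push the update is accepted on the first attempt. With a 45-second push every attempt times out, and the time the MUX spends busy grows with each reschedule, which in this model is the time it is unavailable for VM events.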