In rare cases, vCenter Server may become unresponsive to incoming requests. The vpxd service does not crash immediately, but operations and API calls may hang or time out. This occurs because all available worker threads become blocked, leaving none to process new requests.
A known issue in Vmacore’s HTTP/2 stack can, very rarely, cause a worker thread to deadlock during normal operation. If enough threads become stuck this way, vCenter can no longer handle further requests.
Detection / Diagnosis:
Currently, the only reliable way to confirm this issue is by inspecting a live core dump of the affected service.
Indicators observed in the live dump include:
Note:
There are no specific vCenter log messages or alarms that conclusively identify this issue without analyzing a live core dump.
If this condition is suspected, collect a live dump of the affected service by following KB and engage Broadcom VCF Support for validation.
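Before collecting a dump, it can help to confirm the symptom itself, namely that requests time out rather than fail outright. The sketch below is illustrative only: it cannot identify the deadlock, and the function names and the URL passed in are assumptions, not a documented vCenter health check. It relies on curl's exit status 28, which curl returns when its own timeout expires.

```shell
#!/bin/sh
# Illustrative sketch: shows the symptom (a hang), not the root cause.
# classify_curl_exit and probe_endpoint are hypothetical helper names.

# Map a curl exit status to a coarse verdict.
# 0 = an HTTP response was received, 28 = curl's timeout expired.
classify_curl_exit() {
    case "$1" in
        0)  echo "responsive" ;;
        28) echo "timed-out" ;;
        *)  echo "other-error" ;;
    esac
}

# probe_endpoint URL [TIMEOUT_SECONDS]
# The URL is supplied by the caller; no specific endpoint is assumed here.
probe_endpoint() {
    url=$1
    timeout=${2:-10}
    curl --silent --insecure --output /dev/null --max-time "$timeout" "$url"
    classify_curl_exit "$?"
}

# Example (hypothetical host name):
# probe_endpoint "https://vcenter.example.com/" 10
```

A "timed-out" verdict while the vpxd process is still running is consistent with this issue, but it is not conclusive; the live dump remains the only reliable confirmation.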
The engineering team is aware of this issue, and a permanent fix is available in:
Upgrading to one of these versions resolves the issue. If upgrading is not possible, apply the workaround below.
Workaround:
Disabling HTTP/2 stream reuse in the vmacore section of the service configuration file completely prevents the issue for that service. This slightly reduces performance, but also slightly reduces memory usage.
Steps:
python /usr/lib/vmware-vpx/py/xmlcfg.py -f /etc/vmware-vpx/vpxd.cfg set vmacore/http/http2/maxPooledStreams 0
service-control --restart vpxd
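As a precaution around the steps above, one might back up the configuration file before changing it and check afterwards that the setting was written. The following is a minimal sketch, not part of the official procedure: the helper names are hypothetical, and the check assumes the value is stored as a `<maxPooledStreams>` element in the XML file.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the documented workaround.

# backup_cfg FILE
# Copy FILE next to itself with a timestamp suffix and print the backup path.
backup_cfg() {
    src=$1
    dst="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && echo "$dst"
}

# stream_reuse_disabled FILE
# Succeeds if FILE contains a maxPooledStreams element set to 0.
# Assumption: the setting is serialized as <maxPooledStreams>0</maxPooledStreams>.
stream_reuse_disabled() {
    grep -Eq '<maxPooledStreams>[[:space:]]*0[[:space:]]*</maxPooledStreams>' "$1"
}

# Example usage:
# backup_cfg /etc/vmware-vpx/vpxd.cfg        # run before the set command
# stream_reuse_disabled /etc/vmware-vpx/vpxd.cfg && echo "stream reuse disabled"
```

Keeping a timestamped copy makes it straightforward to revert the change once an upgrade to a fixed version has been completed.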