In vCenter, an extremely high number of sessions are registered and not closed correctly. This has adverse effects on vCenter: it delays notifications to 2nd-party applications (e.g. NSX, vSAN, etc.) by long periods.
The users that create these sessions perform volume operations against vCenter. The CSI services in every cluster use them to authenticate and to carry out all disk/CNS volume operations.
The following vCenter log snippets confirm the problem.
Envoy access logs:
envoy-access-1818.log:2025-08-11T08:49:08.075Z info envoy[2343] [Originator@6876 sub=Default] 2025-08-11T08:34:08.060Z POST /sdk 0 downstream_remote_disconnect DC 235 0 - 899944 - - <IPADDRESS>:59890 HTTP/2 TLSv1.2 100.xxx.xxx.164:443 127.0.0.1:33432 HTTP/2 - 127.0.0.1:8085 - "WaitForUpdatesEx"
envoy-access-1818.log:2025-08-11T08:49:12.135Z info envoy[2343] [Originator@6876 sub=Default] 2025-08-11T08:34:12.528Z POST /sdk 0 downstream_remote_disconnect DC 258 0 - 899472 - - <IPADDRESS>:36398 HTTP/2 TLSv1.2 100.xxx.xxx.164:443 127.0.0.1:33402 HTTP/2 - 127.0.0.1:8085 - "WaitForUpdatesEx"
There are many instances of the above messages, and in each of them the difference between the log time and the request time is 15 minutes.
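To check how widespread this 15-minute pattern is, the envoy access logs can be scanned for disconnected WaitForUpdatesEx requests. The following is a minimal Python sketch, assuming each matching line carries two UTC timestamps, the first being the log time and the second the request time, exactly as in the snippets above. The script name `envoy_waits.py` and the 890-second threshold are illustrative choices, not part of any VMware tool.

```python
import re
import sys
from datetime import datetime

# Envoy writes UTC timestamps with a 'Z' suffix, e.g. 2025-08-11T08:49:08.075Z
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def parse_ts(ts: str) -> datetime:
    """Parse an envoy UTC timestamp into a naive datetime (both sides are UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def wait_durations(lines):
    """Yield (request_time, seconds_open) for every disconnected WaitForUpdatesEx request.

    Assumption: the first timestamp on a line is the log time and the second
    one is the request time, as in the envoy snippets above.
    """
    for line in lines:
        if "WaitForUpdatesEx" not in line or "downstream_remote_disconnect" not in line:
            continue
        stamps = TS_RE.findall(line)
        if len(stamps) < 2:
            continue
        logged, started = parse_ts(stamps[0]), parse_ts(stamps[1])
        yield stamps[1], (logged - started).total_seconds()


def summarize(lines, threshold=890.0):
    """Return (matching requests, requests held open for at least `threshold` seconds)."""
    durations = [secs for _, secs in wait_durations(lines)]
    return len(durations), sum(1 for secs in durations if secs >= threshold)


if __name__ == "__main__":
    # Example: python3 envoy_waits.py envoy-access-*.log
    total = held = 0
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            t, h = summarize(fh)
        total += t
        held += h
    print(f"WaitForUpdatesEx disconnects: {total}, held open ~15 min or longer: {held}")
```

Applied to the two envoy lines above, both requests come out at roughly 900 seconds (900.015 s and 899.607 s), i.e. the 15-minute default.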
On the vpxd side, vpxd tries to send a result (timeout-nochanges) after 15 minutes, but it gets an error because the client has already stopped listening. Immediately afterwards, a new list view is created:
2025-08-10T16:46:19.055+02:00 error vpxd[06196] [Originator@6876 sub=Vmomi opID=5b61d915] Caught exception while sending activation fault; <<52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33416'>>, session[52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5]526bafc0-xxxx-xxxx-xxxx-6988c590d151, vmodl.query.PropertyCollector.waitForUpdatesEx, <vsan.version.version23, official, 8.0.0.4>, {stm: {<io_obj p:0x00007fa6242e6c50, h:170, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33402'>>, id: 151751557, state(in/out): 4/4}, session: <52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33416'>>, req: {POST, /sdk}}>, vmodl.fault.RequestCanceled, N7Vmacore11IOExceptionE(Data already transmitted; cannot reset.)
2025-08-10T16:46:19.067+02:00 info vpxd[07187] [Originator@6876 sub=vpxLro opID=42a03f37] [VpxLRO] -- BEGIN lro-371417469 -- ViewManager -- vim.view.ViewManager.createListView -- 52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5(521a225a-xxxx-xxxx-xxxx-c40658cca2ea)
2025-08-10T17:01:19.068+02:00 error vpxd[06978] [Originator@6876 sub=Vmomi opID=51b811d1] Caught exception while sending activation fault; <<52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33416'>>, session[52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5]52eb580f-xxxx-xxxx-xxxx-aab7ae06b97c, vmodl.query.PropertyCollector.waitForUpdatesEx, <vsan.version.version23, official, 8.0.0.4>, {stm: {<io_obj p:0x00007fa6242e6c50, h:170, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33402'>>, id: 151773945, state(in/out): 4/4}, session: <52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5, <TCP '127.0.0.1 : 8085'>, <TCP '127.0.0.1 : 33416'>>, req: {POST, /sdk}}>, vmodl.fault.RequestCanceled, N7Vmacore11IOExceptionE(Data already transmitted; cannot reset.)
2025-08-10T17:01:19.084+02:00 info vpxd[06969] [Originator@6876 sub=vpxLro opID=6e6edf2] [VpxLRO] -- BEGIN lro-371493864 -- ViewManager -- vim.view.ViewManager.createListView -- 52e480eb-xxxx-xxxx-xxxx-78c4bdcbf3b5(521a225a-xxxx-xxxx-xxxx-c40658cca2ea)
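The 15-minute cycle on the vpxd side can be confirmed by measuring the gap between consecutive createListView calls made by the same session. A minimal Python sketch, assuming the vpxd line layout shown above (a timestamp with a UTC offset at the start of the line and the session identifier printed after `createListView --`); the function name `list_view_intervals` and the script name are hypothetical.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# A vpxd line starts with an ISO-8601 timestamp carrying a UTC offset,
# e.g. 2025-08-10T16:46:19.067+02:00, and ends with the session identifier.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})\s.*"
    r"vim\.view\.ViewManager\.createListView -- (?P<session>\S+)\s*$"
)


def list_view_intervals(lines):
    """Map each session identifier to the gaps (seconds) between its createListView calls."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m:
            # fromisoformat (Python 3.7+) accepts the .fff fraction and +HH:MM offset
            seen[m.group("session")].append(datetime.fromisoformat(m.group("ts")))
    intervals = {}
    for session, stamps in seen.items():
        stamps.sort()
        intervals[session] = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return intervals


if __name__ == "__main__":
    # Example: python3 listview_gaps.py vpxd-*.log
    merged = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            merged.extend(fh)
    for session, gaps in list_view_intervals(merged).items():
        print(session, len(gaps) + 1, "list views, gaps (s):", [round(g) for g in gaps])
```

Gaps that cluster around 900 seconds for the same session identifier match the pattern above: the WaitForUpdatesEx call times out, the reply fails, and the client creates a new list view.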
Environment: TKGi 1.20, TKGi 1.21
The timeout for WaitForUpdatesEx can be controlled by the client (the CSI controller). However, the CSI controller does not set it and therefore falls back to the vCenter default of 15 minutes, which causes the high number of open connections.
Updates to the CNS/CSI code will correct this behaviour; however, these changes are expected only in upcoming versions.
To work around the problem, make the following change on vCenter:
<propertyCollector>
<maxWaitSecondsLimit>600</maxWaitSecondsLimit>
</propertyCollector>
(600 is in seconds, i.e. 10 minutes.) The setting is added to the following file:
/etc/vmware-vpx/vpxd.cfg
The file is in XML format, so the setting must be placed within the <config> tag, as in the following example:
<config>
<propertyCollector>
<maxWaitSecondsLimit>600</maxWaitSecondsLimit>
</propertyCollector>
...
</config>
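The change can be made with any text editor. Where it has to be repeated on several vCenter instances, a small script can apply it. The following is a hedged Python sketch, assuming Python 3.8+ is available where it runs and that `<config>` is the root element of vpxd.cfg, as in the example above. ElementTree may normalize whitespace and does not write back an XML declaration, so the script copies the original to vpxd.cfg.bak; review the result before restarting vpxd. The script name `set_max_wait.py` and the `--apply` flag are illustrative.

```python
import shutil
import sys
import xml.etree.ElementTree as ET

CFG_PATH = "/etc/vmware-vpx/vpxd.cfg"


def set_max_wait_limit(xml_text: str, seconds: int = 600) -> str:
    """Return xml_text with <config><propertyCollector><maxWaitSecondsLimit> set to `seconds`.

    Existing elements are reused and missing ones are created, so running it twice
    does not duplicate anything. Comments inside <config> are preserved
    (insert_comments needs Python 3.8+).
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    if root.tag != "config":
        raise ValueError(f"expected <config> root element, found <{root.tag}>")
    pc = root.find("propertyCollector")
    if pc is None:
        pc = ET.SubElement(root, "propertyCollector")
    limit = pc.find("maxWaitSecondsLimit")
    if limit is None:
        limit = ET.SubElement(pc, "maxWaitSecondsLimit")
    limit.text = str(seconds)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and sys.argv[1:2] == ["--apply"]:
    shutil.copy2(CFG_PATH, CFG_PATH + ".bak")  # keep a backup before editing
    with open(CFG_PATH, encoding="utf-8") as fh:
        updated = set_max_wait_limit(fh.read())
    with open(CFG_PATH, "w", encoding="utf-8") as fh:
        fh.write(updated + "\n")
    print(f"Updated {CFG_PATH}; restart vpxd for the change to take effect.")
```

Without `--apply` the script does not touch the file, so `set_max_wait_limit` can first be tried on a copy of vpxd.cfg and the output compared with the original.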
This option is not dynamic; vpxd must be restarted for the change to take effect. Follow the KB below for details:
https://knowledge.broadcom.com/external/article/340943/stop-start-or-restart-services-on-vcente.html