Symptoms:
In /var/log/proxy/envoy.log, there is a warning message at the end similar to the following:
[warning][config] [source/common/config/filesystem_subscription_impl.cc:43] Filesystem config update rejected: Error adding/updating listener(s) https-node-v4-local: Failed to load trusted CA certificates from <inline>
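A quick way to check for this symptom is to search the log for the distinctive part of the message. The following is a minimal sketch; the function name has_envoy_ca_warning is hypothetical, and the default path is the log file named above.

```shell
# Return success (exit 0) if the given log contains the
# "Failed to load trusted CA certificates" warning.
# The log path defaults to the Envoy log named in this article.
has_envoy_ca_warning() {
  log_file="${1:-/var/log/proxy/envoy.log}"
  grep -q 'Failed to load trusted CA certificates' "$log_file"
}
```

For example, `has_envoy_ca_warning && echo "symptom present"` prints the message only when the warning appears somewhere in the log.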
This issue is resolved in VMware NSX 4.1.2.x, 4.2.0.x, and higher. The workaround below provides a temporary solution until the environment can be upgraded.
Workaround
You can recover UI access by temporarily removing the /home/secureall/secureall/.store/.client_truststore file. This clears the certificates currently loaded in /config/envoy/dynamic_listener_resources.json and quickly recovers UI access, but the issue will recur after a reboot or service restart. The following process can be run from an NSX Manager CLI as the root user (run st en first if logged in as admin):
1. Run ls -lah /home/secureall/secureall/.store/.client_truststore and note the current ownership and permissions of the file.
2. Rename the .client_truststore file by appending .bak to its name. This creates a backup and removes the original:
mv /home/secureall/secureall/.store/.client_truststore /home/secureall/secureall/.store/.client_truststore.bak
*Note that the ownership of the .bak file changes to root:root at this point.
3. Confirm that the UI now works when connecting directly to this node. You can also test an API request at this point, for example with Postman.
4. Once confirmed, restore the .client_truststore file:
cp /home/secureall/secureall/.store/.client_truststore.bak /home/secureall/secureall/.store/.client_truststore
5. Set the ownership (chown) of the .client_truststore file so that it matches the original noted in step 1:
chown <ownername>:<groupname> /home/secureall/secureall/.store/.client_truststore
*Note that permissions should not have changed from rw-r-----. If needed, set them with:
chmod 640 /home/secureall/secureall/.store/.client_truststore
6. Verify that the UI and API still work for the node. At this point, external API requests should work as well.
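The file-handling steps above (1, 2, 4 and 5) can be sketched as small shell helpers. This is an illustration only, not a supported script: the function names are hypothetical, GNU stat (as found on Linux) is assumed, and the manual UI and API checks in steps 3 and 6 still have to be done between the backup and restore calls.

```shell
# Step 1: print numeric "uid:gid mode" for a file (GNU stat assumed).
record_meta() {
  stat -c '%u:%g %a' "$1"
}

# Step 2: move the truststore aside as <file>.bak.
backup_truststore() {
  mv "$1" "$1.bak"
}

# Steps 4-5: copy the backup into place, then reapply the
# ownership and mode that were recorded in step 1.
restore_truststore() {
  file="$1"; owner="$2"; mode="$3"
  cp "$file.bak" "$file" &&
  chown "$owner" "$file" &&
  chmod "$mode" "$file"
}
```

A possible sequence, run as root: store `meta=$(record_meta "$f")` for the truststore path `$f`, call `backup_truststore "$f"`, verify UI access, then call `restore_truststore "$f" "${meta% *}" "${meta#* }"`.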
For situations where CLIENT_AUTH certificates are not in use and can be removed, the following steps are available to prevent recurrence:
1. List the certificates with GET /api/v1/trust-management/certificates. For example:
curl -H "x-nsx-username: admin" -X GET http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates
2. Identify any certificate that shows a service type of CLIENT_AUTH and whose length is a multiple of 253. The length is measured between "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----", counting all characters in between these headers, except for newline characters, which appear as "\n".
3. Delete each such certificate:
curl -H "x-nsx-username: admin" -X DELETE http://127.0.0.1:7440/nsxapi/api/v1/trust-management/certificates/<cert-id>
where '<cert-id>' is the ID of the certificate which was identified as having a length that is a multiple of 253 and shows a service type of CLIENT_AUTH.
Note: If you are using Federation and the certificate is assigned to a PI account used by one of the sites, do not use the delete API above. Please follow the administration guide to replace the site certificate; this will automatically update the certificate used by the PI for that site.
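Counting the characters by hand is error-prone, so the length rule described above can be sketched in shell. The helper names pem_body_len and is_suspect_len are hypothetical, and the sketch assumes the input holds a single certificate whose PEM text was copied from the API response, with newlines either real or written as the two-character sequence "\n".

```shell
# Read one PEM certificate on stdin and print the number of characters
# between the BEGIN and END markers, ignoring real newlines and
# literal "\n" escape sequences.
pem_body_len() {
  sed -e 's/\\n//g' \
    | tr -d '\r\n' \
    | sed -e 's/.*-----BEGIN CERTIFICATE-----//' \
          -e 's/-----END CERTIFICATE-----.*//' \
    | tr -d '\n' \
    | wc -c \
    | tr -d ' '
}

# Succeed if the given length is a positive multiple of 253.
is_suspect_len() {
  [ "$1" -gt 0 ] && [ $(( $1 % 253 )) -eq 0 ]
}
```

For example, after saving the PEM text of one certificate to a file named cert.txt (a hypothetical name), `is_suspect_len "$(pem_body_len < cert.txt)" && echo "multiple of 253"` flags a matching length. The service type still has to be checked separately in the API output.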
Certificates that are of type CLIENT_AUTH may actually be in use due to integration with things like Tanzu Kubernetes or may be stuck due to NSX Manager failing to release a certificate of this type automatically. It is not a safe procedure to manually release a certificate and Engineering should be engaged prior to doing so. Where the certificates are actually needed, like with Kubernetes, removing the certificates from NSX Manager should not be done. The customer would need to upgrade to the fixed version and utilize the workaround above in the meantime.