The VCF stack deployment is stuck as VCF Operations deployment stage is in failed status with Error:
"messageId": "LCMVROPSSYSTEM29002","message": "Operations collector group operation failure","exceptionMessage": "Wait timeout, Cloud Proxy node didn't join the Operations cluster, retry after sometime."VCF Operations 9.x
When the HAProxy service starts, it attempts to resolve all FQDNs specified in its configuration. In this case, resolution of the hostnames below is attempted:
api-devlvn.broadcom.neteapi-gcpstg.broadcom.comeapi.broadcom.comNormally, DNS should quickly return an NXDOMAIN response (even with recursive DNS enabled). However, in this situation, the DNS resolution instead timed out.
This indicates an issue with DNS infrastructure, network firewall configuration, or overall DNS responsiveness, rather than recursion itself.
Engineering are aware of this issue and it will be resolved in an upcoming release.
Workaround:
As a temporary workaround to unblock VCF deployment, add these entries to the end of /etc/hosts file of the Cloud Proxy (replacing <GW IP Addr> with the IP address of the Default Gateway).
<GW IP Addr> api-devlvn.broadcom.net <GW IP Addr> eapi-gcpstg.broadcom.com<GW IP Addr> eapi.broadcom.com