VCF Operations Collector (Cloud Proxy) fails to boot up after deployment in VCF Operations 9.x because haproxy fails to start


Article ID: 417887


Updated On:

Products

VCF Operations

Issue/Introduction

  • Fleet Manager shows that the deployment request fails at stage 13, VCF Adapter Configuration, with the following error:
    LCMVROPSYSTEM25127
    Error Code: LCMVROPSYSTEM25127
    Error while configuring the adapter.
    Some SDDC Manager hosts are not configured. Failed SDDC
    Manager hosts are: [hosnamefqdn]
  • The startup of haproxy times out.
    Aug 15 05:37:27 ###.###.###.### systemd[1]: Starting HAProxy Load Balancer...
    Aug 15 05:38:05 ###.###.###.### haproxy[7551]: [NOTICE]   (7551) : config : [/etc/haproxy/haproxy.cfg:87] : 'server PrxyRC_CRUSH_FTP_DEV_BE/CRUSH_FTP_DEV_0' : could not resolve address 'api-devlvn.broadcom.net', disabling server.
    Aug 15 05:38:45 ###.###.###.### haproxy[7551]: [NOTICE]   (7551) : config : [/etc/haproxy/haproxy.cfg:96] : 'server PrxyRC_CRUSH_FTP_STAGE_BE/CRUSH_FTP_STG_0' : could not resolve address 'eapi-gcpstg.broadcom.com', disabling server.
    Aug 15 05:38:57 ###.###.###.### systemd[1]: haproxy.service: start operation timed out. Terminating.
    Aug 15 05:38:57 ###.###.###.### systemd[1]: haproxy.service: Failed with result 'timeout'.
    Aug 15 05:38:57 ###.###.###.### systemd[1]: Failed to start HAProxy Load Balancer.
    Aug 15 05:38:58 ###.###.###.### systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 1.
    Aug 15 05:38:58 ###.###.###.### systemd[1]: Stopped HAProxy Load Balancer.
    Aug 15 05:38:58 ###.###.###.### systemd[1]: Starting HAProxy Load Balancer...
    Aug 15 05:39:37 ###.###.###.### haproxy[7594]: [NOTICE]   (7594) : config : [/etc/haproxy/haproxy.cfg:87] : 'server PrxyRC_CRUSH_FTP_DEV_BE/CRUSH_FTP_DEV_0' : could not resolve address 'api-devlvn.broadcom.net', disabling server.
    Aug 15 05:39:57 ###.###.###.### systemd[1]: haproxy.service: Deactivated successfully.
    Aug 15 05:39:57 ###.###.###.### systemd[1]: Stopped HAProxy Load Balancer.
  • /var/log/haproxy-admin.log shows errors similar to the following:
    2025-10-27T09:23:18+00:00 localhost haproxy[#####]: Server PrxyRC_CRUSH_FTP_STAGE_BE/CRUSH_FTP_STG_0 is DOWN, reason: Layer4 connection problem, info: "Connection timed out", check duration: 130732ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    
    2025-10-28T10:42:00+00:00 localhost haproxy[####]: Server PrxyRC_CRUSH_FTP_STAGE_BE/CRUSH_FTP_STG_0 is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 1ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
  • Running dig against api-devlvn.broadcom.net reports "timed out" and returns SERVFAIL:
    dig api-devlvn.broadcom.net
    ;; communications error to ###.###.###.####53: timed out 
    
    ; <<>> DiG 9.20.0 <<>> api-devlvn.broadcom.net
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 65000
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4000
    ;; QUESTION SECTION:
    ;api-devlvn.broadcom.net.       IN      A
    
    ;; Query time: 4416 msec
    ;; SERVER: ###.###.###.####53(###.###.###.###) (UDP)
    ;; WHEN: Mon Sep 01 06:33:02 UTC 2025
    ;; MSG SIZE  rcvd: 52
  • It is not possible to change the DNS configuration as described in the KB article "The firstboot of the VCF Operations Collector (Cloud Proxy) fails because haproxy does not start and times out".
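
To confirm which hostnames HAProxy is stalling on, the journal output can be scanned for the "could not resolve address" notice. The following is a minimal sketch, assuming the journal text has already been captured as a string (for example from `journalctl -u haproxy.service`); the function name `unresolved_hosts` and the `SAMPLE` text are illustrative and not part of the product.

```python
import re

# Pattern for the HAProxy notice seen in the journal excerpt above.
UNRESOLVED = re.compile(r"could not resolve address '([^']+)'")


def unresolved_hosts(journal_text):
    """Return the distinct hostnames HAProxy could not resolve, in first-seen order."""
    hosts = []
    for match in UNRESOLVED.finditer(journal_text):
        host = match.group(1)
        if host not in hosts:
            hosts.append(host)
    return hosts


# Shortened lines from the excerpt above; timestamps and PIDs are dropped.
SAMPLE = """\
[/etc/haproxy/haproxy.cfg:87] : 'server PrxyRC_CRUSH_FTP_DEV_BE/CRUSH_FTP_DEV_0' : could not resolve address 'api-devlvn.broadcom.net', disabling server.
[/etc/haproxy/haproxy.cfg:96] : 'server PrxyRC_CRUSH_FTP_STAGE_BE/CRUSH_FTP_STG_0' : could not resolve address 'eapi-gcpstg.broadcom.com', disabling server.
[/etc/haproxy/haproxy.cfg:87] : 'server PrxyRC_CRUSH_FTP_DEV_BE/CRUSH_FTP_DEV_0' : could not resolve address 'api-devlvn.broadcom.net', disabling server.
"""

print(unresolved_hosts(SAMPLE))
```

If the list contains only api-devlvn.broadcom.net and eapi-gcpstg.broadcom.com, the symptoms match this article.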

Environment

VCF Operations 9.0

VCF Operations 9.0.1 

Cause

When HAProxy starts, it tries to resolve every FQDN in its configuration. By default, /etc/haproxy/haproxy.cfg contains two hostnames, api-devlvn.broadcom.net and eapi-gcpstg.broadcom.com, that do not exist in DNS. A lookup of either name (for example with nslookup) should quickly return NXDOMAIN, even through a recursive resolver. In this case the DNS queries timed out instead, which indicates a problem in the DNS infrastructure or a network firewall or responsiveness issue. A healthy DNS setup responds quickly even for non-existent names. Because each lookup stalls, haproxy.service exceeds its systemd start timeout and fails to start.
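
The difference between a fast negative answer and a stalled lookup can be measured directly. The sketch below is one way to do that with the Python standard library, assuming Python 3 is available on the machine where it runs. The helper names and the 2-second `slow_after` threshold are assumptions chosen for illustration, not values taken from the product.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        outcome = "resolved"
    except socket.gaierror:
        # Raised for NXDOMAIN as well as for resolver timeouts or SERVFAIL.
        outcome = "failed"
    return outcome, time.monotonic() - start


def classify(outcome, seconds, slow_after=2.0):
    """Label a lookup result; slow_after is an assumed threshold, not a product value."""
    if outcome == "resolved":
        return "resolved"
    if seconds >= slow_after:
        return "slow failure (investigate DNS path)"
    return "fast failure (expected for a non-existent name)"


def report(hostnames):
    """Print one line per hostname with its timing and classification."""
    for name in hostnames:
        outcome, seconds = time_lookup(name)
        print(f"{name}: {classify(outcome, seconds)} in {seconds:.2f}s")
```

Calling `report(["api-devlvn.broadcom.net", "eapi-gcpstg.broadcom.com"])` from the affected network would be expected to show slow failures in an environment that matches this article, and fast failures where DNS is healthy.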

Resolution

Investigate why the DNS resolver did not return a quick NXDOMAIN as expected. Such delayed responses can affect not only the first boot of a Cloud Proxy deployment but also other areas of the environment.

A workaround is to remove the api-devlvn.broadcom.net (DEV) and eapi-gcpstg.broadcom.com (STAGE) endpoints from the HAProxy configuration. The eapi.broadcom.com (PROD) endpoint is kept.

Procedure: Removing the api-devlvn.broadcom.net and eapi-gcpstg.broadcom.com Endpoints from the HAProxy Configuration

  1. Take a snapshot of the Cloud Proxy. For guidance, see the documentation Managing snapshots in vSphere Web Client.
  2. Back Up the HAProxy Configuration Generator Script
    Before making any changes, create a backup of the existing script:
    cp /usr/lib/vmware-vrops-cprc/bin/python/cprcHAProxyConfiguration.py \
       /usr/lib/vmware-vrops-cprc/bin/python/cprcHAProxyConfiguration.py.bak
  3. Modify the cprcHAProxyConfiguration.py Script
    File path: /usr/lib/vmware-vrops-cprc/bin/python/cprcHAProxyConfiguration.py
    1. Update the Frontend Section
      At line 330, replace the following block:
      frontendSection.append('acl is_dev path_beg /api-devlvn.broadcom.net\n' +
      ' acl is_stage path_beg /eapi-gcpstg.broadcom.com\n' +
      ' acl is_prod path_beg /eapi.broadcom.com')
      frontendSection.append('use_backend PrxyRC_CRUSH_FTP_DEV_BE if is_dev\n' +
      ' use_backend PrxyRC_CRUSH_FTP_STAGE_BE if is_stage\n' +
      ' use_backend PrxyRC_CRUSH_FTP_PROD_BE if is_prod')

      with 

      frontendSection.append('acl is_prod path_beg /eapi.broadcom.com')
      frontendSection.append('use_backend PrxyRC_CRUSH_FTP_PROD_BE if is_prod')
    2. Remove Backend Configuration for DEV and STAGE
      At line 751, delete the following lines:
      logging.info(f"Configuring Crush Ftp backend section: {HA_PROXY_CRUSH_FTP_DEV_BE_SECTION}")
      sections[HA_PROXY_CRUSH_FTP_DEV_BE_SECTION] = list()
      configureHAProxyBackendSection(sections[HA_PROXY_CRUSH_FTP_DEV_BE_SECTION], HA_PROXY_BE_CRUSH_FTP_DEV_SERVER_NAME, configurationType = cprcBackendConfigurationType.LOG_ASSIST)
      
      logging.info(f"Configuring Crush Ftp backend section: {HA_PROXY_CRUSH_FTP_STAGE_BE_SECTION}")
      sections[HA_PROXY_CRUSH_FTP_STAGE_BE_SECTION] = list()
      configureHAProxyBackendSection(sections[HA_PROXY_CRUSH_FTP_STAGE_BE_SECTION], HA_PROXY_BE_CRUSH_FTP_STAGE_SERVER_NAME, configurationType = cprcBackendConfigurationType.LOG_ASSIST)
  4. Save the File
    After completing the edits, save and close the script.
  5. Reboot the Cloud Proxy Node
    Reboot the Cloud Proxy node so that the updated HAProxy configuration is applied.
  6. Verify the HAProxy Service Status
    Run the following command and confirm that the service is active (running):
    systemctl status haproxy.service
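
As an additional check after the reboot, the generated configuration can be searched for the two removed hostnames. This is a minimal sketch, assuming the regenerated file is /etc/haproxy/haproxy.cfg as in the journal excerpt. The function name, the `SAMPLE_CFG` fragment, the server name `CRUSH_FTP_PROD_0`, and the port numbers are hypothetical and only illustrate the matching logic.

```python
def leftover_references(cfg_text, hostnames):
    """Return (line_number, line) pairs in cfg_text that still mention any hostname."""
    hits = []
    for number, line in enumerate(cfg_text.splitlines(), start=1):
        if any(name in line for name in hostnames):
            hits.append((number, line.strip()))
    return hits


REMOVED = ["api-devlvn.broadcom.net", "eapi-gcpstg.broadcom.com"]

# Hypothetical config fragment for illustration; the real file content differs.
SAMPLE_CFG = """\
backend PrxyRC_CRUSH_FTP_STAGE_BE
    server CRUSH_FTP_STG_0 eapi-gcpstg.broadcom.com:443
backend PrxyRC_CRUSH_FTP_PROD_BE
    server CRUSH_FTP_PROD_0 eapi.broadcom.com:443
"""

print(leftover_references(SAMPLE_CFG, REMOVED))
```

On the Cloud Proxy, passing the text of /etc/haproxy/haproxy.cfg to `leftover_references` should return an empty list if the workaround took effect, on the assumption that the edited generator script rewrites that file at boot.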