VC upgrade from SDDC manager fails with error: "Failed to query deployment status for appliance vcenter-upgrade-target-appliance_id after trying all ip addresses"

Article ID: 377002

Products

VMware SDDC Manager
VMware vCenter Server

Issue/Introduction

Symptoms:

 

  • The vCenter upgrade fails at the beginning of Stage 2 because the network configuration cannot be applied to the target vCenter.

  • The LCM workflow logs on SDDC Manager contain errors similar to the excerpt below:


    vCSACliInstallLogger - DEBUG - Failed to query appliance API against VM 'vcenter-upgrade-target-appliance_110dc84f-1b48-4daf-a362-d3c6f11ca193' on 'source.vc.domain.com' for the deployment status because 'Failed to query deployment status for appliance vcenter-upgrade-target-appliance_110dc84f-1b48-4daf-a362-d3c6f11ca193 after trying all ip addresses.
    ../..

    vCSACliInstallLogger - DEBUG - traceback: Traceback (most recent call last):
      File "/build/mts/release/bora-22890472/bora/install/vcsa-installer/vcsaCliInstaller/cli_tasks/monitor/monitor_vcenter_deployment_task.py", line 166, in monitor_vcsa_deployment
      File "/build/mts/release/bora-22890472/bora/install/vcsa-installer/vcsaCliInstaller/cli_tasks/monitor/monitor_vcenter_deployment_task.py", line 554, in get_vcenter_deployment
    RuntimeError: Failed to query deployment status for appliance vcenter-upgrade-target-appliance_110dc84f-1b48-4daf-a362-d3c6f11ca193 after trying all ip addresses
    During handling of the above exception, another exception occurred:
     
    Traceback (most recent call last):
      File "/build/mts/release/bora-22890472/bora/install/vcsa-installer/vcsaCliInstaller/cli_tasks/monitor/monitor_vcenter_deployment_task.py", line 402, in execute
      File "/build/mts/release/bora-22890472/bora/install/vcsa-installer/vcsaCliInstaller/cli_tasks/monitor/monitor_vcenter_deployment_task.py", line 173, in monitor_vcsa_deployment
    tasking.task.TaskException: Failed to query deployment status for appliance vcenter-upgrade-target-appliance_110dc84f-1b48-4daf-a362-d3c6f11ca193 after trying all ip addresses. 
    If you see this during firstboot, this probably indicates the VCSA is now rebooting and the status will resume in a few minutes.

  • The upgrade-export.log file on the target vCenter contains errors similar to the excerpt below:

    INFO upgrade_commands Shutting down source machine [source.vc.domain.com]
    INFO networking_utils isHostReachable(): getaddrinfo() found 1 entries (first is used): FAMILY: AddressFamily.AF_INET, TYPE=SocketKind.SOCK_STREAM, PROTO=6, CANONNAME=, ADDR=('xxx.xxx.xxx.xxx 22)
    ERROR networking_utils isHostReachable() failed: [TimeoutError]: "timed out", treating as unreachable host
    INFO upgrade_commands Source machine is down, waiting for 60 additional seconds to make sure it is completely down.
    INFO upgrade_commands Shutdown completed successfully.
    ..
    stderr: 08/07/2024 08:33:34 [ERROR] Cannot run /sbin/ifup eth0 command. Unknown error. Return code : 256 output: Make sure the interface is down or not assigned any IP
    eth0 is DOWN or not assigned an IP. Bringnig eth0 up...
    Can not find the manual filename, let us search for the auto filename
    Performing duplicate address check for IPv4 address xxx.xxx.xxx.xxx
    ESC[1;31mError: IP already exists in the network
    Unable to set the network parameters

    Command failed with exit status 1. Detail: DD/MM/YYYY 08:33:34 [ERROR] Cannot run /sbin/ifup eth0 command. Unknown error. Return code : 256 output: Make sure the interface is down or not assigned any IP
    eth0 is DOWN or not assigned an IP. Bringnig eth0 up...
    Can not find the manual filename, let us search for the auto filename
    Performing duplicate address check for IPv4 address xxx.xxx.xxx.xxx
    ESC[1;31mError: IP already exists in the network
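
To check quickly whether a failed upgrade matches this article, the two log signatures quoted above can be searched for with grep. This is a minimal sketch only: the function name and the labels it prints are assumptions made for illustration, and the log file locations are not given in this article, so pass the path of a log you have already collected.

```shell
# Hypothetical helper (not part of any VMware tooling): report which of the
# error signatures quoted in this article appear in a collected log file.
# $1: path to a log file (LCM workflow log or upgrade-export.log)
find_upgrade_signatures() {
    log_file="$1"
    found=1
    # Signature from the LCM workflow log on SDDC Manager
    if grep -q 'Failed to query deployment status for appliance' "$log_file"; then
        echo "deployment-status-query-failed"
        found=0
    fi
    # Signature from upgrade-export.log on the target vCenter
    if grep -q 'IP already exists in the network' "$log_file"; then
        echo "duplicate-ip-detected"
        found=0
    fi
    # Return 0 if at least one signature was found, 1 otherwise
    return "$found"
}
```

The function prints one label per signature found and returns a non-zero status when neither is present, so it can be used in an if statement.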

 

Environment

VMware Cloud Foundation
VMware vCenter Server

Cause

This issue has been identified as an ARP cache problem.
It has been observed when the same temporary IP address is reused for consecutive vCenter upgrades.
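
One way to see whether a neighbor (ARP) entry for a given IP address is still cached on a Linux system, such as the vCenter appliance, is to filter the output of `ip neigh show`. The sketch below is illustrative only: the function name and the example address are assumptions, not part of this article or of any VMware tooling.

```shell
# Hypothetical helper: print the neighbor-table lines for one IP address.
# The neighbor table is read from stdin so the filter can be tried offline.
# $1: IPv4 address to look for (for example, the temporary upgrade IP)
neighbor_entries_for() {
    # Each line of `ip neigh show` output starts with the neighbor's IP address
    awk -v ip="$1" '$1 == ip'
}

# Example usage on a live system (address is a placeholder):
#   ip neigh show | neighbor_entries_for 192.0.2.10
```

An empty result means no entry for that address is currently cached on the system where the command was run.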

Resolution

Workaround:

 

  1. Before starting the upgrade, SSH into the vCenter and flush the ARP cache with the following command:

    ip -s -s neigh flush all

  2. Once Stage 1 of the upgrade is complete, SSH into the deployed target VCSA using the temporary IP address and open the following file in the vi editor:

    /usr/bin/setnet

  3. Adjust the following parameters and save the file:

    NETWORK_RETRY_COUNT = 5                   # <== Change this value from 5 to 20
    NETWORK_RETRY_WAIT_TIME_SEC = 15          # <== Change this value from 15 to 30

    This gives the upgrade more time to apply the network settings successfully.

  4. Once the source vCenter is powered off, run the following command on all ESXi hosts in the cluster:

    esxcli network ip neighbor remove -i vmk2 -a <SOURCE_VC_IP> -v 4


  5. Initiate Stage 2 of the upgrade workflow.
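
Steps 3 and 4 above can be sketched as two small shell helpers. This is a hedged illustration, not an official procedure: the function names, the use of sed instead of vi, root SSH access to the ESXi hosts, and the example host names are all assumptions. The second helper only prints the commands so they can be reviewed before anything is run.

```shell
# Hypothetical helpers; names and usage are assumptions for illustration.

# Step 3: raise the retry settings to the values given in this article.
# $1: path to the file to edit (/usr/bin/setnet on the target VCSA)
bump_setnet_retries() {
    target_file="$1"
    # -i.bak edits in place and keeps the original as <file>.bak
    sed -i.bak \
        -e 's/^\([[:space:]]*\)NETWORK_RETRY_COUNT[[:space:]]*=[[:space:]]*[0-9][0-9]*/\1NETWORK_RETRY_COUNT = 20/' \
        -e 's/^\([[:space:]]*\)NETWORK_RETRY_WAIT_TIME_SEC[[:space:]]*=[[:space:]]*[0-9][0-9]*/\1NETWORK_RETRY_WAIT_TIME_SEC = 30/' \
        "$target_file"
}

# Step 4: print (do not run) one neighbor-removal command per ESXi host.
# $1: VMkernel interface, $2: source vCenter IP, remaining args: ESXi hosts
print_neighbor_remove_cmds() {
    vmk="$1"
    source_ip="$2"
    shift 2
    for esxi_host in "$@"; do
        echo "ssh root@${esxi_host} esxcli network ip neighbor remove -i ${vmk} -a ${source_ip} -v 4"
    done
}

# Example usage (placeholder values):
#   bump_setnet_retries /usr/bin/setnet
#   print_neighbor_remove_cmds vmk2 192.0.2.10 esx01.example.com esx02.example.com
```

Review the printed commands before running them. The interface vmk2 is the one used in this article; confirm that it is the right VMkernel interface in your environment.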