VMs lost network connectivity after ESX upgrade from 7.0.2 to 7.0 U3n due to vswitch property set to RUNTIME

Article ID: 321882

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • On the ESXi host, the following vDS property is observed with propType RUNTIME instead of the expected CONFIG:
#net-dvs | grep "com.vmware.common.opaqueDvs.status.component.vswitch"
com.vmware.common.opaqueDvs.status.component.vswitch = up ,     propType = RUNTIME
  • Observed continuous hot-swap attempts:

    2023-07-25T06:52:40Z nsxaVim: [2102498]: ERROR ApplyNsxDvsConfig on (50 0d fd 25 a4 c2 ef 7d-28 8c af f7 2c 05 12 d0) failed: {'fault': 'PlatformConfigFault', 'msg': 'An error occurred during host configuration.', 'faultMessage': ['Operation failed, diagnostics report: Unable to clear DVS propertycom.vmware.nsx.vdl2.enabled: Status(bad0004)= Busy']}
    2023-07-25T06:52:40Z nsxaVim: [2102498]: ERROR Failed to hot-swap cvds ['50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx']
    2023-07-25T06:52:40Z nsxaVim: [2102498]: INFO HotSwapCvds result: Failed to hot-swap cvds ['50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx]ApplyNsxDvsConfig on (50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx) failed: {'fault': 'PlatformConfigFault', 'msg': 'An error occurred during host configuration.', 'faultMessage': ['Operation failed, diagnostics report: Unable to clear DVS propertycom.vmware.nsx.vdl2.enabled: Status(bad0004)= Busy']}
    2023-07-25T06:52:40Z nsxaVim: [2102498]: INFO Result msg:[b"Failed to hot-swap cvds ['50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx']ApplyNsxDvsConfig on (50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx) failed: {'fault': 'PlatformConfigFault', 'msg': 'An error occurred during host configuration.', 'faultMessage': ['Operation failed, diagnostics report: Unable to clear DVS propertycom.vmware.nsx.vdl2.enabled: Status(bad0004)= Busy']}"]
 
  • The hostd logs report the following errors:

    2023-07-25T06:42:59.760Z error hostd[2101228] [Originator@6876 sub=Hostsvc.NetworkProvider] Component [com.vmware.common.opaqueDvs.status.component.lcp.kcpSyncStatus] status [down] on DVS
    2023-07-25T06:42:59.761Z info hostd[2101228] [Originator@6876 sub=Hostsvc.NetworkProvider] Setting NSXT status on DVS [50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx] to [down]
    2023-07-25T06:42:59.802Z error hostd[2101228] [Originator@6876 sub=Hostsvc.NetworkProvider] Component [com.vmware.common.opaqueDvs.status.component.lcp.kcpSyncStatus] status [down] on DVS
    2023-07-25T06:42:59.802Z info hostd[2101228] [Originator@6876 sub=Hostsvc.NetworkProvider] Setting NSXT status on DVS [50 0d fd 25 a4 c2 ef 7d-28 8c xx xx xx xx xx xx] to [down]
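
The first two symptoms above can be checked programmatically. The following is a minimal Python sketch, not a VMware-provided tool: the function names are hypothetical, and the line formats are assumed from the sample `net-dvs` output and nsxaVim log excerpts shown in this article, so they may need adjusting for other builds.

```python
import re

# Property named in this article.
VSWITCH_STATUS_PROP = "com.vmware.common.opaqueDvs.status.component.vswitch"

# Assumed line format, taken from the sample output in this article:
#   <property name> = <value> ,     propType = <TYPE>
PROP_LINE = re.compile(
    r"^\s*(?P<name>com\.vmware\.[\w.]+)\s*=\s*(?P<value>.*?)\s*,"
    r"\s*propType\s*=\s*(?P<ptype>\w+)\s*$"
)


def find_props(net_dvs_output):
    """Return a list of (name, value, propType) for every matching line."""
    found = []
    for line in net_dvs_output.splitlines():
        match = PROP_LINE.match(line)
        if match:
            found.append(
                (match.group("name"), match.group("value"), match.group("ptype"))
            )
    return found


def vswitch_status_is_runtime(net_dvs_output):
    """True if any vswitch status property line has propType RUNTIME."""
    return any(
        name == VSWITCH_STATUS_PROP and ptype == "RUNTIME"
        for name, _value, ptype in find_props(net_dvs_output)
    )


def count_hotswap_failures(log_text):
    """Count nsxaVim ERROR lines that report a failed cvds hot-swap."""
    return sum(
        1
        for line in log_text.splitlines()
        if "nsxaVim" in line and "ERROR Failed to hot-swap cvds" in line
    )
```

Both functions take text that has already been captured, for example the saved output of the `net-dvs` command and the contents of the nsxaVim log, so they can be run on a workstation against a support bundle as well as on the host. Only the `ERROR Failed to hot-swap cvds` line is counted, so each attempt is counted once even though it also produces INFO lines.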

     


Environment

VMware NSX-T

Cause

  • During transport node (TN) create/update, if a VDS is being enabled for NSX, opsAgent initializes all of the opaqueDvs.status.component.* properties to "up". These statuses are used by vCenter for vMotion compatibility checks. The expected behavior is that the corresponding sub-component (e.g. vswitch, vdl2, lcp) then sets the status value to "up" or "down" depending on its health.
  • The NSX removal happened because NSXA could not read the property com.vmware.common.opaqueDvs.status.component.vswitch using the API 'dvsManager.retrieveNsxDvsConfig': the ESX datapath had set its propType to RUNTIME. The subsequent workflow that relies on this property was affected, which resulted in multiple hot-swap attempts and caused the network outage.

Resolution

  • The ESX datapath (DP) sets the property type to RUNTIME, but it needs to be set to CONFIG. This is being fixed in 3.2.4.
  • With this fix in place, the cvds hot-swap is avoided.
  • 4.x versions already contain the fix.


Workaround:
  • For a failed host that has already been upgraded, unprep and then reprep the host to resolve the issue.
  • The unprep of the host may also fail with the error "ERROR: Failed to reset nsxa app of nsx-opsagent. Please check ospagent logs for more details." because the property is set to RUNTIME.
  • This can be resolved by following KB article 93427.


Additional Information

https://kb.vmware.com/s/article/93427

Impact/Risks:
The VMs on the impacted host may face network interruptions.