Hyperbus Connection Found UNHEALTHY (MISS_VERSION_HANDSHAKE) on all container VMs on a host after an upgrade to NSX-T 3.1.3.3

Article ID: 316652

Updated On:

Products

VMware NSX

Issue/Introduction

  • Container VMs show a status of 'MISS_VERSION_HANDSHAKE' when queried with the command 'nsxcli -c get hyperbus connection info'.
nsxcli -c get hyperbus connection info

Thu Dec 23 2021 UTC 20:12:31.155

        VIFID              Connection             Status                HostSwitchID

65101bd0-####-####-####-########a20     169.254.1.10:2345           MISS_VERSION_HANDSHAKE 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44

8afe08b9-####-####-####-########4c4     169.254.1.11:2345           MISS_VERSION_HANDSHAKE 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44
  • Creation of additional containerized workloads may fail. 
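
When many container VMs run on a host, the table above can be filtered programmatically. The following is a minimal sketch, assuming the column layout shown in the sample output (VIF ID, Connection, Status, HostSwitchID). The function name unhealthy_vifs and the regular expression are illustrative and are not part of any NSX tooling.

```python
import re

# One table row: "<VIF ID>  <ip:port>  <STATUS>  <host switch id bytes...>"
# Assumption: the layout matches the sample output in this article.
ROW = re.compile(
    r"^(?P<vif>\S+)\s+(?P<conn>\d+\.\d+\.\d+\.\d+:\d+)\s+(?P<status>[A-Z_]+)\b"
)


def unhealthy_vifs(cli_output, bad_status="MISS_VERSION_HANDSHAKE"):
    """Return (vif_id, connection) pairs whose status equals bad_status."""
    hits = []
    for line in cli_output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("status") == bad_status:
            hits.append((match.group("vif"), match.group("conn")))
    return hits


sample = """\
Thu Dec 23 2021 UTC 20:12:31.155
        VIFID              Connection             Status                HostSwitchID
65101bd0-####-####-####-########a20     169.254.1.10:2345           MISS_VERSION_HANDSHAKE 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44
8afe08b9-####-####-####-########4c4     169.254.1.11:2345           MISS_VERSION_HANDSHAKE 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44
"""

for vif, conn in unhealthy_vifs(sample):
    print(vif, conn)
```

The timestamp and header lines are skipped because they do not contain an ip:port value in the second column.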


To verify the cause of this symptom, run the following commands via SSH on an affected ESXi host:
 

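1. egrep 'port |port.alias|port.volatile.vlan' commands/net-dvs_-l.txt | egrep 'vmk50|.alias = hb-' -A 2 -B 1 

*** NOTE: The file commands/net-dvs_-l.txt is the saved net-dvs -l output inside a host support bundle. On a live host, pipe the output of net-dvs -l into the same filter instead: net-dvs -l | egrep 'port |port.alias|port.volatile.vlan' | egrep 'vmk50|.alias = hb-' -A 2 -B 1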

Example Return
 
   port ec781af3-####-####-####-########f54:
        com.vmware.common.port.alias = vmk50 , propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 0
--
    port hb-89ba14b7-####-####-####-########79e:
        com.vmware.common.port.alias = hb-89ba14b7-####-####-####-########79e ,    propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 4094
--
    port hb-########-####-####-####-########1622:
        com.vmware.common.port.alias = hb-8d943d75-####-####-####-########622 ,    propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 4094
--
    port hb-c8f41d3c-####-####-####-########0f3:
        com.vmware.common.port.alias = hb-c8f41d3c-####-####-####-########0f3 ,    propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 4094
*** NOTE: A mismatch between the VLAN 4094 tag on the Diego Cell hyperbus ports and the VLAN 0 tag on vmk50 indicates this issue.
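
The comparison described in the note can be sketched in a few lines. This is an illustration only, assuming the filtered net-dvs -l layout shown in the example return. The helper names port_vlans and vlan_mismatch are hypothetical.

```python
import re


def port_vlans(net_dvs_output):
    """Map port alias -> VLAN tag for vmk50 and hyperbus (hb-) ports."""
    vlans = {}
    alias = None
    for line in net_dvs_output.splitlines():
        if re.match(r"\s*port \S+:", line):
            alias = None  # a new port block starts; forget the previous alias
            continue
        found = re.search(r"port\.alias = (\S+)", line)
        if found:
            alias = found.group(1)
            continue
        found = re.search(r"port\.volatile\.vlan = VLAN (\d+)", line)
        if found and alias and (alias == "vmk50" or alias.startswith("hb-")):
            vlans[alias] = int(found.group(1))
    return vlans


def vlan_mismatch(vlans):
    """True when at least one hb- port carries a different tag than vmk50."""
    vmk = vlans.get("vmk50")
    if vmk is None:
        return False
    return any(tag != vmk for name, tag in vlans.items() if name.startswith("hb-"))


sample = """\
   port ec781af3-####-####-####-########f54:
        com.vmware.common.port.alias = vmk50 , propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 0
--
    port hb-89ba14b7-####-####-####-########79e:
        com.vmware.common.port.alias = hb-89ba14b7-####-####-####-########79e ,    propType = CONFIG
            load balancing = source virtual port id
        com.vmware.common.port.volatile.vlan = VLAN 4094
"""

vlans = port_vlans(sample)
print(vlans, vlan_mismatch(vlans))
```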




2.  net-dvs -l 

*** NOTE: Look for the pvlanMap section, as seen below 
host properties:
  com.vmware.common.host.portset = DvsPortset-3 ,   propType = CONFIG 
  com.vmware.nsx.vdl2.enabled = true ,  propType = CONFIG  
  com.vmware.nsx.spf.enabled = true ,   propType = CONFIG 
  com.vmware.nsx.kcp.enable = true , propType = CONFIG
  com.vmware.vswitch.pvlanMap:   
    (4093, 4093) - promiscuous
    (4093, 4094) - isolated 
    propType = RUNTIME
  com.vmware.common.opaqueDvs.status.component.vswitch = up ,   propType = CONFIG
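
To check a long net-dvs -l capture for a leftover pvlanMap, the section can be extracted with a small parser. This is a sketch, assuming the entries follow the "(primary, secondary) - type" layout shown above. The function name pvlan_map_entries is hypothetical.

```python
import re


def pvlan_map_entries(net_dvs_output):
    """Return (primary, secondary, type) tuples listed under pvlanMap."""
    entries = []
    in_map = False
    for line in net_dvs_output.splitlines():
        if "com.vmware.vswitch.pvlanMap" in line:
            in_map = True
            continue
        if in_map:
            found = re.match(r"\s*\((\d+),\s*(\d+)\)\s*-\s*(\w+)", line)
            if found:
                entries.append(
                    (int(found.group(1)), int(found.group(2)), found.group(3))
                )
            else:
                in_map = False  # first non-entry line ends the section
    return entries


sample = """\
  com.vmware.nsx.kcp.enable = true , propType = CONFIG
  com.vmware.vswitch.pvlanMap:
    (4093, 4093) - promiscuous
    (4093, 4094) - isolated
    propType = RUNTIME
  com.vmware.common.opaqueDvs.status.component.vswitch = up ,   propType = CONFIG
"""

print(pvlan_map_entries(sample))
```

An empty list means no pvlanMap entries were found in the text that was passed in.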


3. Check rebootless_upgrade flag on the cluster.

NOTE: The default and expected value is {"key" : "rebootless_upgrade", "value" : "true"}.

Find the group id for the cluster using GET /api/v1/upgrade/upgrade-unit-groups?component_type=HOST or from UI host upgrade page.

Then run GET /api/v1/upgrade/upgrade-unit-groups/<group id> and look for the "rebootless_upgrade" flag in the return. 
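
Once the group has been retrieved, the flag can be read from the parsed JSON. The sketch below is not a definitive client. It assumes that the response carries the flag as a key/value pair inside a list named extended_configuration; confirm the field names against the actual API response. Fetching the JSON (authentication, TLS) is left out, and the function name rebootless_upgrade_value is hypothetical.

```python
import json


def rebootless_upgrade_value(group):
    """Return the "rebootless_upgrade" value from a parsed group, or None.

    Assumption: the flag is stored as {"key": ..., "value": ...} inside
    the "extended_configuration" list of the upgrade unit group.
    """
    for item in group.get("extended_configuration", []):
        if item.get("key") == "rebootless_upgrade":
            return item.get("value")
    return None


# Illustrative response body; the id and the layout are assumptions.
sample_response = json.loads("""
{
  "id": "<group id>",
  "type": "HOST",
  "extended_configuration": [
    {"key": "rebootless_upgrade", "value": "false"}
  ]
}
""")

print(rebootless_upgrade_value(sample_response))
```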



Environment

VMware NSX-T Data Center 3.x

Cause

The issue described above can arise if the "rebootless_upgrade" flag is set to "false" for an NSX-T host upgrade to version 3.1.3.3.

If that flag is set to false, the upgrade script does not call a specific method that removes the pvlanMapping, which in turn can cause the symptom described above. 

 

 

Resolution

This issue is resolved in VMware NSX-T Data Center 3.2.0.1.

Workaround:

There are two workarounds to this particular symptom.  The first is a proactive, preventative workaround.  The second is a reactive workaround, to be applied if this symptom is encountered. 

Preventative Workaround: (Before upgrade to 3.1.3.3, to prevent occurrence).

Check the "rebootless_upgrade" flag on each Host Upgrade Group, prior to NSX-T upgrade.

Find the group id for the cluster using GET /api/v1/upgrade/upgrade-unit-groups?component_type=HOST or from UI host upgrade page.

Then run GET /api/v1/upgrade/upgrade-unit-groups/<group id> and look for the "rebootless_upgrade" flag in the return. 

If a Host Upgrade Group returns "false", use a PUT API call to the same URL to change the value to "true".

NOTE: The default and expected value is {"key" : "rebootless_upgrade", "value" : "true"}.
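
One way to prepare the PUT body is to take the object returned by the GET call and change only the flag. The sketch below assumes that the flag lives in a list named extended_configuration as key/value pairs and that the PUT call accepts the full group object as returned by GET. Neither assumption is confirmed by this article, so compare against the actual response before sending anything. The function name with_rebootless_true is hypothetical.

```python
import copy


def with_rebootless_true(group):
    """Return a copy of the group with "rebootless_upgrade" set to "true".

    Assumption: the flag is a {"key", "value"} pair inside the
    "extended_configuration" list. All other fields are left untouched.
    """
    updated = copy.deepcopy(group)
    config = updated.setdefault("extended_configuration", [])
    for item in config:
        if item.get("key") == "rebootless_upgrade":
            item["value"] = "true"
            break
    else:
        # Flag not present at all: add it with the expected value.
        config.append({"key": "rebootless_upgrade", "value": "true"})
    return updated


# Illustrative GET result; the field layout is an assumption.
current = {
    "id": "<group id>",
    "extended_configuration": [{"key": "rebootless_upgrade", "value": "false"}],
}
body = with_rebootless_true(current)
print(body["extended_configuration"])
```

The original dictionary is not modified, which makes it easy to compare the two objects before issuing the PUT call.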



Reactive Workaround: (After upgrade to 3.1.3.3, if encountered).
 

1. Run nsxdp-cli vswitch instance list on an affected ESXi host.  Pay attention to the switch name in the first line of the return. 

In the example below, the switch name is 'RegionA01-VDS7'

DvsPortset-0 (RegionA01-VDS7) 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44
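
If the switch name needs to be picked up by a script, the first line of the output can be split with a regular expression. This is a minimal sketch, assuming the "DvsPortset-N (name) id" layout shown in the example. The function name switch_name is hypothetical.

```python
import re


def switch_name(instance_list_output):
    """Return (portset, switch name) from the first matching line, or None."""
    for line in instance_list_output.splitlines():
        found = re.match(r"\s*(DvsPortset-\d+) \((.+?)\)", line)
        if found:
            return found.group(1), found.group(2)
    return None


sample = "DvsPortset-0 (RegionA01-VDS7) 50 1d 00 40 ## ## ## ##-## ## ## ## 9a 83 48 44"
print(switch_name(sample))
```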

 

2. Run net-dvs -l | egrep 'port |port.alias|port.volatile.vlan' | egrep 'vmk50|.alias = hb-' -A 2 -B 1 to identify the port IDs of vmk50 and each hyperbus port on the affected ESXi host. 

Example Return:

 

port 66a8205e-####-####-####-########a34:
    com.vmware.common.port.alias = vmk50 , propType = CONFIG
    com.vmware.common.port.volatile.vlan = VLAN 4093
--
port hb-0c925e99-####-####-####-########20f:
    com.vmware.common.port.alias = hb-0c925e99-####-####-####-########20f , propType = CONFIG
    com.vmware.common.port.volatile.vlan = VLAN 4094
--
port hb-e928e034-####-####-####-########036:
    com.vmware.common.port.alias = hb-e928e034-####-####-####-########036 , propType = CONFIG
    com.vmware.common.port.volatile.vlan = VLAN 4094
 

3. Run the following commands to change each VLAN tag to 0 on the affected ESXi host:

net-dvs -v "0" -p 66a8205e-####-####-####-########a34 RegionA01-VDS7 

*** above command clears pvlan from vmk50 port

net-dvs -v "0" -p hb-0c925e99-####-####-####-########20f RegionA01-VDS7 

*** above command clears pvlan from hyperbus port

net-dvs -v "0" -p hb-e928e034-####-####-####-########036 RegionA01-VDS7

*** above command clears pvlan from hyperbus port

net-dvs -u "com.vmware.vswitch.pvlanMap" -p hostPropList RegionA01-VDS7 

*** above command clears pvlanMap from ESXi host

NOTE: Each Diego Cell VM has a hyperbus port (hb-xxxxxx port) on each affected ESXi host.
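
Because a host can carry many hyperbus ports, the command list in step 3 can be generated rather than typed by hand. The sketch below only builds the command strings in the form shown in step 3 and prints them for review; it does not run anything on the host. The function name workaround_commands is hypothetical.

```python
def workaround_commands(switch, port_ids):
    """Build the step 3 commands for one switch and its vmk50/hb- ports."""
    # One VLAN reset per port (vmk50 and every hb- port).
    commands = [f'net-dvs -v "0" -p {port_id} {switch}' for port_id in port_ids]
    # Finally clear the pvlanMap host property on the same switch.
    commands.append(
        f'net-dvs -u "com.vmware.vswitch.pvlanMap" -p hostPropList {switch}'
    )
    return commands


ports = [
    "66a8205e-####-####-####-########a34",
    "hb-0c925e99-####-####-####-########20f",
    "hb-e928e034-####-####-####-########036",
]
for command in workaround_commands("RegionA01-VDS7", ports):
    print(command)
```

Review the generated lines against the output of step 2 before pasting them into an SSH session on the affected host.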