Security Only label is no longer observed after a vCenter upgrade

Article ID: 382436

Products

VMware vDefend Firewall

Issue/Introduction

The 'Security Only' label disappears from one or more Security Only cluster deployments under NSX UI > Fabric > ESXi Hosts (Security Only Cluster) after a vCenter upgrade.

Environment

This issue affects NSX versions 3.2.x and 4.1.x with Security Only cluster deployments.

Cause

The issue occurs when a host in the Security Only cluster is in a bad state and the vCenter upgrade is performed without first remediating that host.

  • VCFullSyncForSecurity is called after the vCenter upgrade. If one of the hosts is in a bad state, the Transport Node Profile (TNP) is updated with an empty switch: hostSwitches : [[]]

  • As a result, the DVS can be disabled, which can cause datapath impact.

Resolution

Before performing a vCenter upgrade, check the health of the ESXi hosts in the cluster and remediate any host that is in a bad state. This check is recommended in the upgrade documentation.

Documentation = https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/upgrade/GUID-1D36788D-741A-4D12-A0F1-DA3FF0DB68D5.html
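
The pre-upgrade check described above can be sketched as a simple gate. This is a minimal illustration, not an NSX tool: it assumes the host states have already been collected (for example, from the NSX UI) into a list of records, and the record keys `name` and `state`, the healthy value `success`, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: "success" is the label used for a healthy host in the collected data.
HEALTHY_STATES = {"success"}


def hosts_needing_remediation(hosts: Iterable[Dict[str, str]]) -> List[str]:
    """Return the names of hosts whose state is not considered healthy."""
    return [
        host.get("name", "<unnamed>")
        for host in hosts
        if host.get("state", "").lower() not in HEALTHY_STATES
    ]


def safe_to_upgrade(hosts: Iterable[Dict[str, str]]) -> bool:
    """True only when every host in the cluster is healthy."""
    return not hosts_needing_remediation(list(hosts))


if __name__ == "__main__":
    # Hypothetical sample data for a two-host cluster.
    cluster = [
        {"name": "esx-01", "state": "success"},
        {"name": "esx-02", "state": "failed"},
    ]
    print("Remediate before upgrade:", hosts_needing_remediation(cluster))
```

If the list of hosts needing remediation is not empty, fix those hosts before starting the vCenter upgrade.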


If the cluster is already impacted, use the following workaround:
Create a new cluster, prepare it for NSX, and move the impacted workloads to the new cluster.

Additional Information

The log snippets below can be found in the indicated files. The timestamps, names, UUIDs, and number of ESXi hosts will differ in your environment; match accordingly.

/var/log/proton/nsxapi.log 
In this example, the impacted cluster had 18 hosts, and 1 host was missing the DVS even before the vCenter upgrade. The log also reports that the cluster is 'prepared for security'.

2024-07-21T14:47:32.027Z INFO workerTaskExecutor-20 PolicyDVPGUtils 4414 POLICY [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Cluster UUID-XXXX-XXXX-XXXX-XXXXXXXXXXXX:domain-cXXXXXX is prepared for security.
2024-07-21T14:47:32.027Z INFO workerTaskExecutor-20 DvsAndDvpgWorkerForSecurity 4414 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Distributed Virtual Switch XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX is not connected to all the hosts in the cluster UUID-XXXX-XXXX-XXXX-XXXXXXXXXXXX:domain-cXXXXXX. Cluster has 18 host(s). But only 17 host(s) are connected to the Distributed Virtual Switch.
And:
2024-07-21T15:00:24.470Z WARN org.corfudb.runtime.collections.streaming.StreamPollingScheduler-worker-0 TransportNodeDesiredStateErrorServiceImpl 4414 FABRIC [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Number of hosts that went into error during TNCollection CRUD operation for compute collection UUID-XXXX-XXXX-XXXX-XXXXXXXXXXXX:domain-cXXXXXX.
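
To find the host-count mismatch in a large nsxapi.log, the message can be matched with a regular expression. The sketch below is based only on the wording of the sample line above; the function name `dvs_host_gap` is hypothetical, and the message text may differ between NSX versions.

```python
import re
from typing import Optional, Tuple

# Pattern taken from the sample DvsAndDvpgWorkerForSecurity message above.
DVS_MISMATCH = re.compile(
    r"Cluster has (\d+) host\(s\)\. But only (\d+) host\(s\) are connected "
    r"to the Distributed Virtual Switch"
)


def dvs_host_gap(line: str) -> Optional[Tuple[int, int]]:
    """Return (hosts_in_cluster, hosts_on_dvs) if the line reports a mismatch."""
    match = DVS_MISMATCH.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        "Cluster has 18 host(s). But only 17 host(s) are connected "
        "to the Distributed Virtual Switch."
    )
    print(dvs_host_gap(sample))
```

A result in which the second number is lower than the first means that at least one host is missing from the DVS and should be remediated before the vCenter upgrade.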

--------------

/var/log/cm-inventory/cm-inventory.log
These entries show when vCenter was down from the NSX Manager's perspective, due to the vCenter upgrade.
The connection is marked DOWN, and the inventory shows an empty hostSwitches value, [[]].

2024-07-21T14:41:57.694Z ERROR InventoryFetcher-b52828ca-797f-4df3-b989-41984abdee89 ExponentialBackOffLoginImpl 4322 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP40107" level="ERROR" subcomp="cm-inventory"] Marking VC vcenter.compute.manager connection DOWN as VC connection attempts are exhausted.
And:
2024-07-21T15:00:24.392Z INFO task-executor-4 TransportNodeProfile 4402 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Comparing TransportNodeProfile hostSwitches : [[]]
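
The two signatures above can be picked out of a log in timestamp order with a short script. This is a minimal sketch that matches only the wording shown in the sample lines; the labels `vc_connection_down` and `empty_host_switches` and the function names are hypothetical, and the messages may differ between NSX versions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Patterns taken from the two sample log lines above.
VC_DOWN = re.compile(r"Marking VC (\S+) connection DOWN")
EMPTY_SWITCHES = re.compile(r"TransportNodeProfile hostSwitches\s*:\s*\[\[\]\]")


def classify(line: str) -> Optional[str]:
    """Return a label for a known signature, or None for any other line."""
    if VC_DOWN.search(line):
        return "vc_connection_down"
    if EMPTY_SWITCHES.search(line):
        return "empty_host_switches"
    return None


def scan(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, label) pairs for every matching line, in input order."""
    hits = []
    for line in lines:
        label = classify(line)
        if label is not None:
            # The timestamp is the first space-separated field of each entry.
            hits.append((line.split(" ", 1)[0], label))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_logs.py <path to log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for timestamp, label in scan(handle):
            print(timestamp, label)
```

A `vc_connection_down` hit followed shortly afterwards by `empty_host_switches` is consistent with the sequence described in the Cause section.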