If any of the sticky files are edited then these files will not be updated during NSX upgrade on the Host Transport Node resulting in the Host Transport Node showing as down in the NSX UI

Article ID: 314003

Updated On: 12-10-2024

Products

VMware NSX

Issue/Introduction

  • If any sticky file is edited, that file is not updated during the NSX upgrade on the Host Transport Node, and the host then shows as down in the NSX UI.
    For example, modifying nsx-cfgagent.xml may prevent cfgagent from starting properly. Modifying a different sticky file leads to a different outcome. Below is one instance where cfgagent fails to start properly, resulting in one or all of the following outcomes.

    [root@esxcli:/var/core] ls -lh
    total 16M
    -rwxrwxr-x 1 root sssd 14M Nov 16 04:33 nsx-cfgagent-zdump.000

    Check nsx-syslog.log to see whether the applications enabled in nsx-cfgagent.xml started properly. When an application starts, log entries similar to the following appear.

    2023-12-12T11:34:03.835Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] L2 application starts
    2023-12-12T11:34:03.835Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] L3 application starts
    2023-12-12T11:34:03.836Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Config application starts
    2023-12-12T11:34:03.839Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Traceflow application starts
    2023-12-12T11:34:03.839Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] BFD application starts
    2023-12-12T11:34:05.070Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] DFW application starts
    2023-12-12T11:34:05.070Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] LB application starts
    2023-12-12T11:34:05.072Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Service insertion application starts
    2023-12-12T11:34:05.074Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Intrusion Detection Service application starts
    2023-12-12T11:34:05.079Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Livetrace application starts

    In this case only two applications (L2 and L3) started, as shown below. As a result, cfgAgent did not start properly and a core dump was generated.

    syslog.7.gz:2023-11-15T12:08:05.366Z NSX[2108361]: nsx-cfgagent service starts
    nsx-syslog.log:2023-11-15T21:35:22.759Z cfgAgent[2449402]: NSX 2449402 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="ABEF3C80" level="info"] L2 application starts
    nsx-syslog.log:2023-11-15T21:35:22.759Z cfgAgent[2449402]: NSX 2449402 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="ABEF3C80" level="info"] L3 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:05.981Z cfgAgent[2108360]: NSX 2108360 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6389EC80" level="info"] L2 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:05.981Z cfgAgent[2108360]: NSX 2108360 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6389EC80" level="info"] L3 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:21.545Z cfgAgent[2109879]: NSX 2109879 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="65326C80" level="info"] L2 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:21.545Z cfgAgent[2109879]: NSX 2109879 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="65326C80" level="info"] L3 application starts

  • The upgraded host is shown as down in the NSX UI.
  • vMotion does not work if it involves affected hosts.
  • This issue is seen only when a sticky bit file (for example, nsx-cfgagent.xml) is manually modified before the upgrade.
  • To check whether there is a cfgAgent core dump, check /var/core/ in the host bundle.
  • The upgrade of the other Host Transport Nodes in the cluster fails because VMs cannot be moved from the non-upgraded hosts back to the upgraded host.
    The host is unusable because the cfgagent process is not available.
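
The two checks described above (looking for a cfgAgent core dump and counting the "application starts" entries per cfgAgent process) can be sketched in Python as follows. This is a minimal illustration, not a supported tool: the function names are hypothetical, the expected application list is copied from the healthy log excerpt above and may differ depending on which applications are enabled in nsx-cfgagent.xml, and the log lines are assumed to follow the format shown in this article.

```python
import glob
import os
import re
from collections import defaultdict

# Application names copied from the healthy log excerpt in this article.
# Assumption: the real set depends on what is enabled in nsx-cfgagent.xml.
EXPECTED_APPS = [
    "L2", "L3", "Config", "Traceflow", "BFD", "DFW", "LB",
    "Service insertion", "Intrusion Detection Service", "Livetrace",
]

# Captures the cfgAgent PID and the application name from an
# "... cfgAgent[PID]: ... level="info"] <name> application starts" line.
START_RE = re.compile(
    r'cfgAgent\[(\d+)\]:.*?\]\s+(.+?) application starts\s*$'
)


def find_cfgagent_cores(core_dir):
    """Return cfgagent core dump files (nsx-cfgagent-zdump.*) in core_dir."""
    return sorted(glob.glob(os.path.join(core_dir, "nsx-cfgagent-zdump.*")))


def apps_started_by_pid(lines):
    """Map each cfgAgent PID to the set of applications it logged as started."""
    started = defaultdict(set)
    for line in lines:
        match = START_RE.search(line)
        if match:
            started[match.group(1)].add(match.group(2))
    return dict(started)


def missing_apps(started, expected=EXPECTED_APPS):
    """For each PID, list the expected applications that never logged a start."""
    return {
        pid: [app for app in expected if app not in apps]
        for pid, apps in started.items()
    }
```

To use the sketch, pass the path of the extracted /var/core/ directory to find_cfgagent_cores and the lines of nsx-syslog.log to apps_started_by_pid. A PID for which missing_apps returns a long list, as with the L2-and-L3-only processes in the excerpt above, is a candidate for the failure described in this article. Rotated .gz logs would need to be opened with gzip.open in text mode first.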

Environment

VMware NSX

Cause

If a sticky bit file on the Host Transport Node has been modified, the Upgrade Coordinator (UC) does not upgrade that file when NSX is upgraded on the Host Transport Node.
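
As a rough pre-upgrade check, the following Python sketch tests whether the sticky bit is set on a given file. It assumes that "sticky bit file" here refers to the POSIX sticky bit (S_ISVTX) in the file mode. The function name is hypothetical, and the location of nsx-cfgagent.xml on the host is not given in this article, so the path must be supplied by the reader.

```python
import os
import stat


def has_sticky_bit(path):
    """Return True if the POSIX sticky bit (S_ISVTX) is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)
```

A True result only indicates that the bit is set; whether the file content actually differs from the shipped version would still need to be confirmed separately.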

Resolution

This issue is resolved in VMware NSX 4.2.0, which can be downloaded from the My Downloads section of the Broadcom Support website.


Workaround:

If you have encountered this issue, please contact Broadcom Support.

 

Additional Information

Versions where this is a known issue: NSX 4.x releases earlier than 4.2.0
Version where this is fixed: NSX 4.2.0 and later, which can be downloaded from the My Downloads section of the Broadcom Support website.