Host Transport Node Shows as Down in NSX UI After Upgrade Due to Modified Configuration File.

Article ID: 314003

Updated On: 04-21-2025

Products

VMware NSX

Issue/Introduction

  • Editing persistent configuration files on a Host Transport Node, such as nsx-cfgagent.xml, can cause problems during NSX upgrades. The upgrade does not update these files if they have been modified, so critical services such as cfgAgent may fail to start. The Host Transport Node then shows as Down in the NSX UI, and a cfgAgent core dump may be present in /var/core:

    [root@esxcli:/var/core] ls -lh
    total 16M
    -rwxrwxr-x 1 root sssd 14M Nov 16 04:33 nsx-cfgagent-zdump.000

    Check nsx-syslog.log to verify that the applications enabled in nsx-cfgagent.xml started correctly. When an application starts, a log entry similar to the following is written:

    2023-12-12T11:34:03.835Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] L2 application starts
    2023-12-12T11:34:03.835Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] L3 application starts
    2023-12-12T11:34:03.836Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Config application starts
    2023-12-12T11:34:03.839Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Traceflow application starts
    2023-12-12T11:34:03.839Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] BFD application starts
    2023-12-12T11:34:05.070Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] DFW application starts
    2023-12-12T11:34:05.070Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] LB application starts
    2023-12-12T11:34:05.072Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Service insertion application starts
    2023-12-12T11:34:05.074Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Intrusion Detection Service application starts
    2023-12-12T11:34:05.079Z In(182) cfgAgent[681923]: NSX 681923 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="3C13A300" level="info"] Livetrace application starts

    In this case, each cfgAgent instance started only 2 applications (L2 and L3), as shown below. As a result, cfgAgent did not start properly and a core dump was generated.

    syslog.7.gz:2023-11-15T12:08:05.366Z NSX[2108361]: nsx-cfgagent service starts
    nsx-syslog.log:2023-11-15T21:35:22.759Z cfgAgent[2449402]: NSX 2449402 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="ABEF3C80" level="info"] L2 application starts
    nsx-syslog.log:2023-11-15T21:35:22.759Z cfgAgent[2449402]: NSX 2449402 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="ABEF3C80" level="info"] L3 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:05.981Z cfgAgent[2108360]: NSX 2108360 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6389EC80" level="info"] L2 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:05.981Z cfgAgent[2108360]: NSX 2108360 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="6389EC80" level="info"] L3 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:21.545Z cfgAgent[2109879]: NSX 2109879 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="65326C80" level="info"] L2 application starts
    nsx-syslog.0.gz:2023-11-15T12:08:21.545Z cfgAgent[2109879]: NSX 2109879 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="65326C80" level="info"] L3 application starts

  • The upgraded host is shown as Down in the NSX UI.
  • vMotion does not work if it involves an affected host.
  • This issue is seen only when a persistent file (for example, nsx-cfgagent.xml) was manually modified before the upgrade.
  • To check for a cfgAgent core dump, look in /var/core/ in the host bundle.
  • The upgrade of other Host Transport Nodes in the cluster fails because VMs cannot be moved back to the upgraded host from the non-upgraded hosts.
  • The host is unusable because the cfgAgent process is not available.
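
The log and core dump checks described above can be scripted against an extracted host bundle. The following is a minimal sketch, not a supported tool. The function names are hypothetical, and EXPECTED_APPS is an assumption taken from the healthy log excerpt in this article; the applications actually enabled on a host depend on its nsx-cfgagent.xml.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: application names copied from the healthy log excerpt above.
# The real set depends on what is enabled in nsx-cfgagent.xml.
EXPECTED_APPS = [
    "L2", "L3", "Config", "Traceflow", "BFD", "DFW", "LB",
    "Service insertion", "Intrusion Detection Service", "Livetrace",
]

# Matches e.g. 'cfgAgent[681923]: ... level="info"] L2 application starts'
START_RE = re.compile(r'cfgAgent\[(\d+)\]:.*?\]\s+(.+?) application starts\s*$')


def started_apps_by_pid(log_lines):
    """Map each cfgAgent PID to the applications it logged as started."""
    started = defaultdict(list)
    for line in log_lines:
        match = START_RE.search(line)
        if match:
            pid, app = match.groups()
            started[pid].append(app)
    return dict(started)


def missing_apps(started, expected=EXPECTED_APPS):
    """Return the expected applications that do not appear in 'started'."""
    return [app for app in expected if app not in started]


def find_cfgagent_cores(core_dir):
    """List files named like nsx-cfgagent-zdump.000 in core_dir."""
    return sorted(p.name for p in Path(core_dir).glob("nsx-cfgagent-zdump.*"))
```

For example, passing the lines of an uncompressed nsx-syslog.log to started_apps_by_pid and then calling missing_apps on each PID's list shows which applications a given cfgAgent instance never reported. Rotated logs such as nsx-syslog.0.gz need to be decompressed first. An instance that reports only L2 and L3, together with a non-empty result from find_cfgagent_cores, is consistent with the symptoms in this article.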

Environment

VMware NSX

Cause

If a Host Transport Node has modified persistent configuration files, NSX will skip updating those files during the upgrade process, as they are treated as user-modified.

Resolution

This issue is resolved in VMware NSX 4.2.0, which can be downloaded from the My Downloads section of the Broadcom Support website.


Workaround:

If you have encountered this issue, please contact Broadcom Support.