Host Transport Node Status: Agent Degraded showing nsx_vdl2 being degraded.

Article ID: 411995

Products

VMware NSX

Issue/Introduction

  • In the NSX UI, on the System > Fabric > Hosts page, one or more ESX transport nodes are in a degraded state.
  • Drilling down into the degraded host shows that the NSX_VDL2 agent is in a degraded state.
  • You see messages similar to the following in the /var/log/proton/nsxapi.log file on the NSX Manager node:
    2025-10-25T22:15:32.658Z  INFO ShaVerticalMessageDispatcher1 ShaVerticalMessageHandler 5988 MONITORING [nsx@4413 comp="nsx-manager" level="INFO" subcomp="manager"] receive agent data AgentStatusAggregationModel{TransportNode =######-####-####-####-########...
    AgentInfo{AgentName=NSX_VDL2, Status=DEGRADED, errorDetail=All bfd tunnels are down for some of vteps., connectionStatus=null,

  • You see messages similar to the following logged by the nsx-sha agent on the affected ESX host:
    2025-10-31T16:32:30Z In(182) nsx-sha: NSX 2100889 - [nsx@4413 comp="nsx-esx" subcomp="sha" username="root" level="INFO"] Dhreport - reporting, tn_agent_status:{"oid": {"left":"########288277069942","right":"########022311121664"},"status":[{"type":"TN_AGENT","tn_agent_status":{"threat_state":{},"agent_state":{"total_status":"DEGRADED","up_count":17,"down_count":0,"agent_status":[{"type":"NSX_ENS","status":"UP","health_metrics":[{"metric_name":"last_status_update_time","value":"1761928322713","type":"TIMESTAMP","unit":"MILLISECOND"},{"metric_name":"uptime","value":"2495020174","type":"INTEGER","unit":"MILLISECOND"}]},{"type":"NSX_VDR","status":"UP","health_metrics":[{"metric_name":"last_status_update_time","value":"1761928322713","type":"TIMESTAMP","unit":"MILLISECOND"},{"metric_name":"uptime","value":"2495021408","type":"INTEGER","unit":"MILLISECOND"}]},{"type":"NSX_VDL2","status":"DEGRADED","error_detail":"All bfd tunnels are down for some of vteps."
  • You see messages similar to the following in the /var/run/log/vmkernel.log file on the affected ESX host:
    2025-10-28T22:04:10.183Z In(182) vmkernel: cpu48:2098448)VDL2isVmknicBfdWait:4385:[nsx@4413 comp="nsx-esx" subcomp="vdl2-24952112"][vmknic:vmk11 ID:0 switch:DvsPortset-0] state change to BFD down
    2025-10-28T22:04:10.183Z In(182) vmkernel: cpu48:2098448)VDL2SetVmknicBfdDown:1736:[nsx@4413 comp="nsx-esx" subcomp="vdl2-24952112"][vmknic:vmk10 ID:0 switch:DvsPortset-0] old state : 3, vmknic down
  • There may be VMs running on the affected host without issue.
  • The host is configured with more than one Tunnel Endpoint (TEP) address.
  • Checking the status of the TEP interfaces on the host shows that one of them is in a BFD_DOWN state (a combined quick check of these symptoms is sketched after this list):
    net-vdl2 -l -s <vSwitch name>
    ...
                VTEP Interface: vmk11
                        DVPort ID:              05f0af3f-####-####-####-5c429122df85
                        Switch Port ID:         ########
                        Endpoint ID:            1
                        VLAN ID:                110
                        Label:                  #####
                        Uplink Port ID:         ##########
                        Is Uplink Port LAG:     No
                        IP:                     192.168.10.9
                        Netmask:                255.255.255.224
                        Segment ID:             192.168.10.0
                        IPv6:                   ::
                        Prefix Length:          0
                        Segment ID6:            ::
                        GW IP:                  192.168.10.1
                        GW MAC:                 00:50:56:0d:00:0a
                        GW V6 IP:               ::
                        GW V6 MAC:              00:00:00:00:00:00
                        IP Acquire Timeout:     0
                        IPv6 Acquire Timeout:   0
                        Multicast Group Count:  0
                        Is DRVTEP:              No
                        State4:                 DOWN : BFD_DOWN/VALID_IP
                        State6:                 DOWN : INIT/NO_IP
    ...
  • Checking the BFD sessions on the host shows that there are no sessions associated with the vmk# interface that is in the BFD_DOWN state:
    nsxdp-cli bfd sessions list
    Remote                        Local                         local_disc          remote_disc         recvd               sent      local_state         local_diag                              client              flaps               bfd_type
    192.168.10.69                 192.168.10.3                  3d723636            ceecb168            265823              260781      up                  No Diagnostic                           vdl2                9                   Tunnel
    192.168.10.67                 192.168.10.3                  eb46069b            d932cc7d            265814              260713      up                  No Diagnostic                           vdl2                11                  Tunnel
    192.168.10.66                 192.168.10.3                  6385437d            f1fc09d0            265825              260662      up                  No Diagnostic                           vdl2                9                   Tunnel
    192.168.10.68                 192.168.10.3                  65247fe0            91492889            265794              260765      up                  No Diagnostic                           vdl2                9                   Tunnel
    Note: The sessions noted in this output are for a vmk# with IP address 192.168.10.3. In this example, vmk11 is the interface with the BFD_DOWN status and has IP address 192.168.10.9. This interface has no sessions shown here.

  • Testing communication over the vmk# in question shows that it is able to communicate with other TEP addresses in the cluster:
    vmkping -I vmk11 -S vxlan -d -s 1572 192.168.10.7
    PING 192.168.10.7 (192.168.10.7): 1572 data bytes
    1580 bytes from 192.168.10.7: icmp_seq=0 ttl=64 time=1.920 ms
    1580 bytes from 192.168.10.7: icmp_seq=1 ttl=64 time=0.741 ms
    1580 bytes from 192.168.10.7: icmp_seq=2 ttl=64 time=0.772 ms
  • Performing a packet capture on the suspect interface shows that no BFD traffic is being sent or received. For example, running a command similar to the following on the affected ESX host produces no output: pktcap-uw --vmk vmk11 --dir 2 -o - | tcpdump-uw -nnevv -r - 'udp port 3784'
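
The individual checks above can be combined into a quick verification. The following is a minimal sketch that only reuses the commands already shown in this article; it assumes the suspect interface is vmk11 with TEP IP 192.168.10.9, so substitute the switch name, interface, and IP address from your environment.

    # Show each VTEP interface and its IPv4 state on the switch
    net-vdl2 -l -s <vSwitch name> | grep -E "VTEP Interface|State4"

    # List BFD sessions whose local address is the suspect TEP IP.
    # No output here, combined with a "DOWN : BFD_DOWN" State4 above,
    # matches the symptoms of this issue.
    nsxdp-cli bfd sessions list | grep "192.168.10.9"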

Environment

VMware NSX

Cause

This issue occurs when there is a temporary interruption in BFD traffic between transport nodes. After the underlying cause of the interruption is resolved, the affected TEP on the host is left in the "All bfd tunnels are down..." state indefinitely. VMs may continue to run on the host without issue because they are all using the host's other TEP, which is not affected.
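
As an additional check, the manager's current view of the host's agent status can be queried through the NSX API. The sketch below is illustrative only: <nsx-manager>, <admin-user>, <password>, and <transport-node-uuid> are placeholders, and the exact API path and response fields can vary between NSX versions.

    # Query the reported status of the transport node, including its agents
    curl -k -u '<admin-user>:<password>' \
        "https://<nsx-manager>/api/v1/transport-nodes/<transport-node-uuid>/status"

A response that still reports NSX_VDL2 as DEGRADED with the "All bfd tunnels are down..." error detail, even though traffic over the TEP is otherwise working, is consistent with the stale state described above.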

Resolution

This is a known issue affecting VMware NSX and vSphere ESX hosts prepared as NSX transport nodes. There is currently no resolution.

To work around this issue, a VM must establish a new connection to an NSX-backed segment on the affected host. This can be accomplished via one of the following options:

  • Migrate an existing VM that is using an NSX-backed segment for its networking to the host.
  • Create a dummy VM on the host, configure it to use an NSX-backed segment for its network, and power it on.
  • Power cycle an existing VM on the host that is using an NSX-backed segment for its networking.

Note: This may not immediately resolve the issue, as the new connection is not guaranteed to be placed on the affected TEP interface. If the status does not recover, repeat the process until the affected TEP is no longer in the BFD_DOWN state. A command-line sketch of the power-cycle option is shown below.
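
As a reference for the power-cycle option, the following is a minimal sketch run from the affected host's ESX shell; <vmid> is a placeholder taken from the getallvms output, and performing the same power cycle from the vSphere Client is equally valid.

    # List the VMs registered on this host and note the vmid of a VM
    # attached to an NSX-backed segment
    vim-cmd vmsvc/getallvms

    # Hard power cycle that VM (power off, then power back on)
    vim-cmd vmsvc/power.off <vmid>
    vim-cmd vmsvc/power.on <vmid>

    # Re-check the affected VTEP; repeat with another VM if State4 still
    # shows DOWN : BFD_DOWN
    net-vdl2 -l -s <vSwitch name> | grep -E "VTEP Interface|State4"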

Additional Information

Alternatively, the affected host could be placed into maintenance mode and rebooted.
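
If that approach is taken, the host must be evacuated first. The following is a sketch of the reboot from the ESX shell, assuming all running VMs have already been migrated off the host (entering maintenance mode and rebooting from the vSphere Client is equally valid):

    # Enter maintenance mode (the host must have no running VMs)
    esxcli system maintenanceMode set --enable true

    # Reboot the host
    esxcli system shutdown reboot --reason "NSX_VDL2 TEP stuck in BFD_DOWN"

    # After the host comes back up, exit maintenance mode
    esxcli system maintenanceMode set --enable false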