Intermittent Connectivity Issues Observed for VMs Attached to NSX Overlay Segments

Article ID: 406572

Updated On:

Products

VMware NSX

Issue/Introduction

  • VMs attached to NSX overlay segments may experience connectivity issues after being vMotioned.
  • Incorrect MAC/ARP updates from cfgAgent may be observed in the host log /var/run/log/nsx-syslog.log:
    cfgAgent[2103498]: NSX 2103498 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="9BFB9700" level="info"] VXLAN Message         ARP (v4) Update: len = 56       SwitchID:0, VNI:69633   Num of entries: 1               #0      VM IP:192.168.##.##    VM MAC:00:50:56:##:##:aa       VTEP address type: 1    VTEP IP:10.##.##.#1     VTEP IPv6:0000:0000:0000:0000:0000:0000:0000:0000   VTEP MAC:00:50:56:##:##:cc     --->     INCORRECT TEP
    2025-07-04T14:22:26.251Z cfgAgent[2103498]: NSX 2103498 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="9BFB9700" level="info"] VXLAN Message         ARP (v4) Update: len = 56       SwitchID:0, VNI:69633   Num of entries: 1               #0      VM IP:192.168.##.##    VM MAC:00:50:56:##:##:aa       VTEP address type: 1    VTEP IP:10.##.##.#2     VTEP IPv6:0000:0000:0000:0000:0000:0000:0000:0000   VTEP MAC:00:50:56:##:##:bb     --->     CORRECT TEP
    
    cfgAgent[2103498]: NSX 2103498 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="9BFB9700" level="info"] VXLAN Message         VM MAC Update: len = 54         SwitchID:0, VNI:69633   Num of removed entries: 0       Num of added entries: 1                 #0      VM MAC:00:50:56:##:##:aa       VTEP address type: 1    VTEP IP:10.##.##.#1     VTEP IPv6:0000:0000:0000:0000:0000:0000:0000:0000   VTEP MAC:00:50:56:##:##:cc     --->     INCORRECT TEP
    cfgAgent[2103498]: NSX 2103498 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="9BFB9700" level="info"] VXLAN Message         VM MAC Update: len = 54         SwitchID:0, VNI:69633   Num of removed entries: 0       Num of added entries: 1                 #0      VM MAC:00:50:56:##:##:aa       VTEP address type: 1    VTEP IP:10.##.##.#2     VTEP IPv6:0000:0000:0000:0000:0000:0000:0000:0000   VTEP MAC:00:50:56:##:##:bb     --->     CORRECT TEP
  • NSX-RPC keepalive failure entries are observed in the host log /var/run/log/nsx-syslog.log:
    nsx-opsagent[2104023]: NSX 2104023 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2104218" level="ERROR" errorCode="RPC31"] RpcConnection[41561 Negotiating to tcp://127.0.0.1:4554 0] Keepalive failed - haven't received response in time (last request was sent 60 seconds ago, response received - never)
    cfgAgent[2103498]: NSX 2103498 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" s2comp="nsx-rpc" tid="9B19D700" level="error" errorCode="RPC31"] RpcConnection[68925 Negotiating to tcp://127.0.0.1:4554 0] Keepalive failed - haven't received response in time (last request was sent 59 seconds ago, response received - never)
    nsx-proxy[2103518]: NSX 2103518 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2103684" level="ERROR" errorCode="RPC31"] RpcConnection[107939 Negotiating to tcp://127.0.0.1:4554 0] Keepalive failed - haven't received response in time (last request was sent 60 seconds ago, response received - never)
    nsx-opsagent[2103194]: NSX 2103194 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2103265" level="ERROR" errorCode="RPC31"] RpcConnection[241 Connected on tcp://127.0.0.1:4554 0] Keepalive failed - haven't received response in time (last request was sent 59 seconds ago, response received - 299 seconds ago)

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
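
The MAC/ARP symptom in the log excerpts above can be looked for with a short script. The following is a minimal sketch, not a supported tool: it assumes the cfgAgent "VXLAN Message" lines keep the field layout shown in the excerpts, it reads only the first entry on each line, and the function names are hypothetical. A MAC advertised with more than one VTEP IP is only a hint, because a successful vMotion also legitimately changes the VTEP.

```python
import re
from collections import defaultdict

# Matches the "VM MAC:<mac> ... VTEP IP:<ip>" part of cfgAgent
# "VM MAC Update" and "ARP (v4) Update" messages (layout assumed from the excerpts).
ENTRY_RE = re.compile(r"VM MAC:(?P<mac>\S+).*?VTEP IP:(?P<vtep>\S+)")


def vteps_per_mac(lines):
    """Map each VM MAC to the distinct VTEP IPs advertised for it, in log order."""
    seen = defaultdict(list)
    for line in lines:
        # Only consider cfgAgent VXLAN update messages.
        if 'subcomp="cfgAgent"' not in line or "VXLAN Message" not in line:
            continue
        match = ENTRY_RE.search(line)
        if match and match["vtep"] not in seen[match["mac"]]:
            seen[match["mac"]].append(match["vtep"])
    return dict(seen)


def macs_with_multiple_vteps(lines):
    """Return only the MACs that were advertised behind more than one VTEP IP."""
    return {mac: vteps for mac, vteps in vteps_per_mac(lines).items() if len(vteps) > 1}


if __name__ == "__main__":
    # Path taken from the article; run on a copy of the log if preferred.
    with open("/var/run/log/nsx-syslog.log", errors="replace") as log:
        for mac, vteps in macs_with_multiple_vteps(log).items():
            print(mac, "->", ", ".join(vteps))
```

Each reported MAC still has to be compared manually against the TEP of the host where the VM is actually running.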

Environment

VMware NSX

Cause

vMotion can cause a duplicate MAC/IP entry in cfgAgent's LogSwitchStateMsg state in nestDB when NSX transport nodes experience frequent NSX-RPC keepalive failures. This behaviour results in VMs losing network connectivity.
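
To gauge how often keepalive failures occur on a host, the RPC31 entries can be counted per agent. This is a minimal sketch that assumes the log line layout shown in the Issue/Introduction section; the function name is hypothetical, and the article does not define a threshold for "frequent", so the counts are for comparison between hosts only.

```python
import re
from collections import Counter

# Captures the reporting agent (subcomp) on lines that carry errorCode="RPC31"
# followed by "Keepalive failed" (layout assumed from the article's excerpts).
KEEPALIVE_RE = re.compile(
    r'subcomp="(?P<subcomp>[^"]+)".*errorCode="RPC31".*Keepalive failed'
)


def keepalive_failures_by_subcomp(lines):
    """Count NSX-RPC keepalive failure entries per reporting subcomponent."""
    counts = Counter()
    for line in lines:
        match = KEEPALIVE_RE.search(line)
        if match:
            counts[match["subcomp"]] += 1
    return counts


if __name__ == "__main__":
    with open("/var/run/log/nsx-syslog.log", errors="replace") as log:
        for subcomp, count in keepalive_failures_by_subcomp(log).most_common():
            print(subcomp, count)
```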

Resolution

This issue is resolved in VMware NSX 4.2.3 and VCF 9.0.1, and in later releases of each, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.



Workaround

  • If LogSwitchStateMsg has stale forwarding entries, restart the cfgAgent.

    Command to check LogSwitchStateMsg in nestDB:
    /opt/vmware/nsx-nestdb/bin/nestdb-cli --cmd get vmware.nsx.nestdb.LogSwitchStateMsg --beautify --json
    Example output:
            {
                "object_type" : "vmware.nsx.nestdb.LogSwitchStateMsg",
                "value" : {
                    "id" : "b8c9####-0c14-####-9a4b-5736####fb95",
                    "vtep" : [
                        {
                            "vtep_ip" : "10.##.##.#1",
                            "vtep_label" : {
                                "label" : 69633
                            },
                            "segment_id" : "10.####",
                            "vtep_mac" : "00:50:56:##:##:cc"
                        }
                    ],
                    "mac" : [
                        {
                            "mac" : "00:50:56:##:##:aa",     --->     VM MAC
                            "vtep_ip" : "10.##.##.#1",     --->     should be the actual destination TEP
                            "vtep_mac" : "00:50:56:##:##:cc"
                        }
                    ]
                }
            }

    Command to perform cfgAgent restart:
    /etc/init.d/nsx-cfgagent restart

  • To help prevent the issue from recurring, disable excessive DFW logging, which is a known cause of NSX-RPC keepalive failures on NSX transport nodes.
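
The stale-entry check in the first workaround step can be sketched in code as follows. This is illustrative only: it assumes the nestdb-cli output has been saved and parses as a single JSON object with the shape shown in the example (without the arrow annotations), and the expected MAC-to-TEP mapping is something the operator supplies from the hosts where the VMs currently run. The function name and the sample values are hypothetical.

```python
import json


def stale_mac_entries(state, expected_vtep_by_mac):
    """Return MAC entries whose vtep_ip differs from the operator-supplied TEP IP.

    state: parsed LogSwitchStateMsg object (shape assumed from the example output).
    expected_vtep_by_mac: dict mapping VM MAC -> TEP IP of the host running the VM.
    """
    stale = []
    for entry in state.get("value", {}).get("mac", []):
        expected = expected_vtep_by_mac.get(entry.get("mac"))
        # MACs without a known expected TEP are skipped rather than guessed.
        if expected is not None and entry.get("vtep_ip") != expected:
            stale.append(
                {"mac": entry["mac"], "found": entry.get("vtep_ip"), "expected": expected}
            )
    return stale


# Hypothetical sample mirroring the example output, with unmasked placeholder values.
sample = json.loads("""
{
    "object_type": "vmware.nsx.nestdb.LogSwitchStateMsg",
    "value": {
        "id": "example-id",
        "mac": [
            {"mac": "00:50:56:00:00:aa", "vtep_ip": "10.0.0.1", "vtep_mac": "00:50:56:00:00:cc"}
        ]
    }
}
""")

if __name__ == "__main__":
    for item in stale_mac_entries(sample, {"00:50:56:00:00:aa": "10.0.0.2"}):
        print(item["mac"], "points to", item["found"], "but expected", item["expected"])
```

If the function reports any entries, the cfgAgent restart described in the first workaround step applies.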