ARP_PROXY Shows NULL on LB Service Interfaces

Article ID: 314338


Products

VMware NSX

Issue/Introduction

The behavior identified in this article results in the Proxy_Arp table having its entries removed, affecting traffic destined to the LB VIP.
The purpose of this article is to document an edge-case timing issue that depends on multiple events occurring with specific timing in an environment where network latency between NSX-T Manager nodes exceeds the recommended maximum.

Symptoms:


Traffic handled by service interfaces (Load Balancer, NAT, etc.) is impacted. The Load Balancer VIP reports a status of "Down" and is not pingable, but the backend pool members are reachable.
When reviewing the service interface the VIP is bound to, the VIP address is no longer displayed in the proxy_arp field of the Edge Service Router service interface.
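
One way to review the service interface from the Edge node CLI is sketched below. This is a hedged sketch: the command names are from the standard NSX-T Edge CLI, the VRF number is illustrative, and the exact field names in the output can vary by version.

get logical-routers
vrf 3               >>> the VRF number is illustrative; use the VRF of the Service Router that owns the service interface
get interfaces      >>> in the impacted state, the VIP (192.168.179.180 in this example) is missing from the proxy_arp field of the affected interface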

Packet captures on the ESXi uplinks used by the datapath interfaces of the Edge VM show that replies are not being sent for the arp_proxy addresses.
Note: In the example below, 192.168.179.180 is the VIP and should appear in the arp_proxy list.

[root@esx-02:~] pktcap-uw --uplink vmnic0 --capture UplinkSndKernel,UplinkRcvKernel --mac 00:50:56:9d:33:6a --ethtype 0x0806 --ng -o - | tcpdump-uw -enr -
The name of the uplink is vmnic0.
The session capture point is UplinkSndKernel,UplinkRcvKernel.
The session filter mac(src or dst) is 00:50:56:9d:33:6a.
The session filter Ethernet type is 0x0806.
The output file format is pcapng.
pktcap: The output file is -.
pktcap: No server port specifed, select 59368 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 59368.
pktcap: Main thread: 371054500736.
pktcap: Dump Thread: 371055036160.
pktcap: Recv Thread: 371055564544.reading from file -, link-type EN10MB (Ethernet)

pktcap: Accept...
pktcap: Vsock connection from port 1046 cid 2.
13:57:13.904505 00:50:56:01:06:ce > 00:50:56:9d:33:6a, ethertype 802.1Q (0x8100), length 64: vlan 132, p 0, ethertype ARP, Request who-has 192.168.132.1 tell 192.168.132.254, length 46
13:57:13.904910 00:50:56:9d:33:6a > 00:50:56:01:06:ce, ethertype 802.1Q (0x8100), length 64: vlan 132, p 0, ethertype ARP, Reply 192.168.132.1 is-at 00:50:56:9d:33:6a, length 46
13:57:25.680277 00:50:56:01:06:ce > 00:50:56:9d:33:6a, ethertype 802.1Q (0x8100), length 64: vlan 134, p 0, ethertype ARP, Request who-has 192.168.179.180 tell 192.168.179.129, length 46
13:57:25.680748 00:50:56:9d:33:6a > 00:50:56:01:06:ce, ethertype 802.1Q (0x8100), length 64: vlan 134, p 0, ethertype ARP, Reply 192.168.179.180 is-at 00:50:56:9d:33:6a, length 46  >>> This is the working state. When the issue occurs, this arp_proxy reply is not seen.
13:57:55.010416 00:50:56:9d:33:6a > 00:50:56:01:06:ce, ethertype 802.1Q (0x8100), length 64: vlan 134, p 0, ethertype ARP, Request who-has 192.168.179.129 (00:50:56:01:06:ce) tell 192.168.179.190, length 46
13:57:55.010764 00:50:56:01:06:ce > 00:50:56:9d:33:6a, ethertype 802.1Q (0x8100), length 64: vlan 134, p 0, ethertype ARP, Reply 192.168.179.129 is-at 00:50:56:01:06:ce, length 46

-----------------------------------------------------------

[root@esx-02:~] net-stats -l | grep 00:50:56:9d:33:6a
67108878            5       9 DvsPortset-0     00:50:56:9d:33:6a  NSX-T Edge 01.eth2

 

In a live environment, the following can be checked to confirm a sync issue between the Management Plane (MP) and the Central Control Plane (CCP); a quick-check sketch follows this list:

  • Check for ARP entries in the nestdb "vmware.nsx.nestdb.ArpTableConfigMsg" table on the Edge nodes by running the following command as root on the Edge node:
/opt/vmware/nsx-nestdb/bin/nestdb-cli --cmd get vmware.nsx.nestdb.ArpTableConfigMsg --beautify --json
{
    "result" : [
        {
            "object_type" : "vmware.nsx.nestdb.ArpTableConfigMsg",
            "value" : {
                "entry" : [
                    {
                        "ip" : "192.168.179.180",
                        "mac" : "0",
                        "type" : "PROXY_ARP_ENTRY"
                    }
                ],
                "log_router_port_id" : "da73ef7a-ab81-4ff0-a957-ae0ab8d0bf3f",
                "id" : "a5845dcc-4076-46eb-8a48-52f11814dc48"
            }
        }
    ]
}
  • Capture a mediator dump on the Manager node and check whether the ARP entries are missing there.
set debug
get mediator dump <file_name>   >>> after running, the dump can be found in /var/vmware/file-store/<file_name>
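
As a quick check (referenced above), the example VIP address can be searched for directly in the nestdb output on the Edge node and in the mediator dump on the Manager. This is a hedged sketch: the VIP is the example value from this article, and the grep pattern assumes the beautified JSON formatting shown above.

# On the Edge node (as root): count proxy-ARP entries for the example VIP in nestdb.
/opt/vmware/nsx-nestdb/bin/nestdb-cli --cmd get vmware.nsx.nestdb.ArpTableConfigMsg --beautify --json | grep -c '"ip" : "192.168.179.180"'
# A count of 0 means the entry has been removed, matching the symptom described in this article.
# On the NSX Manager node (as root), after collecting the mediator dump:
grep "192.168.179.180" /var/vmware/file-store/<file_name>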


To confirm that there is network latency between the NSX Manager nodes and that the Corfu layout is changing, perform the following:

  • Track the network latency between the NSX Manager nodes at the time of the issue (a sketch for counting high-latency samples follows this list):
$ zgrep -h "ping-latency" /var/log/corfu/corfu-metrics.* | grep "2023-03-28" | grep -v "mean=[0-9]\." | grep -v "mean=[0-4][0-9]\." | sort --key=1 | sed 's/,/ /g' | sed 's/=/ /g' | awk '{print $1 ", " $3 ", " $15}'

2023-03-28T01:50:43.842Z, failure-detector_ping-latency, 296.977376 >>>> The latency here is far above the 10 ms maximum
 
  • Check the Corfu layout change frequency in /config/corfu/. Ideally there should not be 3-4 layout changes per day; multiple and frequent Corfu layout changes point to underlying issues. You can monitor this live with the following command:
watch -n 30 'ls -lathr /config/corfu/LAYOUT*'
---------------------------------------------
Every 30.0s: ls -lathr /config/corfu/LAYOUT*          nsx-mngr-01: Thu Apr  6 21:50:49 2023

-rw-r----- 1 corfu corfu 435 Mar 31 16:22 /config/corfu/LAYOUTS_0.ds
-rw-r----- 1 corfu corfu 435 Mar 31 17:58 /config/corfu/LAYOUT_CURRENT.ds  >>> Watch the timestamp on LAYOUT_CURRENT.ds; when it updates, a layout change has occurred.
-rw-r----- 1 corfu corfu 435 Mar 31 17:58 /config/corfu/LAYOUTS_1.ds
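
To quantify how often latency breached the 10 ms requirement on a given day, the same corfu-metrics files can be summarized. This is a hedged sketch that assumes the "mean=<value>" field format shown in the output above; the date is the example date from this article.

# Count failure-detector ping-latency samples on 2023-03-28 whose mean exceeded 10 ms.
zgrep -h "ping-latency" /var/log/corfu/corfu-metrics.* | grep "2023-03-28" | grep -oE "mean=[0-9]+(\.[0-9]+)?" | awk -F= '$2 > 10 {count++} END {print count+0 " samples above 10 ms"}'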

 

Below is an example of tracing an instance of the issue by reviewing /var/log/proton/nsxapi.log on an NSX Manager node (a search sketch for these signatures follows the walkthrough):

  • CCP full sync is initiated by FullSyncIdsLoader. The process begins with the ArpTableConfig table.
2023-03-28T14:44:51.595Z INFO FullSyncIdsLoader AbstractFullStateSyncDataBuilder 8386 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Providers [ArpTableConfig, BfdPeer, BgpNeighbor, BridgeCluster, ComputeCollectionProperties, Crl, DADProfile, DStaticRouteConfig, DhcpIpPool, DhcpRelay, DhcpStaticBinding, DhcpV6IpPool, DhcpV6StaticBinding, DiscoveredNodeParent, DnsForwarder, EVPNService, EVPNTunnelEndpoint, EastWestSIPolicy, EdgeCluster, ExcludeList, FQDNAnalysisConfig, FirewallSchedule, FirewallSection, FirewallUrlCategorizationConfig, GiNodeSolutionInfo, HostDiscoveredNodeRelationship, IDSComputeCollectionSettings, IDSProfile, IDSSection, IDSSignatures, IDSStandaloneHostsSettings, IPPrefixList, IPSecVPNDPDProfile, IPSecVPNIKEProfile, IPSecVPNLocalEndpoint, IPSecVPNPeerEndpoint, IPSecVPNService, IPSecVPNSession, IPSecVPNTunnelProfile, IPSet, IdfwEnabledComputeCollection, IdfwMasterSwitchSetting, IdfwStandaloneHostsSetting, InstanceRuntime, InterSiteL2Forwarder, IpfixCollectorConfig, IpfixConfigModel, IpfixSwitchGlobalConfigModel, L2VpnService, L2VpnSession, LOG_ROUTER, LOG_ROUTER_PORT, LOG_SERVICE_ROUTER_CLUSTER, LRQoSProfile, Layer3SIPolicy, LbRule, LbTcpProfile, LoadBalancerApplicationProfile, LoadBalancerMonitor, LoadBalancerPersistenceProfile, LoadBalancerPool, LoadBalancerService, LoadBalancerSslProfile, LoadBalancerVirtualServer, LogicalDhcpServer, LogicalPort, LogicalRouter, LogicalSwitch, MACSet, MetadataProxy, MplsConfig, MulticastConfig, NDRAProfile, NSGroup, NSGroupMembershipStatusEvent, NSProfile, NorthSouthSIPolicy, NsxGlobalConfigEntity, OspfInterfaceConfig, PBRRule, PBRSection, PortMirroringSession, RealizationStateBarrier, RouteMap, RoutingBfdConfig, SIExcludeList, ServiceChain, ServiceConfig, ServiceDefinition, ServiceInsertionRule, ServiceInsertionSection, ServiceProfile, TransportNodeState, UpmProfile]
2023-03-28T14:44:51.595Z INFO FullSyncIdsLoader AbstractFullStateSyncDataBuilder 8386 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Loading IDs from ArpTableConfig
 
  • The Corfu cluster is unstable for roughly two minutes, so this read initiated by FullSyncIdsLoader to pull the data from the Corfu ArpTableConfig table does not occur right away.
2023-03-28T14:44:52.103Z WARN FullSyncIdsLoader AbstractView 8386 Got a wrong epoch exception, updating epoch to 4020 and invalidate view
2023-03-28T14:44:52.103Z INFO FullSyncIdsLoader AbstractView 8386 layoutHelper: Retried 0 times, SystemDownHandlerTriggerLimit = 60
2023-03-28T14:44:53.108Z WARN FullSyncIdsLoader AbstractView 8386 Got a wrong epoch exception, updating epoch to 4020 and invalidate view
2023-03-28T14:44:53.108Z INFO FullSyncIdsLoader AbstractView 8386 layoutHelper: Retried 1 times, SystemDownHandlerTriggerLimit = 60
***
2023-03-28T14:46:09.757Z WARN FullSyncIdsLoader AbstractView 8386 Server still not ready. Waiting for server to start accepting requests.
2023-03-28T14:46:09.757Z INFO FullSyncIdsLoader AbstractView 8386 layoutHelper: Retried 0 times, SystemDownHandlerTriggerLimit = 60

 

  • When the read occurs on the client, we see a TrimmedException on the Corfu stream associated with the ArpTableConfig table. This alone is *NOT* an issue and happens when the table is not accessed for some time. However, when this occurs, the in-memory view of this table is reset to empty, as a new checkpoint entry needs to be loaded.
2023-03-28T14:46:11.105Z INFO FullSyncIdsLoader CorfuCompileProxy 8386 accessInner: Encountered trimmed address space while accessing version 1292775699 of stream f4d4cd92-067f-3427-9d5b-1ff763257067 on attempt 0

 

  • StateSync interrupts the thread in the middle of this process. When this occurs abruptly, the Corfu client throws an UnrecoverableCorfuInterruptedError.
2023-03-28T14:46:11.166Z INFO StateSyncShardingCoordinator-leadership-executor AbstractFullStateSyncDataBuilder 8386 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Stopping FullStateSyncDataBuilder

INFO | jvm 1 | 2023/03/28 14:46:14 | Exception in thread "FullSyncIdsLoader"
INFO | jvm 1 | 2023/03/28 14:46:14 | org.corfudb.runtime.exceptions.unrecoverable.UnrecoverableCorfuInterruptedError: Interrupted while completing future
 
  • StateSync attempts to remediate the issue by starting the thread back up, which is a violation of the contract of an unrecoverable error. At this point, the stream layer for this table is in an inconsistent state, and database operations should not be performed by the client without a proper teardown/start up process when unrecoverable errors occur.
2023-03-28T14:46:45.923Z INFO StateSyncShardingCoordinator-leadership-executor AbstractFullStateSyncDataBuilder 8386 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Activating FullStateSyncDataBuilder

 

  • Further, we see a delete operation for the entries in question, as nothing was loaded from the Corfu table:
2023-03-28T09:03:11.225Z INFO MessageProcessor FullSyncTransactionCreator 11137 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="adapter-mp"] Full sync entity update for ARP_TABLE_CONFIG: Added 0, Deleted 0, Updated 6
2023-03-28T14:47:41.368Z INFO MessageProcessor FullSyncTransactionCreator 11137 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="adapter-mp"] Full sync entity update for ARP_TABLE_CONFIG: Added 0, Deleted 6, Updated 0
2023-03-28T14:47:00.463Z WARN MessageProcessor ConfigStoreEntityTypes 771 - [nsx@6876 comp="nsx-controller" level="WARNING" subcomp="adapter-mp"] Invalid NsxConfigMsg processed. Entity type = ARP_TABLE_CONFIG, field number = 54,
NsxConfigMsg = operation: DELETE
entity_id {
  43bc6afa-1d50-4111-acae-2b9ee3efa025
}
entity_type: ARP_TABLE_CONFIG
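
To locate the same signatures when tracing a suspected occurrence, the log can be searched for the strings quoted in the excerpts above. This is a hedged sketch: it greps only /var/log/proton/nsxapi.log as described at the start of this walkthrough, and the CCP-side "Full sync entity update" messages (comp="nsx-controller") may be recorded in a separate controller log depending on the version.

# Search for the manager-side signatures quoted in the walkthrough above.
grep -E "Loading IDs from ArpTableConfig|Got a wrong epoch exception|Encountered trimmed address space|UnrecoverableCorfuInterruptedError" /var/log/proton/nsxapi.log
# The CCP-side full sync counters can be located by searching the controller logs for:
# "Full sync entity update for ARP_TABLE_CONFIG"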

 


Environment

VMware NSX-T Data Center

Cause


The root cause of this issue is consistent network latency between NSX-T Manager nodes above the system requirement of less than 10 ms.
High network latency between NSX-T Manager nodes can result in frequent cluster reconfiguration events within CorfuDB, as evidenced by the observed Corfu layout changes.

The sequence of events in which the ArpTableConfig table was observed to be read as empty is documented in the following points:

  1. In certain instances, if the Corfu cluster is in a period of instability due to layout changes at the same time that a FullSync has been requested by the Central Control Plane (CCP), the read from the database may not occur immediately.
  2. When the read does occur, a TrimmedException may be experienced on the Corfu stream. This in itself is not an issue and happens when the table is not accessed for some time. However, when this occurs, the in-memory view of this table is reset to empty, as a new checkpoint entry needs to be loaded.
  3. In this particular instance, the ArpTableConfig table is the first to be read, and the thread is interrupted by StateSync in the middle of this process, resulting in an unrecoverable error being thrown.
  4. StateSync attempts to remediate the issue by starting the thread back up, which is a violation of the contract of an unrecoverable error. At this point, the stream layer for this table is in an inconsistent state, and database operations should not be performed by the client without a proper teardown/start up process when unrecoverable errors occur.
  5. When the full sync is reattempted, 0 entries are observed (because the object was reset as mentioned in point 2., and the updating of the new entries did not complete).
  6. In the particular customer case where this behavior was identified, the next access of this table was not performed until the next day. During this time, several database compaction runs occurred and populated new checkpoint entries. When the table was accessed once more, during a manual full sync attempt, it hit another expected exception (as mentioned in point 2). This reset the in-memory view of the object once more, and a new entry was loaded from scratch. From this point on, the in-memory view of the table reflected the correct state, and all Arp_Proxy entries were once again visible.
Note: Even though latencies and full syncs had occurred previously in the environment where this was observed, the impact of the table being read as empty is an edge-case timing issue.

In version 3.2.x, with architecture improvements, the StateSync component was no longer needed and was removed from the product. As such, this scenario does not apply to NSX-T 3.2.x and later releases.

Resolution


In NSX-T 3.1.x and earlier, ensure that the NSX-T Manager nodes adhere to the system requirement of less than 10 ms of network latency between nodes.
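
A simple live check of inter-node latency can be run from the root shell of one NSX Manager node to the others. This is a hedged sketch; the IP address is an illustrative placeholder.

ping -c 20 <other-manager-node-ip>   >>> the reported rtt avg should stay below 10 ms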

This issue is resolved in NSX-T 3.2.x and above releases.


Workaround:
  • Initiate a forcesync on all 3 MP nodes
set debug
get mediator forcesync
  • Check the mediator dump to see whether the ARP entries are present (see the verification sketch after this list).
set debug
get mediator dump <file_name>   >>> the dump can be found in /var/vmware/file-store/<file_name>
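
After the forcesync, the restored state can be verified end to end. This is a hedged sketch: "arp_check_dump" is an illustrative file name and the VIP address is the example value from this article.

set debug
get mediator dump arp_check_dump
grep "192.168.179.180" /var/vmware/file-store/arp_check_dump   >>> run from the root shell on the Manager; the VIP should now be listed
/opt/vmware/nsx-nestdb/bin/nestdb-cli --cmd get vmware.nsx.nestdb.ArpTableConfigMsg --beautify --json | grep "192.168.179.180"   >>> run as root on the Edge node; the entry should be present again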