This article explains how to analyze ARP information on SD-WAN Edges: the ARP states and the commands to use. ARP interpretation is key when troubleshooting network problems. However, because of changes and product-specific enhancements, ARP entries on Edges are easy to misread, which can lead engineers to the wrong conclusions.
VMware VeloCloud SD-WAN supported versions
When the SD-WAN Edge data plane software (edged) is running, it reads and writes directly from the raw socket and does not rely on the Linux networking stack. As a result, arp -a and debug.py --arp may return different results; this is expected behavior. While the SD-WAN Edge software is running, rely on the output of debug.py and not on the underlying Linux commands.
An alternative is debug.py --verbose_arp, which shows the same information as debug.py --arp in a different format.
In currently supported VMware VeloCloud SD-WAN versions, the ARP state is updated only by a timer, which now runs every minute. Four new states were also added to improve ARP handling:
1. INCOMPLETE: ARP resolution has never succeeded for this address. Previously this was shown as the DEAD state with an all-zeros MAC address.
2. ALIVE: A valid response was received in the past two minutes.
3. REFRESH: A valid response was received in the past 15 minutes. User packets trigger an ARP request to refresh the entry, limited to one refresh per minute.
4. DEAD: ARP previously worked, but the Edge has not received a response for more than 15 minutes.
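The thresholds in the list above can be summarized as a small decision function. This is only an illustrative sketch in Python, not the Edge's actual implementation: the function name, the constants, and the choice to treat the two-minute and 15-minute boundaries as inclusive are all assumptions made for the example.

```python
from typing import Optional

# Thresholds taken from the state descriptions above.
ALIVE_WINDOW_S = 2 * 60      # valid response in the past two minutes
REFRESH_WINDOW_S = 15 * 60   # valid response in the past 15 minutes


def classify_arp_state(seconds_since_last_reply: Optional[float]) -> str:
    """Map the age of the last valid ARP reply to a state name.

    Hypothetical helper: None means no reply was ever received.
    Boundary values are treated as inclusive, which is an assumption.
    """
    if seconds_since_last_reply is None:
        return "INCOMPLETE"          # ARP has never worked
    if seconds_since_last_reply <= ALIVE_WINDOW_S:
        return "ALIVE"
    if seconds_since_last_reply <= REFRESH_WINDOW_S:
        return "REFRESH"             # user traffic may trigger a refresh request
    return "DEAD"                    # worked before, silent for over 15 minutes
```

Because the state is re-evaluated only when the timer runs (once per minute), the state shown for an entry may lag the real condition of the neighbor by up to that interval.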
This is the debug.py --arp output from a working SD-WAN Edge:
edge# debug.py --arp
Interface   Address          C-Tag  Flags  Mac                S-Tag  Source Mac         State    IsArp_failure_event_sent  Arp Retry count  Refcnt
GE1         172.26.241.1     0      0      74:4d:28:4a:7c:ba  0      f0:8e:db:0e:aa:c0  DEAD     0                         0                1
GE1         172.26.241.51    0      0      e8:9f:ec:1c:b2:2c  0      f0:8e:db:0e:aa:c0  DEAD     0                         0                1
GE1         172.26.241.183   0      0      e8:9f:ec:1c:af:c4  0      f0:8e:db:0e:aa:c0  DEAD     0                         0                1
GE1         172.26.241.129   0      0      74:4d:28:4a:7c:ba  0      f0:8e:db:0e:aa:c0  ALIVE    0                         0                1
GE1         172.26.241.173   0      0      e8:9f:ec:1c:b0:74  0      f0:8e:db:0e:aa:c0  DEAD     0                         0                1
GE2         10.240.224.1     0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  REFRESH  0                         0                1
GE2         XXX.XX.XX.XXX    0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XXX.XXX  0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XX.XXX   0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XXX.XXX  0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         10.240.0.1       0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XX.XX    0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         10.239.0.1       0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         10.241.0.1       0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XXX.XX   0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
GE2         XXX.XXX.XXX.XXX  0      0      00:01:5c:b4:9b:e4  0      f0:8e:db:0e:aa:c1  ALIVE    0                         0                1
LAN-VLAN37  192.168.37.167   0      0      a8:6d:aa:c9:e4:01  0      04:f0:21:4c:99:ed  ALIVE    0                         0                1
LAN-VLAN37  192.168.37.23    0      0      a8:66:7f:30:bc:ba  0      04:f0:21:4c:99:ed  ALIVE    0                         0                1
LAN-VLAN36  192.168.36.110   0      0      38:53:9c:9f:d5:5b  0      06:f0:21:4c:99:ed  ALIVE    0                         0                1
LAN-VLAN36  192.168.36.119   0      0      f0:18:98:41:d1:81  0      06:f0:21:4c:99:ed  DEAD     0                         0                1
LAN-VLAN36  192.168.36.228   0      0      38:53:9c:94:df:42  0      06:f0:21:4c:99:ed  ALIVE    0                         0                1
LAN-VLAN36  192.168.36.168   0      0      cc:d2:81:5c:5d:12  0      06:f0:21:4c:99:ed  ALIVE    0                         0                1
edge#
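When the table is long, it can help to summarize saved debug.py --arp output by state. The following Python sketch is an off-box helper, not part of the Edge software. It assumes the output was captured to text with one entry per line and the eleven whitespace-separated columns shown above; the field names and function names are hypothetical labels chosen for this example.

```python
from collections import Counter

# Hypothetical field names for the eleven columns shown in the output above.
FIELDS = ("interface", "address", "c_tag", "flags", "mac", "s_tag",
          "source_mac", "state", "failure_event_sent", "retry_count", "refcnt")
STATES = {"INCOMPLETE", "ALIVE", "REFRESH", "DEAD"}


def parse_debug_arp(text):
    """Return one dict per ARP entry found in captured debug.py --arp text."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly 11 fields with a known state in the 8th;
        # the header and shell prompt lines do not, so they are skipped.
        if len(parts) == len(FIELDS) and parts[7] in STATES:
            entries.append(dict(zip(FIELDS, parts)))
    return entries


def count_by_state(entries):
    """Count entries per ARP state."""
    return Counter(entry["state"] for entry in entries)


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
Interface Address C-Tag Flags Mac S-Tag Source Mac State IsArp_failure_event_sent Arp Retry count Refcnt
GE1 172.26.241.1 0 0 74:4d:28:4a:7c:ba 0 f0:8e:db:0e:aa:c0 DEAD 0 0 1
GE1 172.26.241.129 0 0 74:4d:28:4a:7c:ba 0 f0:8e:db:0e:aa:c0 ALIVE 0 0 1
GE2 10.240.224.1 0 0 00:01:5c:b4:9b:e4 0 f0:8e:db:0e:aa:c1 REFRESH 0 0 1
"""

if __name__ == "__main__":
    for state, count in sorted(count_by_state(parse_debug_arp(SAMPLE)).items()):
        print(state, count)
```

Splitting on whitespace rather than on fixed column positions keeps the parser tolerant of alignment differences, at the cost of relying on the column count staying at eleven.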
For comparison, this is the output of the Linux arp -a command on the same device at the same moment:
edge# arp -a
PC.lan (192.168.37.23) at a8:66:7f:30:bc:ba [ether] on br-network37
Phone.lan (192.168.36.110) at 38:53:9c:9f:d5:5b [ether] on br-network36
? (172.26.241.129) at 74:4d:28:4a:7c:ba [ether] on ge1
Phone.lan (192.168.36.228) at 38:53:9c:94:df:42 [ether] on br-network36
PC.lan (192.168.36.119) at f0:18:98:41:d1:81 [ether] on br-network36
PC.lan (192.168.37.167) at a8:6d:aa:c9:e4:01 [ether] on br-network37
? (10.240.0.1) at 00:01:5c:b4:9b:e4 [ether] on ge2
TV.lan (192.168.36.168) at cc:d2:81:5c:5d:12 [ether] on br-network36
edge#
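This comparison shows why arp -a is not the right command to use: it lists only 8 of the 22 entries that debug.py --arp reports, and it carries no state information, so an entry that the Edge marks DEAD (192.168.36.119) looks no different from a live one.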
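The difference between the two views can also be checked mechanically from saved output. The Python sketch below is a hypothetical off-box helper; it assumes both outputs were captured as text with one entry per line, and the function names and regular expression are choices made for this example only.

```python
import re

STATES = {"INCOMPLETE", "ALIVE", "REFRESH", "DEAD"}

# Matches the "(ip) at mac" portion of a Linux `arp -a` line.
ARP_A_RE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\) at (?P<mac>[0-9a-fA-F:]{17})"
)


def parse_arp_a(text):
    """Return {ip: mac} from captured `arp -a` text."""
    return {m.group("ip"): m.group("mac").lower() for m in ARP_A_RE.finditer(text)}


def parse_edge_arp(text):
    """Return {ip: (mac, state)} from captured `debug.py --arp` text."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows: 11 fields, address in the 2nd, MAC in the 5th, state in the 8th.
        if len(parts) == 11 and parts[7] in STATES:
            table[parts[1]] = (parts[4].lower(), parts[7])
    return table


def compare(edge, linux):
    """Return (entries absent from arp -a, arp -a entries the Edge does not mark ALIVE)."""
    missing = sorted(ip for ip in edge if ip not in linux)
    not_alive = sorted(ip for ip in linux if ip in edge and edge[ip][1] != "ALIVE")
    return missing, not_alive


# Rows copied from the two outputs above, used as sample input.
EDGE_SAMPLE = """\
GE1 172.26.241.1 0 0 74:4d:28:4a:7c:ba 0 f0:8e:db:0e:aa:c0 DEAD 0 0 1
GE1 172.26.241.129 0 0 74:4d:28:4a:7c:ba 0 f0:8e:db:0e:aa:c0 ALIVE 0 0 1
GE2 10.240.224.1 0 0 00:01:5c:b4:9b:e4 0 f0:8e:db:0e:aa:c1 REFRESH 0 0 1
LAN-VLAN36 192.168.36.119 0 0 f0:18:98:41:d1:81 0 06:f0:21:4c:99:ed DEAD 0 0 1
"""
LINUX_SAMPLE = """\
? (172.26.241.129) at 74:4d:28:4a:7c:ba [ether] on ge1
PC.lan (192.168.36.119) at f0:18:98:41:d1:81 [ether] on br-network36
"""

if __name__ == "__main__":
    missing, not_alive = compare(parse_edge_arp(EDGE_SAMPLE), parse_arp_a(LINUX_SAMPLE))
    print("Only in debug.py --arp:", missing)
    print("In arp -a but not ALIVE on the Edge:", not_alive)
```

A report like this is only a convenience for reading captured output; the debug.py --arp table remains the authoritative view while the Edge software is running.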