No route on the active DLR Edge after the HA Failover
Article ID: 327332
Updated On:
Products
VMware NSX
Issue/Introduction
Symptoms: An outage occurs because the default route is missing on the DLR after an HA event.
The msr logs and vmci channel flaps are shown below.
# vmci channel flaps:
ESXi 139
T21:03:46.994Z [ 7BBE700 error ] recv error: 0:Success
T21:03:46.994Z [ 7BBE700 info ] Vdrb: vmci link down, fd = 27
ESXi 138
T21:03:55.064Z [ D6B82700 info ] Vdrb: vmci link up, fd = 24
T21:03:55.064Z [ D6B82700 info ] Sent edge link up to kernel
# routing socket errors:
**** PROBLEM 0x0309 - 6 (0000) **** I:00002157 F:00000001
i3lx.c 421 :at 01:10:10, 22 November 2021 (517481444 ms)
Interface Information stub failed to process a routing message because a recv() call on a routing socket failed.
LSR Index = 1
Recv errno = 88
**** PROBLEM 0x0309 - 6 (0000) **** I:00002465 F:00000001
i3lx.c 421 :at 02:06:12, 22 November 2021 (520843804 ms)
Interface Information stub failed to process a routing message because a recv() call on a routing socket failed.
LSR Index = 1
Recv errno
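To confirm how long the routing socket failures lasted, the PROBLEM records can be pulled out of the collected log text with a short script. The following is a minimal sketch, not a VMware tool: the function name find_routing_socket_errors and the regular expression are hypothetical, and the record layout is inferred only from the two sample entries above, so it may need adjusting for other log variants.

```python
import re

# Pattern inferred from the two sample records in this article (an assumption,
# not a documented log format). It captures the time, the date, and the
# errno value when one is present on the record.
RECORD = re.compile(
    r"at (?P<time>\d{2}:\d{2}:\d{2}), (?P<date>\d{1,2} \w+ \d{4})"
    r".*?recv\(\) call on a routing socket failed\."
    r".*?Recv errno(?: = (?P<errno>\d+))?",
    re.DOTALL,
)


def find_routing_socket_errors(text):
    """Return a list of (date, time, errno) tuples, one per recv() failure.

    errno is an int, or None when the record carries no value
    (for example, when the log excerpt is truncated).
    """
    results = []
    for match in RECORD.finditer(text):
        raw = match.group("errno")
        errno_value = int(raw) if raw is not None else None
        results.append((match.group("date"), match.group("time"), errno_value))
    return results
```

Running the function over the full log text and comparing the first and last timestamps gives a rough estimate of how long the control VM was unable to read from the routing socket.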
Cause
The best information currently available is that the control VM was unable to read netlink messages for a few hours. If no changes were made to the system that could have led to this, the condition possibly built up over a period of time.
Resolution
When the problem occurs, and before applying the workaround, collect the ipstrc.log:
1) In the CLI, run "debug routing" as admin.
2) Wait for about a minute.
3) Collect /var/log/msr/ipstrc.log.
4) Disable log collection with "no debug routing".
WORKAROUND
1) Push the configuration once more from the management plane (for example, toggle BGP enable/disable).
2) Reboot.
3) Do the failover.