Edge stops forwarding network traffic after Socket write error
Article ID: 312618
Products
VMware NSX
Issue/Introduction
Symptoms:
Issue seen on ESG (Edge Services Gateway) 6.4.6 or later.
All dynamic routing across ESG fails.
DFW rules are pushed to ESG.
IPsec service is enabled.
You may see entries similar to the following in the Edge logs:
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.debug] DEV 0x0303-86 (0000): Packet received by socket (protocol = 0X00000059)
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 bfd[]: [NSIPROD]: [daemon.err] ovs|00007|bfd_io|ERR|socket send failed: No buffer space available
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] EXCEPTION 0x3e02-22 (0000): OSPF 1 NM ignored a reported socket error with code 1.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing: [daemon.err] EXCEPTION 0x3e02-22 (0000): OSPF 1 NM ignored a reported socket error with code 1.
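To check an exported Edge log for these entries, a short script can count the relevant messages. The following is a minimal sketch, not a VMware-provided tool: the function names, the signature labels, and the rule that a routing socket write error together with a BFD "No buffer space available" error indicates this issue are assumptions made for illustration. The patterns are taken from the log excerpt above.

```python
import re
from collections import Counter

# Patterns derived from the sample log entries in this article.
# The dictionary keys are illustrative labels, not official names.
SIGNATURES = {
    "routing_socket_write_error": re.compile(
        r"routing: \[daemon\.err\] PROBLEM .*Socket write error"
    ),
    "bfd_no_buffer_space": re.compile(
        r"bfd_io\|ERR\|socket send failed: No buffer space available"
    ),
    "ospf_socket_error_ignored": re.compile(
        r"OSPF \d+ NM ignored a reported socket error"
    ),
}


def count_signatures(lines):
    """Count how many log lines match each known signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def matches_symptom(lines):
    """Heuristic (an assumption): flag the log when both the routing
    socket write error and the BFD buffer error are present."""
    counts = count_signatures(lines)
    return (
        counts["routing_socket_write_error"] > 0
        and counts["bfd_no_buffer_space"] > 0
    )
```

For example, open an exported log file (the path is whatever location you saved the Edge log to) and pass the file object to `matches_symptom`. A result of `True` only suggests that the log resembles this article; confirm against the other symptoms listed above.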
Environment
VMware NSX for vSphere 6.4.x
Cause
By design, the ESG revalidates conntrack entries whenever the configuration changes and at certain intervals. If revalidation runs while the Edge is handling a high volume of IPsec traffic, the dcsms and bfd processes report socket write errors, which causes all dynamic routing neighborships to go down.
Resolution
This issue is resolved in VMware NSX for vSphere 6.4.11, available at VMware Downloads.
Workaround:
When the issue occurs, fail over the Edge from the active node to the standby node.