Edge stops forwarding network traffic after Socket write error

Article ID: 312618

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Issue seen on ESG (Edge Services Gateway) 6.4.6 or later.
  • All dynamic routing across ESG fails.
  • DFW rules are pushed to ESG.
  • IPsec service is enabled.
  • You may see entries similar to the following in the Edge logs:
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.debug] DEV 0x0303-86 (0000): Packet received by socket (protocol = 0X00000059)
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 bfd[]: [NSIPROD]:  [daemon.err] ovs|00007|bfd_io|ERR|socket send failed: No buffer space available
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] PROBLEM 0x0303-46 (0000): Socket write error.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] EXCEPTION 0x3e02-22 (0000): OSPF 1 NM ignored a reported socket error with code 1.
2020-11-10T09:29:00+00:00 CNSVAD0PR1NXE01-0 routing:  [daemon.err] EXCEPTION 0x3e02-22 (0000): OSPF 1 NM ignored a reported socket error with code 1.
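To check an exported Edge log for these entries, a short script can count each error signature. The following is a minimal sketch, not a VMware tool: the function and dictionary names are hypothetical, and the patterns are derived only from the sample entries above, so they may need adjusting if the log format differs in your environment.

```python
import re

# Patterns derived from the sample log entries in this article.
# The signature names are illustrative labels, not NSX identifiers.
SIGNATURES = {
    "routing_socket_write": re.compile(r"routing:.*Socket write error"),
    "bfd_no_buffer": re.compile(
        r"bfd.*socket send failed: No buffer space available"
    ),
    "ospf_socket_ignored": re.compile(
        r"OSPF \d+ NM ignored a reported socket error"
    ),
}


def count_signatures(lines):
    """Return how many lines match each known error signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, `count_signatures(open(path))` can be run against a saved Edge log file, where `path` is wherever you exported the log. Non-zero counts only show that the log contains these entries; confirm the remaining symptoms before concluding that this issue applies.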


Environment

VMware NSX for vSphere 6.4.x

Cause

By design, the ESG revalidates conntrack entries whenever the configuration changes and again at regular intervals. If revalidation runs while the Edge is handling a high volume of IPsec traffic, the dcsms and bfd processes report socket write errors, which causes all dynamic routing neighbor adjacencies to go down.

Resolution

This issue is resolved in VMware NSX for vSphere 6.4.11, available at VMware Downloads.
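
When auditing several Edges, the version range can be checked programmatically. The sketch below assumes that versions are plain dotted numeric strings and that every release from 6.4.6 up to, but not including, 6.4.11 is affected, which is inferred from the Symptoms and Resolution statements in this article rather than stated outright. The function names are hypothetical.

```python
# Assumed bounds, taken from this article: seen on 6.4.6 or later,
# resolved in 6.4.11.
FIRST_AFFECTED = (6, 4, 6)
FIRST_FIXED = (6, 4, 11)


def parse_version(text):
    """Convert a dotted version string such as '6.4.10' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """Return True if the version falls in the assumed affected range."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would sort "6.4.10" before "6.4.6".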

Workaround:
  • Fail over the Edge from the active node to the standby node when the issue occurs.