Network connectivity issue after successful SGOS upgrade to 7.3.x

Products

ASG-S500
ProxySG Software - SGOS

Issue/Introduction

After upgrading Edge SWG (ProxySG), internet traffic fails.

NIC transmit queue is full

Environment

Release: 7.3.x

Resolution

Starting with SGOS 6.7.4.1, two new features were introduced: TSO (TCP segmentation offload) and hardware checksum offload (transmit checksum). Both features are enabled by default. More information on TSO can be found here, and hardware checksum offload is described at the URL below.

https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_offload

These features improve the performance of the SGOS TCP/IP stack by offloading these tasks to the NIC (the SG's network adapter). In some deployments, however, the NIC's transmit (TX) queue has been observed to fill up, so that packets are dropped or not processed in a timely manner. In other words, the packets do not leave the SG/ASG. When this happens and packets such as ARP requests cannot leave the SG's NIC, the device loses its connection to the default gateway. The SG then becomes unreachable from outside the network: it may appear hung or unresponsive over the network, but it still responds via the serial console.

Downgrading to the previous SGOS version, with no other change, resolves the problem. A cold boot also appears to resolve the issue.
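The failure mode described above can be pictured with a toy model. The following is only a minimal sketch in Python, not SGOS or NIC driver code: the `TxQueue` class, its capacity of 4, and the frame names are all hypothetical and chosen purely for illustration. It shows how a bounded transmit queue that is never drained ends up dropping a later ARP request.

```python
from collections import deque


class TxQueue:
    """Toy bounded transmit queue (hypothetical; real NIC rings are more involved)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = deque()
        self.dropped = []

    def enqueue(self, frame):
        """Queue a frame for transmission, or record it as dropped if the queue is full."""
        if len(self.frames) >= self.capacity:
            self.dropped.append(frame)
            return False
        self.frames.append(frame)
        return True

    def drain(self, count):
        """Simulate the NIC putting up to `count` queued frames on the wire."""
        sent = []
        while self.frames and len(sent) < count:
            sent.append(self.frames.popleft())
        return sent


# A stalled NIC never drains, so bulk traffic fills the queue.
queue = TxQueue(capacity=4)
for i in range(4):
    queue.enqueue(f"data-{i}")

# A later ARP request for the default gateway can no longer be queued.
arp_sent = queue.enqueue("arp-request-gateway")
print(arp_sent, queue.dropped)
```

In this model the ARP request is lost only because nothing ever calls `drain`; once frames leave the queue, new frames are accepted again.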

The Edge SWG (ProxySG)/ASG is more likely to encounter this problem when the following conditions are true:

The device has an active 10G fiber/copper NIC.
The deployment handles a high volume of intercepted and/or bypassed packets on that 10G NIC.

Note 1: If the Edge SWG (ProxySG)/ASG has another active interface in addition to the 10G interface (e.g. interface 0:0 as the management interface), it remains reachable via that interface while this issue occurs.

Note 2: No logs (e.g. sysinfo file/snapshot, event log) indicate this problem; only a full memory core does. The full core must be obtained while the device or the 10G NIC is in a hung or unresponsive state.

In SGOS 7.3.7.1, we recommend running the CLI commands below to disable these features.

# conf t
#(config) tcp-ip tcp-tso disable
#(config) tcp-ip transmit-checksum disable

Note 3: While these features are disabled, the same tasks are still performed, but by the SGOS TCP/IP stack instead of the ProxySG/ASG's NIC.
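To make the two tasks concrete, here is a minimal Python sketch of what a software TCP/IP stack does when segmentation and checksum calculation are not offloaded. It is an illustration only and not SGOS code: the function names `segment` and `internet_checksum` are hypothetical, the MSS of 1460 bytes is an assumed typical value, and a real TCP checksum also covers a pseudo-header, which is omitted here. The checksum follows the 16-bit one's-complement algorithm described in RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def segment(payload: bytes, mss: int) -> list[bytes]:
    """Split a large send buffer into chunks of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# Segmentation: a 3000-byte send becomes three segments.
chunks = segment(b"x" * 3000, mss=1460)
print([len(c) for c in chunks])  # [1460, 1460, 80]

# Checksum: the sample bytes used in RFC 1071.
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

With TSO and transmit checksum enabled, the NIC performs the equivalent of both steps in hardware; with them disabled, the work costs CPU time on the appliance instead.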

Note 4: These are hidden CLI commands. They are not listed among the available commands when you enter '?', and they are not auto-completed when you press the Tab key. Once made, these changes are stored permanently in the SG's configuration and are preserved across reboots and upgrades to higher SGOS versions.

In addition, to prevent the ASG from responding slowly to user traffic, we recommend disabling LRO (large receive offload) by running the CLI commands below.

# conf t
#(config) tcp-ip tcp-lro disable