hub - sockWrite Failed for select in EWOULDBLOCK check (11)

Article ID: 190056

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

  • The remote hub appears unstable, flipping from green to red and back again in Infrastructure Manager or Admin Console.
  • The following error appears in hub.log:

    hub: sockWrite Failed for select in EWOULDBLOCK check (11)
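
To gauge how often the hub hits this error, the matching lines in hub.log can be counted. The following is a minimal sketch: the function name is hypothetical, and the log path in the example comment is an assumption that should be adjusted to your hub installation.

```shell
# Hypothetical helper: count the lines in a hub log that contain the
# sockWrite EWOULDBLOCK error. The log path is supplied by the caller.
count_sockwrite_errors() {
    log_file="$1"
    # -c prints the number of matching lines; -F treats the pattern as a fixed string.
    grep -c -F 'sockWrite Failed for select in EWOULDBLOCK check' "$log_file"
}

# Example (the path is an assumption, not taken from this article):
# count_sockwrite_errors /opt/nimsoft/hub/hub.log
```

Running the count before and after a change gives a rough indication of whether the error rate has dropped.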

Environment

  • DX UIM - Any Version
  • Component: UIM - HUB - any version

Cause

This error indicates that the application is trying to send data faster than the network path can accept it, so the socket send buffer fills up. The (11) in the message is the errno value, which on Linux corresponds to EAGAIN/EWOULDBLOCK. It means that a non-blocking socket operation could not complete immediately: the send buffer is full when sending, or the receive buffer is empty when receiving.
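
One way to see whether data is backing up on the sending side is to look at the Send-Q column of `ss -tn`, which for established TCP connections shows bytes that the remote side has not yet acknowledged. The sketch below is an illustration only: the helper name is hypothetical, and it assumes the default `ss -tn` column order (State, Recv-Q, Send-Q, local address, peer address).

```shell
# Hypothetical helper: read `ss -tn` output on stdin and print the Send-Q,
# local address, and peer address of every connection whose Send-Q
# (third column) is non-zero.
show_backed_up_sockets() {
    # NR > 1 skips the header line; "$3 + 0" forces a numeric comparison.
    awk 'NR > 1 && $3 + 0 > 0 { print $3, $4, $5 }'
}

# Example: ss -tn | show_backed_up_sockets
```

A Send-Q value that stays high across repeated runs on the hub's connections would be consistent with the limited-bandwidth scenario described in this article, but it is not proof on its own.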

In general, this type of network issue should be investigated together with a network administrator.

Common causes of this may include:

Buffer Exhaustion: Features like TSO (TCP Segmentation Offload) allow the OS to send massive data chunks to the NIC. If the NIC cannot "slice" these into standard packets quickly enough, the outgoing buffer fills up, and the OS returns EWOULDBLOCK.

Packet Distortion (GRO): Generic Receive Offload merges incoming packets. In some instances, this can confuse the application's perception of the network state or lead to checksum errors that cause the connection to hang.

Virtualization Overhead: In environments like VMware or KVM, "hardware" offloading is often emulated. This emulation can introduce micro-latencies that cause the socket write to fail during high-concurrency operations.

In a similar case where the same sockWrite error occurred, the errors and performance problems were traced to unexpected bandwidth limitations.

This error is rare. Only a few cases have been seen in which the hub or robot reported it, and each was related to an external network problem that limited the available bandwidth.

 

Resolution

To determine if offloading is the culprit, you can temporarily disable these features using the ethtool utility. Replace [interface] with your actual device name (e.g., eno1).

sudo ethtool -K [interface] tso off gso off gro off

Note that the uppercase -K option changes settings.

To view the current settings for the interface, use the lowercase -k option, which queries them:

ethtool -k [interface]

Command output example:
# ethtool -k eno1 | grep segmentation

tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
        tx-tcp-mangleid-segmentation: off
generic-segmentation-offload: on
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
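
Because the full `ethtool -k` output is long, it can help to show only the three top-level features that the ethtool -K command above toggles. A minimal sketch, assuming the feature names appear as in the output above (the helper name is hypothetical):

```shell
# Hypothetical helper: read `ethtool -k` output on stdin and print only the
# top-level TSO, GSO, and GRO lines. Indented sub-features are skipped
# because the pattern is anchored to the start of the line.
show_offload_state() {
    grep -E '^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Example: ethtool -k eno1 | show_offload_state
```

Running this before and after the change makes it easy to confirm that each feature actually switched to off; a feature marked [fixed] cannot be changed on that interface.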


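If this resolves the issue, work with your network administrator to determine whether it is appropriate to leave the change in place. Settings changed with ethtool -K are not persistent and revert after a reboot unless they are also added to the interface configuration.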

Additional Information

Here is a breakdown of what the ethtool command given above is doing:

tso off:

Disables TCP Segmentation Offload. When enabled, the CPU hands large chunks of data to the NIC, which breaks them into smaller packets. When disabled, the CPU does the segmentation. Useful for troubleshooting "jumbo frame" errors.

gso off:

Disables Generic Segmentation Offload, a software-based fallback for TSO. When enabled, it delays packet splitting as long as possible to save CPU cycles. Disabling it ensures the OS handles packet sizing strictly before the data reaches the driver.

gro off:

Disables Generic Receive Offload. When enabled, small incoming packets are combined into one large "super-packet" before being handed up the network stack. When disabled, each packet is passed up individually.

 

It may be helpful to disable these one at a time to narrow down which behavior is problematic in your environment.
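
The one-at-a-time approach can be sketched as a small wrapper around the same ethtool -K command. The function name and the DRY_RUN variable are hypothetical and only illustrate the idea; with DRY_RUN=1 the command is printed instead of executed.

```shell
# Hypothetical helper: turn off a single offload feature (tso, gso, or gro)
# on one interface. With DRY_RUN=1 the command is printed, not executed.
disable_offload() {
    iface="$1"
    feature="$2"
    case "$feature" in
        tso|gso|gro) ;;
        *) echo "unknown feature: $feature" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ethtool -K $iface $feature off"
    else
        sudo ethtool -K "$iface" "$feature" off
    fi
}

# Example: disable TSO only, then watch hub.log before trying the next feature.
# disable_offload eno1 tso
```

If disabling a feature makes no difference, it can be turned back on with `sudo ethtool -K [interface] tso on` (or gso, gro) before testing the next one.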