openssh client (rsync/scp) hangs on NSX-T linux/windows bare metal server

Article ID: 322494

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • rsync or scp sessions hang: the command starts but never completes.
  • The NSX-T bare metal server fails to communicate with non-NSX-T servers.
  • Communication between non-NSX-T bare metal servers works fine.
  • NSX versions 3.2.x are impacted.


Environment

VMware NSX-T Data Center

Cause

According to the OpenSSH documentation, OpenSSH automatically sets the TOS/DSCP field of the IP header.
For rsync over SSH (a non-interactive session), OpenSSH selects TOS 0x08.
TOS 0x08 is equal to DSCP 2: DSCP occupies the upper six bits of the TOS byte (RFC 2474), so 0x08 shifted right by two bits gives 2.
When DSCP 2 is also configured under ops-global-config, the rsync/ssh sessions hang.
You can use the following API to view the configuration details:
GET /policy/api/v1/infra/ops-global-config
{
    "in_band_network_telementry": { 
        "dscp_value": 2, 
        "indicator_type": "DSCP_VALUE"
    },
    [snip]
}
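To check a given setup for this conflict, the GET response can be compared against the DSCP values that OpenSSH's default TOS bytes map to. The following is a minimal Python sketch, assuming the response has already been parsed into a dict. The OpenSSH defaults it lists (0x10 for interactive sessions, 0x08 for non-interactive sessions such as scp/rsync) are assumed to be the 7.7-and-earlier defaults; the function names are hypothetical.

```python
import json
from typing import Optional

# Field name spelled exactly as it appears in the GET response above.
INT_FIELD = "in_band_network_telementry"

# Assumed OpenSSH <= 7.7 default TOS bytes per session type.
OPENSSH_DEFAULT_TOS = {"interactive": 0x10, "non-interactive": 0x08}


def tos_to_dscp(tos: int) -> int:
    """DSCP is the upper six bits of the TOS byte (RFC 2474)."""
    return (tos >> 2) & 0x3F


def int_dscp_value(config: dict) -> Optional[int]:
    """Return the INT DSCP value, or None if INT is absent or not DSCP-based."""
    int_cfg = config.get(INT_FIELD)
    if not int_cfg or int_cfg.get("indicator_type") != "DSCP_VALUE":
        return None
    return int_cfg.get("dscp_value")


def conflicting_sessions(config: dict) -> list:
    """List the OpenSSH session types whose default TOS maps to the INT DSCP value."""
    dscp = int_dscp_value(config)
    if dscp is None:
        return []
    return [kind for kind, tos in OPENSSH_DEFAULT_TOS.items()
            if tos_to_dscp(tos) == dscp]


if __name__ == "__main__":
    response = json.loads(
        '{"in_band_network_telementry": '
        '{"dscp_value": 2, "indicator_type": "DSCP_VALUE"}}'
    )
    print(conflicting_sessions(response))  # ['non-interactive']
```

An empty list means neither default OpenSSH TOS value collides with the configured INT DSCP value.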
Note:
TOS is Type of Service.
DSCP is Differentiated Services Code Point.

Resolution

This is a known issue impacting NSX-T Data Center and VMware NSX.

Workaround:
INT may not be necessary and can be disabled.
The ops-global-config is specific to `VLAN Traceflow`, and `VLAN Traceflow` is applied only to packets injected into a VLAN-backed port on ESXi. Enabling or disabling ops-global-config does not affect the behavior of `Overlay Traceflow`. Therefore, enabling ops-global-config on the NSX Management Plane serves no purpose if the setup has no ESXi transport nodes (that is, only Linux/Windows bare metal servers are used). Likewise, if VLAN Traceflow on ESXi hosts is not intended to be used, disabling ops-global-config has no negative impact.


If INT is not needed, disable it with the following steps:
1. Invoke GET /policy/api/v1/infra/ops-global-config.
2. Copy the response body returned by the above call.
3. Remove the 'in_band_network_telementry' field from the response body.
4. Send the edited body back with PUT /policy/api/v1/infra/ops-global-config.
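The steps above can be scripted. The following is a minimal sketch using only the Python standard library, assuming the NSX Manager accepts HTTP basic authentication on this API. The function names, manager address, and credentials are hypothetical placeholders. Only the `in_band_network_telementry` field is removed; every other field from the GET response is sent back unchanged.

```python
import base64
import copy
import json
import ssl
import urllib.request
from typing import Optional

# API path from this article.
OPS_GLOBAL_CONFIG_PATH = "/policy/api/v1/infra/ops-global-config"
# Field name spelled exactly as the API returns it.
INT_FIELD = "in_band_network_telementry"


def strip_int_config(config: dict) -> dict:
    """Step 3: return a copy of the body without the INT field; all else unchanged."""
    edited = copy.deepcopy(config)
    edited.pop(INT_FIELD, None)
    return edited


def call_api(method: str, url: str, user: str, password: str,
             body: Optional[dict] = None,
             context: Optional[ssl.SSLContext] = None) -> Optional[dict]:
    """Send one JSON request with HTTP basic auth (assumed to be enabled)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, context=context) as resp:
        raw = resp.read().decode()
    return json.loads(raw) if raw else None


def disable_int(manager: str, user: str, password: str,
                context: Optional[ssl.SSLContext] = None) -> Optional[dict]:
    """Steps 1-4: GET the config, drop the INT field, PUT the edited body back."""
    url = f"https://{manager}{OPS_GLOBAL_CONFIG_PATH}"
    current = call_api("GET", url, user, password, context=context)  # steps 1-2
    if current is None or INT_FIELD not in current:
        return current  # nothing to remove
    edited = strip_int_config(current)  # step 3
    return call_api("PUT", url, user, password, body=edited, context=context)  # step 4
```

For example, `disable_int("nsx-manager.example.com", "admin", "<password>")` would run all four steps. If the manager presents a self-signed certificate, pass an `ssl.SSLContext` that trusts it via `context`. Editing a copy leaves the original GET response available for rollback.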



If INT is needed for VLAN Traceflow, the following example shows how to choose a DSCP value that does not conflict with OpenSSH.
Assume the OpenSSH client is version 7.4 (openssh-clients-7.4p1-21.el7.x86_64). Per the OpenSSH documentation, versions 7.7 and earlier set the TOS field per RFC 1349 unless otherwise specified.
Therefore, the TOS bits occupy bits 1 to 4 of the TOS byte, while DSCP occupies bits 2 to 7 of the same byte (per RFC 2474), counting bit 0 as the least significant bit.
Configure a DSCP value that sets bits 5 to 7 of the byte (these bits do not overlap with the RFC 1349 TOS field, so the two values can never be equal).
In this way, the DSCP value used by NSX won't conflict with the TOS value used by OpenSSH.
An example DSCP value is 111001b (DSCP value 57), which appears in the byte as 11100100b (0xE4).
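The bit reasoning above can be checked mechanically. The following is a minimal Python sketch, assuming OpenSSH uses one of the four RFC 1349 TOS values (bits 1 to 4) and numbering bits from 0 at the least significant end. The helper names are hypothetical. It shows that DSCP 2 produces the same byte as TOS 0x08, while DSCP 57 sets bits 5 to 7 and cannot collide.

```python
# Bit positions count from 0 = least significant bit of the 8-bit TOS/DS byte.
RFC1349_TOS_VALUES = {
    "minimize_delay": 0x10,          # bit 4
    "maximize_throughput": 0x08,     # bit 3
    "maximize_reliability": 0x04,    # bit 2
    "minimize_monetary_cost": 0x02,  # bit 1
}
BITS_5_TO_7 = 0b1110_0000  # never set by the RFC 1349 TOS values above


def dscp_to_tos_byte(dscp: int) -> int:
    """Place a 6-bit DSCP value in bits 2-7 of the byte (RFC 2474)."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be 0-63, got {dscp}")
    return dscp << 2


def sets_bits_5_to_7(dscp: int) -> bool:
    """The rule above: bits 5-7 of the resulting byte must all be set."""
    return (dscp_to_tos_byte(dscp) & BITS_5_TO_7) == BITS_5_TO_7


def collides_with_rfc1349(dscp: int) -> bool:
    """True if the DSCP yields the same byte as one of the RFC 1349 TOS values."""
    return dscp_to_tos_byte(dscp) in RFC1349_TOS_VALUES.values()


if __name__ == "__main__":
    for dscp in (2, 57):
        print(f"DSCP {dscp}: byte {dscp_to_tos_byte(dscp):08b}, "
              f"bits 5-7 set: {sets_bits_5_to_7(dscp)}, "
              f"collides: {collides_with_rfc1349(dscp)}")
```

Under these assumptions, only DSCP values 56 to 63 satisfy the rule, and none of them can match an RFC 1349 TOS byte.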

INT refers to In-band Network Telemetry.