Network instability during the lifecycle of a Pod with a PSQL service

Products

VMware NSX

Issue/Introduction

The pod is a standalone pod (i.e., not part of any ReplicaSet) or a member of a non-standard K8S replica set such as a StrimziSet.
The pod is deleted and recreated. The new pod is scheduled on the same worker node as the old one.

NCP logs a message with the port information for the new pod:

[nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="INFO"] nsx_ujo.ncp.nsx.manager.container_service Created port <port_id> with ip <PORT_IP> and mac <PORT_MAC> for container <K8S_POD_ID>

After receiving the CNI ADD message, nsx-node-agent will log one or more messages like:

[nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.<namespace>.<pod_name> until backoff expires. Last used at <timestamp>

nsx-node-agent eventually returns a CNI response with an incorrect IP/MAC address and logs the following message:

[nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Sent network configuration back to CNI for container <namespace>.<pod_name>: {'return_code': '200', 'return_status': 'OK', 'ip_address': <WRONG_IP>, 'gateway_ip': '<GW_IP>', 'mac_address': <WRONG_MAC>, 'vlan_id': <WRONG_VLAN_ID>}

The returned WRONG_IP differs from the port's PORT_IP (WRONG_IP != PORT_IP).

A few seconds later, nsx-node-agent receives a hyperbus update with the "correct" IP address.

The message looks like the following:

[nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.hyperbus_service Put app_id nsx.<namespace>.<pod_name>% with IP <PORT_IP>, MAC <PORT_MAC>, gateway <GW_IP>, vlan <VLAN_ID> ,CIF <PORT_ATTACHMENT_ID>, wait_for_sync False into queue for hyperbus ADD,current size: <INT_VALUE>
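To confirm the symptom, the IP and MAC address from NCP's "Created port" message can be compared with the values nsx-node-agent sent back to CNI. The following is a minimal Python sketch, assuming the two relevant log lines for the pod have already been located (for example, by pod name and time). The helper name find_mismatch is hypothetical, and the regular expressions are based only on the message formats quoted above; tolerating quoted values and an optional /prefix on the IP is an assumption about the real output.

```python
import re

# Patterns based on the log formats quoted in this article. Optional quotes
# around values are tolerated because the exact real-world quoting is assumed.
NCP_PORT_RE = re.compile(
    r"Created port (?P<port_id>\S+) with ip (?P<ip>\S+) and mac (?P<mac>\S+) for container"
)
CNI_RESPONSE_RE = re.compile(
    r"'ip_address': '?(?P<ip>[^',}]+)'?.*'mac_address': '?(?P<mac>[^',}]+)'?"
)


def _normalize_ip(value: str) -> str:
    # Drop an optional "/prefix" suffix (assumption: either form may appear).
    return value.strip().split("/")[0]


def _normalize_mac(value: str) -> str:
    # Compare MAC addresses case-insensitively.
    return value.strip().lower()


def find_mismatch(ncp_line: str, agent_line: str):
    """Compare the port IP/MAC created by NCP with what nsx-node-agent sent to CNI.

    Returns None when both match, otherwise a dict describing the mismatch.
    Raises ValueError if either line does not look like the expected message.
    """
    ncp = NCP_PORT_RE.search(ncp_line)
    cni = CNI_RESPONSE_RE.search(agent_line)
    if ncp is None or cni is None:
        raise ValueError("line does not match the expected log format")

    expected = (_normalize_ip(ncp["ip"]), _normalize_mac(ncp["mac"]))
    returned = (_normalize_ip(cni["ip"]), _normalize_mac(cni["mac"]))
    if expected == returned:
        return None
    return {
        "expected_ip": expected[0],
        "expected_mac": expected[1],
        "returned_ip": returned[0],
        "returned_mac": returned[1],
    }
```

A non-None result corresponds to the WRONG_IP != PORT_IP condition described above.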

Environment

NSX 4.2.X.X

NCP 4.X

Cause

nsx-node-agent configures the pod network interface with the wrong IP address. In this case, the pod will NOT be able to send or receive ANY traffic.

This can happen only for standalone pods and for members of non-standard replica sets such as StrimziSets, and only if:

The pod is deleted/recreated
Both the "old" and "new" pods are scheduled to the same host
There is a delay of more than 15 seconds between the CNI ADD message from kubelet and the hyperbus update message from the ESX host

If the above conditions are met, nsx-node-agent uses a stale cache entry to configure the pod network interface, and that entry has an IP and MAC different from the expected ones. As a consequence, all traffic originating from this pod is dropped by the NSX spoofguard filter.

StatefulSet members are not affected by this issue. For these pods, nsx-node-agent always verifies the pod identity with the K8S API server before configuring the pod's network interface.
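The conditions above can be summarized as a simplified decision model. This is a hypothetical illustration of the behavior described in this article, not nsx-node-agent's actual code: the function name chosen_config and its parameters are invented for the sketch, and measuring the backoff window from the CNI ADD message is an assumption.

```python
def chosen_config(recreated_on_same_node: bool, hyperbus_delay_s: float,
                  is_statefulset: bool = False,
                  backoff_s: float = 15.0) -> str:
    """Return which IP/MAC source this simplified model applies to the new pod.

    Hypothetical model only. hyperbus_delay_s is the time between the CNI ADD
    message and the hyperbus update; backoff_s stands in for the backoff window.
    """
    if is_statefulset:
        # StatefulSet members are verified against the K8S API server first.
        return "verified"
    if not recreated_on_same_node:
        # Modeled as: no cached entry for this pod name exists on the node,
        # so there is nothing stale to reuse.
        return "hyperbus"
    if hyperbus_delay_s <= backoff_s:
        # The cached entry is skipped until the backoff expires, and the fresh
        # hyperbus update arrives within that window.
        return "hyperbus"
    # The backoff expired first: the stale cached IP/MAC is returned to CNI,
    # and spoofguard drops the pod's traffic.
    return "stale-cache"
```

For example, with a 20-second hyperbus delay, chosen_config(True, 20.0) returns "stale-cache" with the 15-second backoff and "hyperbus" with backoff_s=30.0. In this model, a larger backoff only widens the window; a hyperbus update delayed beyond it still leads to the stale entry.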

Resolution

Check the file below. By default it contains "config_reuse_backoff_time = 15"; this needs to be changed to "config_reuse_backoff_time = 30".

cat /var/vcap/jobs/nsx-node-agent/config/ncp.ini
[DEFAULT]
use_stderr = False

[coe]
connect_retry_timeout = 30

[nsx_node_agent]
config_reuse_backoff_time = 15
proc_mount_path_prefix = ''
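The current value can also be checked programmatically. This is a minimal Python sketch, assuming it runs on the worker node with read access to the ncp.ini path shown above. The helper names read_backoff and main are hypothetical, and treating a missing key as the default of 15 is an assumption. The script only reads the file; to change the value, follow the Tanzu KB referenced in this article.

```python
import configparser

NCP_INI = "/var/vcap/jobs/nsx-node-agent/config/ncp.ini"  # path from this article
RECOMMENDED_BACKOFF = 30  # value recommended in this article
DEFAULT_BACKOFF = 15      # assumed when the key is absent


def read_backoff(text: str) -> int:
    """Parse ncp.ini content and return config_reuse_backoff_time."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback covers both a missing [nsx_node_agent] section and a missing key.
    return parser.getint("nsx_node_agent", "config_reuse_backoff_time",
                         fallback=DEFAULT_BACKOFF)


def main(path: str = NCP_INI) -> None:
    try:
        with open(path) as f:
            value = read_backoff(f.read())
    except FileNotFoundError:
        print(f"{path} not found; run this on the worker node")
        return
    status = "OK" if value >= RECOMMENDED_BACKOFF else "needs update"
    print(f"config_reuse_backoff_time = {value} ({status})")


if __name__ == "__main__":
    main()
```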

To modify this value, follow the resolution in Tanzu KB 413738:

Statefulset pods or other types that reuse same pod name during lifecycle of the pod can potentially lose network connectivity