
StatefulSet pods, or other pod types that reuse the same pod name during the pod lifecycle, can potentially lose network connectivity


Article ID: 413738


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Reproduction steps:

  1. Deploy the Postgres operator and deploy a cluster with one instance.
  2. Delete the pod: kubectl delete pod <PODNAME>
  3. In another tab, watch for IP changes during the restart process: kubectl get po -owide -w
  4. It is visible that, after recreation, the pod keeps the old IP instead of receiving a new one.
  5. Testing from the Postgres pod whether it can connect to the Kubernetes service IP results in a "no route to host" error:

kubectl exec -it cluster-example-1 -- bash -c 'timeout 3 bash -c "echo > /dev/tcp/10.xxx.xxx.1/443" && echo Open || echo Closed'
Defaulted container "postgres" out of: postgres, bootstrap-controller (init), plugin-barman-cloud (init)
Closed
kubectl exec -it cluster-example-1 -- bash -c 'timeout 3 bash -c "echo > /dev/tcp/10.xxx.xxx.1/443" && echo Open || echo Closed'
Defaulted container "postgres" out of: postgres, bootstrap-controller (init), plugin-barman-cloud (init)
bash: connect: No route to host
bash: line 1: /dev/tcp/10.xxx.xxx.1/443: No route to host
Closed

Environment

TKGi 1.2x

Cause

Issue: nsx-node-agent configures the pod network interface with a wrong IP address. In this case the pod will never be able to send or receive traffic.

This is not a common scenario. A similar situation can also happen if the NCP service is not running. Such a delay in receiving the new configuration is an indication of slowness on either the ESXi host or the NSX services.
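To rule out the case where NCP is not running, the NCP process state can be checked on the cluster master nodes. The snippet below is a minimal sketch, assuming NCP runs as a BOSH-managed monit process on the TKGi master nodes; the deployment name is a placeholder and names may differ per environment:

# From a jumpbox with BOSH access, check the NCP process on the master nodes
# (deployment name is illustrative)
bosh -d service-instance_XXXXXXXXXX ssh master -c 'sudo /var/vcap/bosh/bin/monit summary | grep -i ncp'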

This can happen only for StatefulSet members and standalone pods, and only if the conditions below are met:

  1. the pod is deleted/recreated,
  2. both the "old" and the "new" pod are scheduled to the same host,
  3. there is a delay of more than 15 seconds between the CNI ADD message from kubelet and the hyperbus update message from the ESXi host.

Confirming that NCP was running during the pod deletion process is an important validation point. The delay in the new IP configuration reaching the nsx-node-agent can be observed in the following lines from the nsx-node-agent logs:

2025-10-01T10:47:15.592Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Mark cni_delete_timestamp 1759315635.5921113 for CIF ContainerNetworkInfo('11.xxx.xxx.4/24', '11.xxx.xxx.1', '04:50:56:xx:xx:18', 8, '8ab27b55-a8bc-48ea-aa50-1e432b3c7282')

2025-10-01T10:47:25.132Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Received CNI request message: {"version": "2.0.0", "config": {"netns_path": "/var/run/netns/cni-4dddb7ab-xxx-xxx-xxx-9da8837ed6aa", "container_id": "a2f095eb1253dfd42bc655d1c8f710dc486732ebfceafdb81a9bdee5b27e4a71", "dev": "eth0", "mtu": null, "container_key": "nsx.cnpg-system.cloudnative-pg-cluster-4-010-1", "dns": null, "runtime_config": {}}, "op": "ADD"}
2025-10-01T10:47:25.598Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.example-system.cloudnative-pg-cluster-example-1 until backoff expires. Last used at 1759315635.5921113
2025-10-01T10:47:26.599Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.example-system.cloudnative-pg-cluster-example-1 until backoff expires. Last used at 1759315635.5921113
2025-10-01T10:47:27.599Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.example-system.cloudnative-pg-cluster-example-1 until backoff expires. Last used at 1759315635.5921113
2025-10-01T10:47:28.600Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.example-system.cloudnative-pg-cluster-example-1 until backoff expires. Last used at 1759315635.5921113
2025-10-01T10:47:29.600Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Skip an exsiting CIF config for container nsx.example-system.cloudnative-pg-cluster-example-1 until backoff expires. Last used at 1759315635.5921113
2025-10-01T10:47:30.620Z 01738526-xxxx-xxxx-xxxx-8d97ccce00f1 NSX 1019181 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher_lin Adding container nsx.example-system.cloudnative-pg-cluster-example-1 in namespace /var/run/netns/cni-4dddb7ab-xxx-xxx-xxx-9da8837ed6aa (IP: 11.xxx.xxx.4/24, MAC: 04:50:56:xx:xx:18, gateway: 11.xxx.xxx.1, VLAN: 8, dev: eth0)

From the above snippet it is visible that the old IP was marked for deletion, but because no new IP was received within the 15-second backoff window, the nsx-node-agent reused the old IP.

Alternatively, if this problem occurred earlier, another log sequence can be observed, indicating that the above log sequence has already happened and the backoff timer has already expired. The following message indicates there is already a mismatch between the hyperbus cache and the OVS port for the container:

2025-10-14T12:12:53.615Z 1d56895e-676d-43ed-a901-2f742e4f47b1 NSX 3111098 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Checking if CNI port ContainerNetworkInfo('11.xx.xx.14/24', None, '04:50:xx:xx:xx:79', 8, '6c87f7f0-41fc-4304-a416-4d07d3ae3671') match cache port ContainerNetworkInfo('11.xx.xx.20/24', '11.xx.xx.1', '04:50:xx:xx:xx:87', 9, 'd35a55ff-7d7d-4b22-9a46-4f58834167e0')

2025-10-14T12:12:53.615Z 1d56895e-676d-43ed-a901-2f742e4f47b1 NSX 3111098 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cache Value mismatch, key: attachment_id, v1 :6c87f7f0-41fc-4304-a416-4d07d3ae3671, v2: d35a55ff-7d7d-4b22-9a46-4f58834167e0

This message is not followed by "until backoff expires"; however, the DEL and ADD hyperbus messages trigger the pod network isolation, removing the old interface and adding the new one:

2025-10-14T12:13:18.947Z 1d56895e-676d-43ed-a901-2f742e4f47b1 NSX 3111098 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.hyperbus_service Put app_id nsx.xxxx.xxxx-xxxxx-example-0% with IP 11.xx.xx.20/24, MAC 04:50:xx:xx:xx:87, gateway 11.xx.xx.1/24, vlan x,CIF d35a55ff-7d7d-4b22-9a46-4f58834167e0, wait_for_sync False into queue for hyperbus DEL,current size: 1

2025-10-14T12:13:18.947Z 1d56895e-676d-43ed-a901-2f742e4f47b1 NSX 3111098 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.hyperbus_service Put app_id nsx.xxxx.xxxx-xxxx-example-0% with IP 11.xx.xx.7/24, MAC 04:50:xx:xx:xx:3e, gateway 11.xx.xx.1/24, vlan 3,CIF 28c95482-d8af-4775-9be4-36129a36bde2, wait_for_sync False into queue for hyperbus ADD,current size: 2

This problem is expected to be addressed in NCP release 4.2.4 and in the corresponding TKGi patch release.

This behaviour can be mitigated by setting a higher threshold in the nsx-node-agent configuration.
The threshold is controlled by the config_reuse_backoff_time parameter in the [nsx_node_agent] section; the default value is 15 seconds (the parameter is not defined in the configuration file by default).
This parameter is not exposed in the NCP BOSH job. It therefore needs to be configured directly on the worker nodes, and the setting will be overwritten by a TKGi cluster upgrade.

One possible way to apply this setting is with a DaemonSet (see Approach #2 below).

The file below needs to have "config_reuse_backoff_time = 30" in place:

cat /var/vcap/jobs/nsx-node-agent/config/ncp.ini
[DEFAULT]
use_stderr = False


[coe]

connect_retry_timeout = 30


[nsx_node_agent]
config_reuse_backoff_time = 30
proc_mount_path_prefix = ''

followed by monit reload and monit restart nsx-node-agent on the worker node; a manual per-worker sketch is shown below.
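A minimal sketch of the manual per-worker procedure, assuming BOSH SSH access to the cluster deployment (the deployment and instance names are placeholders); note that this edit is lost when the worker is upgraded or recreated:

# SSH to a worker node of the affected cluster (deployment name is illustrative)
bosh -d service-instance_XXXXXXXXXX ssh worker/0

# On the worker: add the parameter under [nsx_node_agent] if it is not already present
sudo grep -q '^config_reuse_backoff_time' /var/vcap/jobs/nsx-node-agent/config/ncp.ini || \
  sudo sed -i '/^\[nsx_node_agent\]/a config_reuse_backoff_time = 30' /var/vcap/jobs/nsx-node-agent/config/ncp.ini

# Reload monit and restart the nsx-node-agent service so the change takes effect
sudo /var/vcap/bosh/bin/monit reload
sudo /var/vcap/bosh/bin/monit restart nsx-node-agent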

Resolution

Approach #1 - Bosh OS Config

Note: Runtime Configs are applied to all VMs managed by the BOSH Director. If you need to apply this change to only a subset of clusters and VMs/nodes, it is important to configure the Runtime Config appropriately using the corresponding include and exclude rules. A wrong Runtime Config can result in undesirable updates to clusters and VMs/nodes.

Example of Runtime Config setup:

  1. Create a runtime.yml file:
    releases:
    - name: "os-conf"
      version: "23.0.0"
    addons:
    - name: nsx-node-agent-update
      jobs:
      - name: pre-start-script
        release: os-conf
        properties:
          script: |-
            #!/bin/bash
            INI_FILE="/var/vcap/jobs/nsx-node-agent/config/ncp.ini"
            SEARCH_KEY="config_reuse_backoff_time"
            SECTION="[nsx_node_agent]"
            echo "Checking for $SEARCH_KEY in $INI_FILE"
            if grep -q "^${SEARCH_KEY}" "$INI_FILE"; then
              echo "No changes to apply: $SEARCH_KEY already present in $INI_FILE"
            else
              echo "Adding $SEARCH_KEY=30 under $SECTION in $INI_FILE"
              sed -i '/^\[nsx_node_agent\]/a config_reuse_backoff_time=30' "$INI_FILE"
            fi
      include:
        deployments: [<service-instance_XXXXXXXXXX>]                                        # Optional, you can define which deployments (TKGi clusters) this runtime config will be applied to.
        instance_groups: [<master and/or worker, as defined in the deployment manifest>]    # Optional, you can define which instance_groups (cluster nodes, i.e. masters/workers) this runtime config will be applied to.
      exclude:    
        deployments: [<service-instance_XXXXXXXXXX>]                                        # Optional, you can define which deployments (TKGi clusters) this runtime config will not be applied to.
        instance_groups: [<master and/or worker, as defined in the deployment manifest>]    # Optional, you can define which instance_groups (cluster nodes, i.e. masters/workers) this runtime config will not be applied to.
  2. Create a new runtime config:
    bosh update-config --type=runtime --name nsx-node-agent runtime.yml
     
  3. Verify the runtime config:
    bosh configs
    Get the ID of the created config, then review it:
    bosh config <ID>
  4. Upgrade the related clusters so the runtime config is applied to the worker nodes (see the verification sketch after this list):
    tkgi upgrade-cluster <NAME>
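After the cluster upgrade completes, the setting can be verified on the worker nodes. A minimal sketch, assuming BOSH SSH access (the deployment name is a placeholder):

# Confirm the parameter is present in ncp.ini and that nsx-node-agent is running
bosh -d service-instance_XXXXXXXXXX ssh worker/0 -c 'grep -A2 "^\[nsx_node_agent\]" /var/vcap/jobs/nsx-node-agent/config/ncp.ini'
bosh -d service-instance_XXXXXXXXXX ssh worker/0 -c 'sudo /var/vcap/bosh/bin/monit summary | grep nsx-node-agent'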

Approach #2 - Daemonset

 

Apply a DaemonSet that checks whether the file /var/vcap/jobs/nsx-node-agent/config/ncp.ini on each worker contains the above setting and, if it does not, appends the line in the correct section.

The DaemonSet requires privileged mode in order to access the worker node file system.

Once the change is applied, a restart of the nsx-node-agent service is required for the change to take effect (see the sketch after the manifest below).

This change will not be preserved during an upgrade or recreation of a worker node, but the line will be re-added as long as the DaemonSet is running.

The pause image (the version might differ between TKGi versions) and the ubuntu image have to be downloaded to a private registry if the cluster does not have internet access.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: update-node-agent-admin
  namespace: pks-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      tkg: update-node-agent-admin
  template:
    metadata:
      labels:
        tkg: update-node-agent-admin
    spec:
      containers:
      - image: projects.registry.vmware.com/tkg/pause:3.10
        imagePullPolicy: IfNotPresent
        name: sleep
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostPID: true
      initContainers:
      - command:
        - /bin/sh
        - -xc
        - |
          set -e
          INI_FILE="var/vcap/jobs/nsx-node-agent/config/ncp.ini"
          SEARCH_KEY="config_reuse_backoff_time"
          SECTION="[nsx_node_agent]"
          if grep -q "^${SEARCH_KEY}" "$INI_FILE"; then
              echo "No changes to apply: $SEARCH_KEY already present in $INI_FILE"
          else
              echo "1 Adding $SEARCH_KEY under $SECTION in $INI_FILE"
              sed -i '/^\[nsx_node_agent\]/a config_reuse_backoff_time=30' "$INI_FILE"
          fi
        image: ubuntu:23.04
        imagePullPolicy: IfNotPresent
        name: init
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/vcap
          name: hostfs
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/vcap
          type: ""
        name: hostfs
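A minimal sketch of applying the DaemonSet and then restarting nsx-node-agent so the new backoff value is picked up. The manifest filename and the BOSH deployment name are placeholders, and BOSH SSH access is assumed for the restart step:

# Apply the DaemonSet and wait for it to roll out on all workers
kubectl apply -f update-node-agent-admin.yaml
kubectl -n pks-system rollout status daemonset/update-node-agent-admin

# Restart nsx-node-agent on every worker so the new value takes effect
# (deployment name is illustrative)
bosh -d service-instance_XXXXXXXXXX ssh worker -c 'sudo /var/vcap/bosh/bin/monit restart nsx-node-agent'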

Additional Information