vcd-ds-controller-manager in CrashLoopBackOff due to NSX-T Load Balancer IP filtering

Article ID: 436109

Products

VMware Cloud Director

Issue/Introduction

When attempting to create or manage instances through VMware Cloud Director (VCD) Extension for Data Solutions, the following symptoms are observed:

  • The Kubernetes cluster status shows as Not Available or remains in a Pending state in the Data Solutions UI.
  • The vcd-ds-controller-manager pod in the vcd-ds-system namespace is in a CrashLoopBackOff state.
  • Inspecting the pod logs reveals connection errors when attempting to communicate with the VCD API:

    ERROR main Unable to create vcd session {"error": "Post \"https://<vcd-fqdn>/oauth/provider/token\": read tcp <pod-ip>:<pod-port>-><vcd-ip>:443: read: connection reset by peer"}
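
The symptoms above can be checked from the command line. What follows is a minimal sketch rather than an official tool: the helper name has_reset_signature is hypothetical, and the search pattern is taken from the log line quoted in this article. Only the namespace and pod name prefix come from the article itself.

```shell
# Return 0 if the log text on stdin contains the error signature quoted above.
# (has_reset_signature is a hypothetical helper name, not part of any product.)
has_reset_signature() {
  grep -q 'Unable to create vcd session.*connection reset by peer'
}

# Usage (requires kubectl access to the management cluster):
#   kubectl get pods -n vcd-ds-system
#   kubectl logs <pod-name> -n vcd-ds-system --previous | has_reset_signature \
#     && echo "log matches the signature described in this article"
```

The --previous flag requests logs from the last terminated container, which is usually the one of interest while the pod is in CrashLoopBackOff.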

Environment

VMware Cloud Director Container Service Extension 4.x
VMware Cloud Director 10.6.x

Cause

This issue occurs when an NSX-T Native Load Balancer sitting in front of the VMware Cloud Director cells is configured with HTTP Access Control rules that restrict traffic based on IP Groups.

If the translated source IP (SNAT) of the Kubernetes worker nodes, or the specific IP range used by the Data Solutions operator, is not explicitly included in the allowed IP Group, the Load Balancer drops or resets the connection, which prevents the operator from authenticating with VCD.

Resolution

To resolve this issue, identify the source IP address that the Load Balancer is rejecting and add it to the allowed IP Group in the NSX-T configuration.

Step 1: Identify the Dropped Source IP

  1. Log in to the NSX-T Manager UI.
  2. Navigate to Networking > Load Balancing > Virtual Servers.
  3. Locate the Virtual Server for your VMware Cloud Director API (port 443).
  4. Check the Access Logs or use a packet capture tool to identify the Client IP address that is receiving a Reset or Drop action when the vcd-ds-controller-manager attempts to connect.
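
To produce a fresh entry to look for in the access logs or packet capture, the connection attempt can be reproduced on demand from a worker node. The sketch below rests on several assumptions: curl is available on the node, the unauthenticated /api/versions endpoint is reachable through the same Virtual Server, and -k (skip certificate verification) is acceptable for a diagnostic probe. The function names are hypothetical. The exit code meanings follow the curl man page.

```shell
# Map a curl exit code to a troubleshooting hint (codes per the curl man page).
classify_curl_exit() {
  case "$1" in
    0)  echo "HTTP response received: connection was not reset" ;;
    7)  echo "connect failed: check routing or firewall" ;;
    28) echo "timed out: packets may be silently dropped" ;;
    35) echo "TLS handshake failed or was reset" ;;
    56) echo "connection reset while receiving: consistent with load balancer filtering" ;;
    *)  echo "other curl error: $1" ;;
  esac
}

# Probe the VCD API through the load balancer and print a hint.
# $1 is the VCD FQDN; /api/versions is assumed to need no authentication.
probe_vcd() {
  curl -sk -o /dev/null --max-time 10 "https://$1/api/versions"
  classify_curl_exit "$?"
}

# Usage, run on a worker node:
#   probe_vcd <vcd-fqdn>
```

An exit code of 0 only shows that an HTTP response came back. It does not prove that the request was allowed by the access control rule.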

Step 2: Update NSX-T IP Groups

  1. Navigate to Inventory > Groups.
  2. Find the IP Group used by the HTTP Access Control strategy of your VCD Virtual Server.
  3. Update the Group so that it includes the source IP:

    a. Click Edit on the Group.
    b. Add the identified IP address or the entire CIDR range of the Kubernetes worker nodes to the IP Addresses list.
  4. Save the changes.
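
Before choosing between the single IP address and the worker node CIDR, it can help to check whether the IP identified in Step 1 falls inside that CIDR at all. If it does not, the traffic is probably being source-translated (SNAT), and adding the node CIDR alone would not fix the problem. The following is a small IPv4-only sketch in POSIX shell. The function names and the example addresses are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if IPv4 address $1 lies inside CIDR $2 (for example 10.20.30.0/24).
ip_in_cidr() {
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip_int & mask )) -eq $(( net_int & mask )) ]
}

# Usage with hypothetical values:
#   ip_in_cidr 10.20.30.45 10.20.30.0/24 && echo "inside the worker node CIDR"
```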

Step 3: Verify the Fix

  1. On your Kubernetes management cluster, restart the deployment to trigger a new pod initialization:

    kubectl rollout restart deployment vcd-ds-controller-manager -n vcd-ds-system

  2. Monitor the pod status:

    kubectl get pods -n vcd-ds-system -w

  3. Confirm the pod reaches the Running status and that the Kubernetes cluster status in the Data Solutions UI transitions to Available.
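
As a non-interactive alternative to watching the pod list, the check can be wrapped in a small function that waits for the rollout to complete. This is a sketch only: the function name verify_controller and the 300-second timeout are assumptions, not values from the product documentation.

```shell
# Wait for the restarted deployment to become ready, then list the pods.
# $1 optionally overrides the namespace; the 300s timeout is an arbitrary choice.
verify_controller() {
  ns=${1:-vcd-ds-system}
  kubectl rollout status deployment/vcd-ds-controller-manager -n "$ns" --timeout=300s &&
    kubectl get pods -n "$ns"
}

# Usage:
#   verify_controller
```

If the rollout does not finish before the timeout, kubectl returns a non-zero exit code and the pod list is not printed, which usually means the source IP is still being filtered.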