Resolving NSX-NCP Pod Stuck in CrashLoopBackOff due to NSX Search Service Error

Article ID: 319370

Products

  • VMware vSphere ESXi
  • VMware vSphere Kubernetes Service
  • VMware NSX

Issue/Introduction

  • The NSX-NCP (NSX Container Plugin) Pod remains stuck in a CrashLoopBackOff state.
  • The logs of the nsx-ncp container in the NSX-NCP Pod show an error indicating a problem with the NSX Search service:
vmware_nsxlib.v3.exceptions.ManagerError: Unexpected error from backend manager (['#.#.#.#:443', '#.#.#.#:443', '#.#.#.#:443', '#.#.#.#:443']) for GET policy/api/v1/search/query?query=resource_type:IpAddressBlock&sort_by=id: Search service is currently unavailable, please restart using 'restart service search' and resync using 'start search resync all' CLI commands on the NSX Appliance
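
The log check described above can be sketched as follows. This is a minimal sketch, not a definitive procedure: it assumes the vmware-system-nsx namespace used in the Resolution steps of this article, and the function names ncp_pod_name and has_search_error are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Namespace taken from the Resolution steps in this article.
NS="vmware-system-nsx"

# Print the name of the first pod whose name contains "nsx-ncp".
ncp_pod_name() {
  kubectl get pods -n "$NS" --no-headers | awk '$1 ~ /nsx-ncp/ {print $1; exit}'
}

# Read log text on stdin; succeed if the Search service error is present.
has_search_error() {
  grep -q "Search service is currently unavailable"
}

# Example use (requires kubectl access to the cluster).
# --previous reads the logs of the last terminated container instance,
# which is usually the relevant one while the pod is crash-looping:
#   kubectl logs "$(ncp_pod_name)" -n "$NS" -c nsx-ncp --previous \
#     | has_search_error && echo "NSX Search service error found"
```

If the error is not found, the CrashLoopBackOff state probably has a different cause than the one covered in this article.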



Environment

  • VMware NSX
  • VMware vSphere 7.0 with Tanzu

Cause

If the NSX Search service is unavailable, the search API calls that the NSX-NCP Pod sends to the NSX Manager fail, and the NSX-NCP Pod enters a CrashLoopBackOff state.
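
To check the search API independently of NCP, the query from the error message can be sent directly to an NSX Manager. The sketch below is an illustration under assumptions: the function names are hypothetical, the manager address is supplied by the caller, the admin account is assumed to be allowed to use basic authentication, and -k assumes a certificate that curl cannot verify.

```shell
#!/usr/bin/env bash
# Build the search URL that appears in the NCP error message.
# $1 = NSX Manager address (placeholder supplied by the caller).
search_query_url() {
  printf 'https://%s/policy/api/v1/search/query?query=resource_type:IpAddressBlock&sort_by=id\n' "$1"
}

# Print only the HTTP status code returned for that query.
# -k skips certificate verification (assumption: untrusted certificate);
# -u admin makes curl prompt for the admin password.
search_http_code() {
  curl -k -s -o /dev/null -w '%{http_code}' -u admin "$(search_query_url "$1")"
}

# Example use:
#   search_http_code <nsx-manager-address>
```

A status code of 200 suggests the search API is answering. Any other code would be consistent with the problem described here, although this article does not state which code the NSX Manager returns in that case.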

Resolution

Solution 1: Restart the NSX Search Service

Follow these steps to restart the NSX Search service and resolve the issue. If this solution doesn't resolve the problem, proceed to Solution 2.

  1. Open an SSH session to every NSX Manager appliance VM as the admin user.
  2. On every NSX Manager appliance VM, restart the NSX Search service and then resync it by running the following commands:

    restart service search
    start search resync all

  3. Identify the NSX-NCP Pod. Run the following command from a system with kubectl installed and network access to the Kubernetes cluster.

    Note: If NCP is running on a vSphere supervisor, see the How to ssh into supervisor control plane VMs section of Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs for instructions.

    kubectl get pods -n vmware-system-nsx | grep nsx-ncp


  4. Restart the NSX-NCP pod by deleting it, as it will be automatically recreated:

    kubectl delete pod <nsx-ncp pod name> -n vmware-system-nsx


  5. After the NSX-NCP Pod is recreated, confirm it is now in a 'Running' state:

    kubectl get pods -n vmware-system-nsx | grep nsx-ncp

If the NSX-NCP Pod is running without errors and does not return to CrashLoopBackOff after several minutes, the issue is resolved. If the problem persists, proceed to Solution 2.
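
The kubectl steps for identifying, deleting, and rechecking the pod can be combined into one helper. This is a minimal sketch rather than an official tool: the function names ncp_pod and restart_ncp, the polling defaults, and the assumption that STATUS is the third column of kubectl get pods output are all choices made here for illustration.

```shell
#!/usr/bin/env bash
NS="vmware-system-nsx"

# Print "<name> <status>" for the first pod whose name contains "nsx-ncp".
ncp_pod() {
  kubectl get pods -n "$NS" --no-headers | awk '$1 ~ /nsx-ncp/ {print $1, $3; exit}'
}

# Delete the current NSX-NCP pod, then poll until its replacement is Running.
# $1 = maximum number of polls (default 30), $2 = seconds between polls (default 10).
restart_ncp() {
  local tries="${1:-30}" delay="${2:-10}" name status
  read -r name status <<<"$(ncp_pod)"
  [ -n "$name" ] || { echo "no nsx-ncp pod found" >&2; return 1; }
  kubectl delete pod "$name" -n "$NS" || return 1
  while [ "$tries" -gt 0 ]; do
    read -r name status <<<"$(ncp_pod)"
    if [ "$status" = "Running" ]; then
      echo "$name is Running"
      return 0
    fi
    sleep "$delay"
    tries=$((tries - 1))
  done
  echo "nsx-ncp pod did not reach Running" >&2
  return 1
}

# Example use:
#   restart_ncp          # poll up to 30 times, 10 seconds apart
```

A Running status right after recreation is not conclusive, because a crash-looping pod also passes through Running briefly. Keep watching the pod for several minutes before treating the issue as resolved.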

 

Solution 2: Restart vCenter

  1. Reboot the vCenter Server instance where the supervisor cluster is running.
  2. Identify the NSX-NCP Pod. Run the following command from a system with kubectl installed and network access to the Kubernetes cluster.

    Note: If NCP is running on a vSphere supervisor, see the How to ssh into supervisor control plane VMs section of Troubleshooting vSphere with Tanzu (TKGS) Supervisor Control Plane VMs for instructions.

    kubectl get pods -n vmware-system-nsx | grep nsx-ncp

  3. Restart the NSX-NCP pod by deleting it, as it will be automatically recreated:

    kubectl delete pod <nsx-ncp pod name> -n vmware-system-nsx


  4. After the NSX-NCP Pod is recreated, confirm it is now in a 'Running' state:

    kubectl get pods -n vmware-system-nsx | grep nsx-ncp

If the NSX-NCP Pod is running without errors and does not return to CrashLoopBackOff after several minutes, the issue is resolved.
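
One way to confirm that the pod stays healthy over several minutes is to compare its container restart counts before and after a waiting period. The sketch below is an assumption-laden illustration: restart_total and is_stable are hypothetical names, and the 300-second default is an arbitrary choice, not a value from this article.

```shell
#!/usr/bin/env bash
NS="vmware-system-nsx"

# Sum the restart counts of all containers in the given pod.
# $1 = pod name
restart_total() {
  kubectl get pod "$1" -n "$NS" \
    -o jsonpath='{.status.containerStatuses[*].restartCount}' |
    tr ' ' '\n' | awk '{sum += $1} END {print sum + 0}'
}

# Succeed if the restart total does not change over the waiting period.
# $1 = pod name, $2 = seconds to wait (default 300)
is_stable() {
  local pod="$1" wait="${2:-300}" before after
  before="$(restart_total "$pod")"
  sleep "$wait"
  after="$(restart_total "$pod")"
  [ "$before" -eq "$after" ]
}

# Example use:
#   is_stable <nsx-ncp pod name> && echo "no restarts observed"
```

An unchanged restart total over the interval suggests the pod is no longer crash-looping. A rising total means the containers are still restarting and the problem persists.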




Additional Information