Unable to Push apps/NCP restarting due to "Failed to initialize container orchestrator adaptor: 'external

Products

VMware NSX

Issue/Introduction

To confirm the external_id parameter is missing:

1. Collect the list of Container Cluster(s) from NSX-T Manager:

# curl -k -u admin -H "Content-Type: application/json" -X GET https://localhost/api/v1/fabric/container-clusters/
Enter host password for user 'admin':
{
  "results" : [ {
    "external_id" : "c9ef8d33-xxxx-xxxx-xxxx-6050028bfacf",
    "cluster_type" : "PAS",
    "infrastructure" : {
      "infra_type" : "vSphere"
    },
    "origin_properties" : [ ],
    "resource_type" : "ContainerCluster",
    "display_name" : "TEST",
    "_last_sync_time" : 1580396792638
  } ],
  "result_count" : 1,
  "sort_by" : "display_name",
  "sort_ascending" : true
}

2. Using the external_id of the Container Cluster collected in step 1, list its container applications and check whether any entry lacks the external_id parameter:

# curl -k -u admin -H "Content-Type: application/json" -X GET "https://localhost/api/v1/fabric/container-applications?container_cluster_id=c9ef8d33-xxxx-xxxx-xxxx-6050028bfacf"
Enter host password for user 'admin':

Notice that the external_id parameter is missing from the entry in the response:

{
"results" : [ {
  "container_cluster_id" : "c9ef8d33-xxxx-xxxx-xxxx-6050028bfacf",
  "container_project_id" : "b04095af-xxxx-xxxx-xxxx-f86eb561a81c",
  "origin_properties" : [ ],
  "resource_type" : "ContainerApplication",
  "_last_sync_time" : 0
}
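
The check in step 2 can also be scripted. The following is a minimal Python sketch, not part of the product, that parses a saved container-applications response and returns the entries that lack external_id. The function name apps_missing_external_id and the SAMPLE text are illustrative assumptions; the sample mirrors the response above, with closing brackets added so that it parses.

```python
import json

# Illustrative sample modelled on the response above (IDs are placeholders).
SAMPLE = """
{
  "results" : [ {
    "container_cluster_id" : "c9ef8d33-xxxx-xxxx-xxxx-6050028bfacf",
    "container_project_id" : "b04095af-xxxx-xxxx-xxxx-f86eb561a81c",
    "origin_properties" : [ ],
    "resource_type" : "ContainerApplication",
    "_last_sync_time" : 0
  } ]
}
"""


def apps_missing_external_id(response_text):
    """Return the container-application entries that have no 'external_id' key."""
    payload = json.loads(response_text)
    return [app for app in payload.get("results", []) if "external_id" not in app]


if __name__ == "__main__":
    for app in apps_missing_external_id(SAMPLE):
        print("missing external_id, project:", app.get("container_project_id"))
```

To use it against a real system, save the curl output from step 2 to a file and pass the file contents to the function. A non-empty result means the cluster is affected.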

Symptoms:

Symptoms will include all of the following:

NCP is unable to push Apps.
NCP will report restarting due to "Failed to initialize container orchestrator adaptor: 'external_id'".

NCP Log will contain events similar to the following in ./ncp/ncp.stdout.log:

./ncp/ncp.stdout.log.1:1 2020-02-09T21:41:24.287Z 7d6c57b9-xxxx-xxxx-xxxx-c57c63241782 NSX 16223 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="CRITICAL"] nsx_ujo.ncp.main Failed to initialize container orchestrator adaptor: 'external_id'
./ncp/ncp.stdout.log.1:1 2020-02-09T21:42:26.037Z 7d6c57b9-xxxx-xxxx-xxxx-c57c63241782 NSX 16447 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="CRITICAL"] nsx_ujo.ncp.main Failed to initialize container orchestrator adaptor: 'external_id'
./ncp/ncp.stdout.log.1:1 2020-02-09T21:43:27.073Z 7d6c57b9-xxxx-xxxx-xxxx-c57c63241782 NSX 16673 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="CRITICAL"] nsx_ujo.ncp.main Failed to initialize container orchestrator adaptor: 'external_id'
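
When the log is large, the failure events can be extracted with a short script. The sketch below is an assumption-laden helper rather than a supported tool: the function name adaptor_failure_times is hypothetical, and the pattern is based only on the message format shown above. It returns the timestamp of each matching event, which makes the roughly one-minute restart cadence easy to see.

```python
import re

# Timestamp followed, on the same line, by the failure message shown above.
# ".?" tolerates either a straight or a typographic quote before external_id.
FAILURE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"
    r"[^\n]*?Failed to initialize container orchestrator adaptor: .?external_id"
)


def adaptor_failure_times(log_text):
    """Return the timestamps of all 'external_id' adaptor failures in log_text."""
    return [match.group(1) for match in FAILURE.finditer(log_text)]


if __name__ == "__main__":
    # Path taken from the article; adjust to where the NCP log bundle was unpacked.
    with open("./ncp/ncp.stdout.log", encoding="utf-8", errors="replace") as handle:
        for stamp in adaptor_failure_times(handle.read()):
            print(stamp)
```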

Cause

The issue occurs when the Container Application Instance object is received by the MP Inventory, but the associated App has not yet been received. NCP creates the App without the required “external_id” field. The App fails without that field.

Resolution

This issue does not occur on a fresh install of NSX-T 2.5.1.

When upgrading from NSX-T 2.5.0 to NSX-T 2.5.1, the following steps must also be performed after the upgrade to NSX-T 2.5.1:

- Stop all NCP instances.

Option (1) Remove only those container_cluster entries that have invalid container-application entries.
    Identify the impacted cluster IDs with the following API calls:
    (a) GET https://<NSX Manager IP>/api/v1/fabric/container-clusters/
    (b) For each item in the result of (a):
        GET https://<NSX Manager IP>/api/v1/fabric/container-applications?container_cluster_id=<container_cluster_id>
    (c) Check whether the result of (b) contains any container-application entry without external_id.

    If so, remove each container_cluster entry whose applications are missing the external_id field with the following API call:
    (d) DELETE https://<NSX Manager IP>/api/v1/fabric/container-clusters/<container_cluster_id>
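
The identification part of Option (1) can be sketched in Python as follows. This is a hedged illustration using only the standard library and the two GET calls listed above. The names make_nsx_getter and impacted_cluster_ids are hypothetical, certificate verification is disabled to mirror curl -k, and the sketch assumes each result set fits in a single response (it does not follow pagination cursors). It only reads from the API and deletes nothing.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import quote

CLUSTERS_PATH = "/api/v1/fabric/container-clusters/"
APPS_PATH = "/api/v1/fabric/container-applications?container_cluster_id="


def make_nsx_getter(manager, user, password):
    """Return a function that GETs a path on the NSX Manager and parses the JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    context = ssl.create_default_context()
    context.check_hostname = False       # equivalent of curl -k
    context.verify_mode = ssl.CERT_NONE  # do not use where certificates are trusted

    def get_json(path):
        request = urllib.request.Request(
            f"https://{manager}{path}",
            headers={"Authorization": f"Basic {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, context=context) as response:
            return json.load(response)

    return get_json


def impacted_cluster_ids(get_json):
    """Steps (a) to (c): clusters with any application entry lacking external_id."""
    impacted = []
    for cluster in get_json(CLUSTERS_PATH).get("results", []):
        cluster_id = cluster["external_id"]
        apps = get_json(APPS_PATH + quote(cluster_id, safe="")).get("results", [])
        if any("external_id" not in app for app in apps):
            impacted.append(cluster_id)
    return impacted
```

A possible invocation is impacted_cluster_ids(make_nsx_getter("<NSX Manager IP>", "admin", password)). Review the returned IDs before issuing any DELETE call.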

Option (2) Remove all container_cluster entries.
    Identify all the cluster IDs with the following API call:
    (a) GET https://<NSX Manager IP>/api/v1/fabric/container-clusters/
    For each item in the result of (a), use the following API call to remove it:
    (b) DELETE https://<NSX Manager IP>/api/v1/fabric/container-clusters/<container_cluster_id>
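
Because the DELETE call is destructive, it may be safer to generate the commands first and review them before running anything. The sketch below does only that: given a list of cluster IDs (from either option), it prints one curl command per ID in the same style as the curl examples in this article. The function name delete_commands is hypothetical, and the admin user is assumed from those examples.

```python
import shlex
from urllib.parse import quote


def delete_commands(manager, cluster_ids, user="admin"):
    """Build one reviewable curl DELETE command per container cluster ID."""
    commands = []
    for cluster_id in cluster_ids:
        url = (f"https://{manager}/api/v1/fabric/container-clusters/"
               f"{quote(cluster_id, safe='')}")
        # shlex.quote protects the URL if it contains shell metacharacters.
        commands.append(
            f"curl -k -u {shlex.quote(user)} -X DELETE {shlex.quote(url)}"
        )
    return commands


if __name__ == "__main__":
    # Placeholder ID copied from the example output in this article.
    for line in delete_commands("localhost", ["c9ef8d33-xxxx-xxxx-xxxx-6050028bfacf"]):
        print(line)
```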

- Start all NCP instances.

At this point, if the steps in the Workaround section were applied, those changes to ncp.ini can be undone.

Workaround:
This workaround removes the “external_id” requirement by disabling the inventory feature.

(1) Log in to each diego_database instance, for example (from OpsMgr):
     bosh ssh -d <deployment-id> diego-database/xxxxxxxx
(2) sudo su
(3) cd /var/vcap/data/jobs/ncp/yyyyyyyy/config/
      where yyyyyyyy should be the latest deployment ID
(4) vi ncp.ini
     Under the [nsx_v3] section, add:
     enable_inventory = False <-- as shown, with a capital 'F' in False
(5) Restart NCP:
     # monit restart ncp
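
If the edit in step (4) has to be repeated on several instances, it can be scripted instead of made by hand in vi. The following is a minimal sketch under stated assumptions: the function name disable_inventory is hypothetical, and the file is treated as plain text so that comments and all other settings are preserved. It places enable_inventory = False directly under the [nsx_v3] header and drops any existing uncommented enable_inventory line in that section.

```python
import re


def disable_inventory(ini_text):
    """Return ini_text with 'enable_inventory = False' set under [nsx_v3]."""
    output = []
    in_nsx_v3 = False
    inserted = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_nsx_v3 = stripped == "[nsx_v3]"
            output.append(line)
            if in_nsx_v3 and not inserted:
                output.append("enable_inventory = False")
                inserted = True
            continue
        # Skip an existing uncommented setting; the line added above replaces it.
        if in_nsx_v3 and re.match(r"\s*enable_inventory\s*=", line):
            continue
        output.append(line)
    if not inserted:
        # No [nsx_v3] section was found, so append one.
        output += ["[nsx_v3]", "enable_inventory = False"]
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    # Run from the config directory in step (3); keep a backup of ncp.ini first.
    with open("ncp.ini", encoding="utf-8") as handle:
        updated = disable_inventory(handle.read())
    with open("ncp.ini", "w", encoding="utf-8") as handle:
        handle.write(updated)
```

NCP still has to be restarted as in step (5) for the change to take effect. To undo the workaround later, remove the added line and restart NCP again.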