VMware NSX Application Platform (NAPP) deployment stuck at 70% registering platform

Article ID: 322444

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are using NSX-T
  • During deployment of the NSX Application Platform (NAPP), the installation fails at 70% with the following errors:
[Screenshot: NAPP deployment failed at 70%, Registering Platform]
  • In /var/log/proton/napps.log of the NSX Manager support bundle, we see errors similar to the following 504 error:
2023-03-09 13:53:28 INFO api_request:115 [MainThread] - GET:/napp/api/v1/platform/monitor/platform/status
2023-03-09 13:53:43 INFO api_request:120 [MainThread] - b'upstream request timeout'
2023-03-09 13:53:43 ERROR api_request:133 [MainThread] - Unexpected error for GET /napp/api/v1/platform/monitor/platform/status, status: 504, body: b'upstream request timeout

 
  • In /var/log/proton/nsxapi.log of the NSX Manager support bundle, we see errors indicating that NAPP registration failed:

2023-02-01T16:15:00.708Z  INFO http-nio-127.0.0.1-7440-exec-23 CloudNativePlatformFacadeImpl 11508 NAPP [nsx@6876 comp="nsx-manager" level="INFO" reqId="e681c515-d00e-4211-b37e-a93314add885" subcomp="manager" username="nsx-opsagent"] Get PlatformDeploymentProgress: DeploymentProgressStatusDto{overallStatus='DEPLOYMENT_FAILED', percentage='70', progressMessage='Registering Platform', errorMessage='[NSX Application Platform registration failed.]'}
 
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
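If you have an extracted NSX Manager support bundle, the two log checks above can be scripted. The following is a minimal POSIX shell sketch, not a VMware-provided tool. The function names (find_504_errors, find_registration_failures, scan_bundle) are hypothetical, and the sketch assumes the bundle has been extracted so that var/log/proton/napps.log and var/log/proton/nsxapi.log sit under the directory you pass in. The match patterns are taken from the example log lines above and may need adjusting for other NSX versions.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper functions, not a VMware tool).
# Assumption: the support bundle is extracted so that var/log/proton/napps.log
# and var/log/proton/nsxapi.log exist under the directory passed to scan_bundle.

# Print napps.log lines that report an ERROR with HTTP status 504
# (for example, "upstream request timeout").
find_504_errors() {
    grep -E 'ERROR .*status: 504' "$1"
}

# Print nsxapi.log lines showing that the deployment failed at 70%
# during the "Registering Platform" step.
find_registration_failures() {
    grep -F "overallStatus='DEPLOYMENT_FAILED'" "$1" |
        grep -F "percentage='70'" |
        grep -F "progressMessage='Registering Platform'"
}

# Run both checks against an extracted bundle directory.
scan_bundle() {
    log_dir="$1/var/log/proton"
    echo "== 504 errors in napps.log =="
    find_504_errors "$log_dir/napps.log"
    echo "== Registration failures in nsxapi.log =="
    find_registration_failures "$log_dir/nsxapi.log"
}
```

For example, source the functions and run scan_bundle with the path of the extracted bundle. grep prints nothing when a pattern does not match, so an empty section means that symptom was not found in the bundle.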


Environment

VMware NSX-T Data Center

Cause

This issue can occur when there is a connectivity problem between the NSX Manager and the NAPP k8s pods.
To confirm, test connectivity in both directions:
  • Confirm NSX Manager to k8s cluster connectivity:
  1. Get the external ingress IP address of the k8s cluster by running the following command from the root shell of the NSX Manager:
root@nsx-mgr-0:~# napp-k get svc -n projectcontour
# This should display the following output:
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)
projectcontour         ClusterIP      192.168.1.164   <none>        8001/TCP
projectcontour-envoy   LoadBalancer   192.168.1.183   10.0.8.3      80:31434/TCP,443:31873/TCP
  2. Confirm that the external IP address (10.0.8.3 in the example above) is reachable on port 443 from the NSX Manager node:
root@nsx-mgr-0:~# openssl s_client -debug -connect 10.0.8.3:443
connect: Connection timed out
connect:errno=110
  3. If the connection times out as shown above, there is a problem in your k8s network infrastructure.
  • Check cluster-api to NSX Manager connectivity:
    • In the cluster-api log, you see connection timed out errors similar to the following:
{"time":"2023-02-01T16:12:37.08024686Z","level":"ERROR","prefix":"-","file":"service.go","line":"426","message":"Fetching NSX config for populating intelligence default config failed: Unable to fetch platform deployment config: Get \"https://nsx-manager/policy/api/v1/infra/sites/default/napp/deployment/platform\": dial tcp 10.10.10.171:443: connect: connection timed out"}
    • "nsx-manager" is a k8s service that proxies calls to the policy manager. To check for connectivity issues from the cluster-api pod, run the following command from the NSX Manager root shell:
napp-k exec -it $(napp-k get pods | grep cluster | cut -d ' ' -f 1) -c cluster-api -- sh -c "curl https://nsx-manager/policy/api/v1/infra/sites/default/napp/deployment/platform --cert /certs/egress-tls.crt --key /certs/egress-tls.key -k"

* Trying 10.1.15.8...
* TCP_NODELAY set
* connect to 10.1.15.8 port 443 failed: Connection timed out
* Failed to connect to external-nsx-manager port 443: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to external-nsx-manager port 443: Connection timed out
command terminated with exit code 7

 
  • In this example, the output confirms that the connection timed out.
  • Investigate why these components cannot communicate (for example, firewall rules or physical networking).
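The NSX Manager to k8s check (steps 1 to 3 of the first test) can also be scripted so that it fails after a fixed time instead of waiting for the default connect timeout. The sketch below is hypothetical and not a VMware tool. It assumes that napp-k resolves in the shell where the functions are sourced and behaves like kubectl against the NAPP cluster, that timeout (GNU coreutils) and openssl are available on the NSX Manager, and that EXTERNAL-IP is the fourth column of the napp-k get svc -n projectcontour output, as in the example above. The function names and the 10-second limit are illustrative choices.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers, not a VMware tool).
# Assumptions: napp-k behaves like kubectl against the NAPP cluster, and
# timeout(1) and openssl are installed on the NSX Manager.

# Read `napp-k get svc -n projectcontour` output on stdin and print the
# EXTERNAL-IP (4th column) of the projectcontour-envoy service.
# Returns 1 if the service is missing or has no external IP assigned.
envoy_external_ip() {
    ip=$(awk '$1 == "projectcontour-envoy" { print $4; exit }')
    case "$ip" in
        "" | "<none>" | "<pending>") return 1 ;;
    esac
    printf '%s\n' "$ip"
}

# Attempt a TLS connection to host:port, giving up after $3 seconds
# (default 10). Returns 0 on success, non-zero on timeout or failure.
check_tls_port() {
    host=$1
    port=$2
    secs=${3:-10}
    timeout "$secs" openssl s_client -connect "$host:$port" </dev/null >/dev/null 2>&1
}

# Combine both steps: look up the Envoy external IP and test port 443.
check_envoy_reachability() {
    if ! ip=$(napp-k get svc -n projectcontour | envoy_external_ip); then
        echo "projectcontour-envoy has no external IP" >&2
        return 2
    fi
    if check_tls_port "$ip" 443 10; then
        echo "OK: $ip:443 is reachable from this NSX Manager"
    else
        echo "FAIL: $ip:443 not reachable within 10s; check the k8s network path" >&2
        return 1
    fi
}
```

Source the functions into the NSX Manager root shell and run check_envoy_reachability. A FAIL result corresponds to the timeout described in step 3. The explicit limit is the main design choice: without it, openssl s_client waits for the operating system's TCP connect timeout before reporting errno 110.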

Resolution

The product is working as expected. The failure is caused by a connectivity issue between the NSX Manager and the k8s pods.