This issue can occur when connectivity between the NSX Manager and the pods is broken.
To confirm connectivity, test the following:
- Confirm NSX Manager to k8s cluster connectivity:
1. Get the external ingress IP of the k8s cluster by running the following command from the root CLI of the NSX Manager:
# napp-k get svc -n projectcontour
# This should display the following output:
NAME                   TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)
projectcontour         ClusterIP      192.x.x.x    <none>        8001/TCP
projectcontour-envoy   LoadBalancer   192.x.x.x    10.x.x.x      80:31434/TCP,443:31873/TCP
2. Ensure that the external IP address (10.x.x.x in the above example) is reachable from the NSX Manager node:
# openssl s_client -debug -connect 10.x.x.x:443
connect: Connection timed out
connect:errno=110
3. If you get a timeout like the one above, there is an issue in your k8s network infrastructure.
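The steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function names `external_ip_from_svc` and `check_tcp` are hypothetical, the 5-second default timeout is an assumption, and the reachability test uses bash's `/dev/tcp` redirection together with the coreutils `timeout` command instead of the `openssl s_client` call shown above. It assumes both `bash` and `timeout` are available on the node where it runs.

```shell
# Sketch only: helper names and the default timeout are assumptions.

# Read `get svc` table output on stdin and print the EXTERNAL-IP
# (4th column) of the first service whose TYPE is LoadBalancer.
external_ip_from_svc() {
  awk '$2 == "LoadBalancer" { print $4; exit }'
}

# Return 0 if a TCP connection to host:port opens within the timeout
# (default 5 seconds); return non-zero on refusal or timeout.
check_tcp() {
  local host=$1 port=$2 secs=${3:-5}
  # Inside bash -c, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage from the NSX Manager root shell:
#   ip=$(napp-k get svc -n projectcontour | external_ip_from_svc)
#   if check_tcp "$ip" 443; then echo "reachable"; else echo "NOT reachable"; fi
```

A non-zero result only indicates that the TCP connection could not be opened. It does not say where along the path the traffic is dropped.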
- Check cluster-api to NSX Manager connectivity:
- Check the cluster-api log for connection timed out errors like the following:
{"time":"2023-02-01T16:12:37.08024686Z","level":"ERROR","prefix":"-","file":"service.go","line":"426","message":"Fetching NSX config for populating intelligence default config failed: Unable to fetch platform deployment config: Get \"https://<nsx-manager>/policy/api/v1/infra/sites/default/napp/deployment/platform\": dial tcp 10.x.x.x:443: connect: connection timed out"}
- "nsx-manager" is a k8s service that proxies calls to the Policy Manager. Check for connectivity issues from the cluster-api pod by running this command from the NSX Manager shell:
napp-k exec -it `napp-k get pods | grep cluster | cut -d ' ' -f 1` -c cluster-api -- sh -c "curl https://<nsx-manager>/policy/api/v1/infra/sites/default/napp/deployment/platform --cert /certs/egress-tls.crt --key /certs/egress-tls.key -k"
* Trying 10.x.x.x...
* TCP_NODELAY set
* connect to 10.x.x.x port 443 failed: Connection timed out
* Failed to connect to <nsx manager> port 443: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to <nsx manager> port 443: Connection timed out
command terminated with exit code 7
- In this example, the output confirms that the connection timed out.
- Investigate why these components are unable to communicate (for example, firewall rules or physical networking).
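To help narrow down the investigation, the curl exit code and the cluster-api log can be triaged with a short script. This is a hedged sketch: the function names `explain_curl_exit` and `count_connect_timeouts` are hypothetical, and the suggested areas to check are general guidance rather than a product-specific diagnosis. The exit codes listed are standard curl exit codes.

```shell
# Sketch only: helper names are assumptions, not product commands.

# Map a curl exit code to the area most likely worth checking first.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok: request completed" ;;
    6)  echo "DNS: could not resolve host" ;;
    7)  echo "network: failed to connect (firewall, routing, or service down)" ;;
    28) echo "network: operation timed out" ;;
    35) echo "TLS: handshake failed" ;;
    58) echo "TLS: problem with the local client certificate" ;;
    *)  echo "other: see EXIT CODES in the curl man page" ;;
  esac
}

# Count log lines on stdin that report a TCP connect timeout.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
count_connect_timeouts() {
  grep -c 'connect: connection timed out'
}

# Example: the run above ended with "command terminated with exit code 7":
#   explain_curl_exit 7
```

Exit code 7 together with the "Connection timed out" message, as in the example above, points at the network path rather than at DNS or TLS.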