pks-nsx-t-osb-proxy continues failing with unknown certificate in previously working PKS environment

Article ID: 298523

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:
First, confirm that you have followed the high-level steps in the article "pks-nsx-t-osb-proxy job fails while installing PKS".

Use this article if, after following that article, you still cannot run "Apply Changes" from the Ops Manager UI and /var/vcap/sys/log/pks-nsx-t-osb-proxy/pks-nsx-t-osb-proxy.stderr.log still shows errors like the following:

time="2018-07-16T21:18:23Z" level=error msg="Failed to extract edge cluster ID from router 020124fa-184c-4570-babb-2ca4fb855102" pks-networking=networkManager
2018/07/16 21:18:23 Error initializing a NSX-T client: Error getting network manager for cluster Get https://10.193.53.20/api/v1/logical-routers/020124fa-184c-4570-babb-2ca4fb855102: remote error: tls: unknown certificate
time="2018-07-16T21:19:04Z" level=error msg="Failed to extract edge cluster ID from router 020124fa-184c-4570-babb-2ca4fb855102" pks-networking=networkManager
2018/07/16 21:19:04 Error initializing a NSX-T client: Error getting network manager for cluster Get https://10.193.53.20/api/v1/logical-routers/020124fa-184c-4570-babb-2ca4fb855102: remote error: tls: unknown certificate
time="2018-07-16T21:19:43Z" level=error msg="Failed to extract edge cluster ID from router 020124fa-184c-4570-babb-2ca4fb855102" pks-networking=networkManager
2018/07/16 21:19:43 Error initializing a NSX-T client: Error getting network manager for cluster Get https://10.193.53.20/api/v1/logical-routers/020124fa-184c-4570-babb-2ca4fb855102: remote error: tls: unknown certificate
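To check whether the proxy is still hitting this error, you can count the matching lines in the stderr log. This is a minimal sketch, meant to be run on the VM that hosts the pks-nsx-t-osb-proxy job; the function name is hypothetical and the default path is the one quoted above.

```shell
# Hypothetical helper: count "tls: unknown certificate" lines in the
# pks-nsx-t-osb-proxy stderr log (defaults to the path quoted in this article).
count_unknown_cert_errors() {
  local log="${1:-/var/vcap/sys/log/pks-nsx-t-osb-proxy/pks-nsx-t-osb-proxy.stderr.log}"
  # grep -c prints the count. It exits 1 when nothing matches (a valid
  # answer of 0, so treat it as success) and 2 when the file cannot be read,
  # in which case the function returns non-zero.
  grep -c 'tls: unknown certificate' "$log" || [ "$?" -eq 1 ]
}
```

After "sudo -i", calling count_unknown_cert_errors with no argument reads the default log; a count that keeps growing between runs means the proxy is still being rejected.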

Environment


Cause

The "remote error: tls: unknown certificate" message means that NSX Manager rejected the client certificate the proxy presented during the TLS handshake. Creating or regenerating certificates with an incorrect or outdated NSX-T principal identity is one possible cause, but not the only one. One way to recover is to copy the original certificates from an existing Kubernetes master node back into the PKS tile and Apply Changes again.
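Before changing the tile, you may want to confirm that NSX Manager rejects a given certificate. The sketch below assumes curl is available and that you have the private key that pairs with the superuser certificate; the function name and the key path are hypothetical. It calls the same logical-routers API that appears in the log. An HTTP status code (for example 200) shows that the handshake succeeded, while 000 with a TLS error on stderr is consistent with the rejection in the log.

```shell
# Hypothetical helper: ask NSX Manager whether it accepts a client
# certificate/key pair, using the logical-routers API the proxy calls.
nsx_cert_check() {
  if [ "$#" -ne 3 ]; then
    echo "usage: nsx_cert_check <nsx-manager-host> <cert.pem> <key.pem>" >&2
    return 2
  fi
  local host="$1" cert="$2" key="$3"
  # -k skips verification of NSX Manager's own server certificate so that
  # only the client-certificate side of the handshake is being tested.
  # Prints the HTTP status code (000 if no HTTP response was received)
  # and returns curl's exit status.
  curl -sS -k -o /dev/null -w '%{http_code}\n' \
    --cert "$cert" --key "$key" \
    "https://${host}/api/v1/logical-routers"
}
```

For example: nsx_cert_check 10.193.53.20 nsx_t_superuser.crt <superuser-private-key>, where the last argument is a placeholder for wherever the matching key is stored.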

Resolution

If a Kubernetes master VM is still running and you can reach it with "bosh ssh", try the following to restore the previous certificates ("bosh deployments" lists the deployment name to use for the cluster):
bosh -d <cluster-service_UUID> ssh master/instance-id
sudo -i
cd /var/vcap/jobs/pks-nsx-t-prepare-master-vm/config/
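Before copying anything, it can help to confirm that each file is a parseable certificate and to note its subject, issuer, expiry date, and fingerprint. This is a minimal sketch assuming openssl is installed on the master VM; the function name is hypothetical.

```shell
# Hypothetical helper: sanity-check a PEM certificate file and print the
# details worth recording before pasting it into the PKS tile.
inspect_pem_cert() {
  local f="$1"
  if [ ! -r "$f" ]; then
    echo "cannot read $f" >&2
    return 1
  fi
  # "--" stops grep from treating the leading dashes as options.
  if ! grep -q -- '-----BEGIN CERTIFICATE-----' "$f" ||
     ! grep -q -- '-----END CERTIFICATE-----' "$f"; then
    echo "$f does not look like a PEM certificate" >&2
    return 1
  fi
  echo "== $f"
  # Subject, issuer, expiry, and SHA-256 fingerprint; returns openssl's status.
  openssl x509 -in "$f" -noout -subject -issuer -enddate -fingerprint -sha256
}
```

From the config directory: for f in nsx_t_superuser.crt nsx_t_ca.crt; do inspect_pem_cert "$f"; done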
Next, put the certificates back into Ops Manager > PKS tile > Networking:
  • Paste the contents of nsx_t_superuser.crt into "NSX Manager Super User Principal Identity Certificate"
  • Paste the contents of nsx_t_ca.crt into "NSX Manager CA Cert"
  • Click "Apply Changes" again in Ops Manager
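After "Apply Changes" finishes, the timestamp of the newest "tls: unknown certificate" entry shows whether the proxy is still being rejected: if it predates the change, no new failures have been logged. This is a minimal sketch; the function name is hypothetical, and it relies on the "YYYY/MM/DD HH:MM:SS" prefix seen on the error lines above.

```shell
# Hypothetical helper: print the date and time of the most recent
# "tls: unknown certificate" line (empty output if there is none).
last_unknown_cert_error() {
  local log="${1:-/var/vcap/sys/log/pks-nsx-t-osb-proxy/pks-nsx-t-osb-proxy.stderr.log}"
  # Keep matching lines, take the newest (last) one, and print its first
  # two space-separated fields: the date and the time.
  grep 'tls: unknown certificate' "$log" | tail -n 1 | cut -d' ' -f1,2
}
```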
If one or more Kubernetes master nodes still have problems, either recreate the affected instance:
bosh recreate -d <cluster-service_UUID> master/instance-id
or power off the VM in vCenter so that BOSH recreates (resurrects) it automatically; this relies on the BOSH Resurrector being enabled.