The VMware Enterprise PKS 1.5 upgrade fails to upgrade the Master node.
The ncp service on the Master node fails.
In /var/vcap/sys/log/ncp/ncp.stdout.log on the Master node, you see entries similar to:
1 2019-08-29T02:46:59.534Z a1b23fdb-9265-4797-85c6-b53d9b65f674 NSX 7539 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING" security="True"] nsx_ujo.ncp.election Get election configuration failed: Failed nsxlocks request: Failed to get nsxlocks : election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929, error: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nsxlocks.nsx.vmware.com \"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929\" is forbidden: User \"ncp\" cannot get resource \"nsxlocks\" in API group \"nsx.vmware.com\" at the cluster scope","reason":"Forbidden","details":{"name":"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929","group":"nsx.vmware.com","kind":"nsxlocks"},"code":403}
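To check quickly whether a Master node shows this symptom, you can search the log for the error text. The following is a minimal sketch, not part of the official procedure. The function name has_nsxlocks_forbidden is hypothetical, and the match strings are taken from the sample entry above, so they may need adjusting if your log wording differs.

```shell
# Hypothetical helper: returns 0 if the NCP log contains the nsxlocks
# "Forbidden" error shown in the sample entry, non-zero otherwise.
has_nsxlocks_forbidden() {
  # Default to the log path named in this article; a different path can be passed as $1.
  log_file="${1:-/var/vcap/sys/log/ncp/ncp.stdout.log}"
  # -F matches literal strings; -q prints nothing and only sets the exit status.
  grep -F 'Failed to get nsxlocks' "$log_file" | grep -qF 'Forbidden'
}

# Example use on the Master node:
#   has_nsxlocks_forbidden && echo "nsxlocks Forbidden error found in the ncp log"
```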
This is a known issue affecting VMware Enterprise PKS 1.5 upgrade.
Currently, there is no resolution.
To work around this issue:
Log in to the Master node by using bosh ssh:
bosh ssh <master-instance-id> -d <k8s-cluster-service-instance-id>
Run sudo su to get root privileges.
Navigate to /var/vcap/jobs/pks-nsx-t-ncp/bin and run the post-start script:
cd /var/vcap/jobs/pks-nsx-t-ncp/bin
bash post-start
Restart the ncp service by running the command:
monit restart ncp
Log in to the PKS control plane by using the pks CLI and ensure that you are using CLI version 1.5.0 or later:
# pks --version
PKS CLI version: 1.5.0-build.291
Re-initiate the failed cluster upgrade by running the command:
pks upgrade-cluster <failed-cluster-name>
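The commands that run as root on the Master node (the post-start script followed by the ncp restart) can also be grouped into one small function. This is only a sketch under stated assumptions: the function name rerun_ncp_post_start is hypothetical, and the existence check before running post-start is an added safeguard, not part of the documented workaround.

```shell
# Hypothetical wrapper for the on-node workaround steps; run as root on the Master node.
rerun_ncp_post_start() {
  # Default to the job directory named in this article; another path can be passed as $1.
  job_bin="${1:-/var/vcap/jobs/pks-nsx-t-ncp/bin}"

  # Added safeguard (assumption): stop early if the post-start script is missing.
  if [ ! -f "$job_bin/post-start" ]; then
    echo "post-start not found in $job_bin" >&2
    return 1
  fi

  # Run post-start from its own directory, in a subshell so the
  # caller's working directory is left unchanged.
  ( cd "$job_bin" && bash post-start ) || return 1

  # Restart the ncp service, as in the manual steps.
  monit restart ncp
}

# Example use after "bosh ssh" and "sudo su":
#   rerun_ncp_post_start
```

After the function returns successfully, continue with the pks CLI steps from a machine that can reach the PKS control plane.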