A PKS 1.5 upgrade fails to upgrade the Master node.
Failing job: ncp
In /var/vcap/sys/log/ncp/ncp.stdout.log you will see:
1 2019-08-29T02:46:59.534Z a1b23fdb-9265-4797-85c6-b53d9b65f674 NSX 7539 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING" security="True"] nsx_ujo.ncp.election Get election configuration failed: Failed nsxlocks request: Failed to get nsxlocks : election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929, error: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nsxlocks.nsx.vmware.com \"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929\" is forbidden: User \"ncp\" cannot get resource \"nsxlocks\" in API group \"nsx.vmware.com\" at the cluster scope","reason":"Forbidden","details":{"name":"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929","group":"nsx.vmware.com","kind":"nsxlocks"},"code":403}
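To confirm that a node is hitting this specific failure, the log can be searched for the forbidden nsxlocks message. The following is a minimal sketch, assuming the log path quoted above; the function name has_nsxlocks_forbidden is hypothetical and is not part of PKS or NCP.

```shell
# Hypothetical helper (not part of PKS): succeeds (exit 0) if the NCP log
# contains a "Failed to get nsxlocks ... is forbidden" line like the one above.
# The default path is the one quoted in this article; pass another path to override.
has_nsxlocks_forbidden() {
  log_file="${1:-/var/vcap/sys/log/ncp/ncp.stdout.log}"
  # -F treats the patterns as fixed strings, so no regex escaping is needed.
  grep -F 'Failed to get nsxlocks' "$log_file" | grep -qF 'is forbidden'
}
```

On the Master VM it could be used as: has_nsxlocks_forbidden && echo "nsxlocks 403 found in NCP log"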
Until the root cause is identified and resolved, perform the workaround detailed in the Resolution section below. Then report the issue by opening a case with Pivotal Support, and remember to upload the log artifacts.
Run the following on the failing Master VM node:
bosh ssh -d <cluster deployment service-instance_xxxxx> master/0
sudo su
cd /var/vcap/jobs/pks-nsx-t-ncp/bin
bash post-start
monit restart ncp
Then log in to the PKS control plane via the 'pks' CLI:
Note: Make sure you are using PKS CLI version 1.5.0 or later.
# pks --version
PKS CLI version: 1.5.0-build.291
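The version requirement in the note above can also be checked in a script. This is a sketch only: it assumes the version line has the format shown in the sample output, and the function name pks_cli_at_least_1_5_0 is hypothetical.

```shell
# Hypothetical helper: succeeds if the reported PKS CLI version is >= 1.5.0.
# Expects a line in the format shown above, e.g. "PKS CLI version: 1.5.0-build.291".
pks_cli_at_least_1_5_0() {
  version_line="$1"
  version="${version_line##*: }"   # drop everything up to "version: "
  version="${version%%-*}"         # drop the "-build.NNN" suffix
  old_ifs="$IFS"; IFS=.
  set -- $version                  # split "1.5.0" into major, minor, patch
  IFS="$old_ifs"
  major="${1:-0}"; minor="${2:-0}"
  # Any patch level satisfies >= 1.5.0 once major.minor is at least 1.5.
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }
}
```

For example: pks_cli_at_least_1_5_0 "$(pks --version)" || echo "Upgrade the PKS CLI before continuing"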
$ pks login -a ...
$ pks upgrade-cluster <failed-cluster-name>