PKS 1.5 upgrade fails on Master VM ncp job - the ncp job fails due to "Failed nsxlocks request"

Article ID: 298600

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

A PKS 1.5 upgrade fails while upgrading the Master node of a Kubernetes cluster.

Failing job: ncp

In /var/vcap/sys/log/ncp/ncp.stdout.log on the Master VM, you will see a message like the following:

1 2019-08-29T02:46:59.534Z a1b23fdb-9265-4797-85c6-b53d9b65f674 NSX 7539 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING" security="True"] nsx_ujo.ncp.election Get election configuration failed: Failed nsxlocks request: Failed to get nsxlocks : election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929, error:         {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nsxlocks.nsx.vmware.com \"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929\" is forbidden: User \"ncp\" cannot get resource \"nsxlocks\" in API group \"nsx.vmware.com\" at the cluster scope","reason":"Forbidden","details":{"name":"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929","group":"nsx.vmware.com","kind":"nsxlocks"},"code":403}
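
To confirm that a Master VM is hitting this specific failure, the log can be searched for the error signature shown above. The following is a minimal sketch, not an official tool: the function name count_nsxlocks_failures is hypothetical, and the pattern assumes the message format matches the sample log line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PKS or NCP): count log lines that carry
# the "Failed nsxlocks request" signature together with a Forbidden reason.
count_nsxlocks_failures() {
  local log_file="$1"
  # grep -c prints the number of matching lines; "|| true" keeps a zero
  # count from being treated as a script failure.
  grep -c 'Failed nsxlocks request.*"reason":"Forbidden"' "$log_file" || true
}

# Log path taken from the article; the guard lets the sketch run on any host.
NCP_LOG=/var/vcap/sys/log/ncp/ncp.stdout.log
if [ -r "$NCP_LOG" ]; then
  echo "nsxlocks failures: $(count_nsxlocks_failures "$NCP_LOG")"
fi
```

A non-zero count suggests the ncp job is failing for the reason described in this article; a zero count points to a different cause.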

Before applying the workaround, collect the following logs:

Obtain the bosh logs for the service instance of the failing k8s cluster upgrade
Obtain the bosh logs for the Pivotal Container Service deployment
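
Both log bundles can be downloaded with the bosh CLI's logs command. The sketch below assumes the bosh CLI is already targeted at the director and authenticated; the helper name collect_bosh_logs and the BOSH_CMD override are hypothetical, and the deployment names in the comments are placeholders.

```shell
#!/usr/bin/env bash
# BOSH_CMD can be overridden (for example with "echo bosh") for a dry run.
BOSH_CMD="${BOSH_CMD:-bosh}"

# Hypothetical helper: download a log bundle for every instance in a deployment.
collect_bosh_logs() {
  local deployment="$1"
  $BOSH_CMD -d "$deployment" logs
}

# Usage (placeholder names; list the real ones with "bosh deployments"):
#   collect_bosh_logs service-instance_xxxxx
#   collect_bosh_logs pivotal-container-service-xxxxx
```

Each call writes a compressed log archive to the current directory, which can then be attached to the support case.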

Next, perform the workaround steps in the Resolution section below. Then report the issue by opening a case with Pivotal Support, and remember to upload the log artifacts.

Continue to use the workaround until the root cause is identified and resolved.

Environment

Product Version: 1.5

Resolution

Workaround

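Run the following on the failing Master VM node. The commands connect to the VM over bosh ssh, re-run the ncp job's post-start script as root, and then restart the ncp process: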

bosh ssh -d <cluster deployment service-instance_xxxxx> master/0
sudo su
cd /var/vcap/jobs/pks-nsx-t-ncp/bin
bash post-start
monit restart ncp

Then log in to the PKS control plane via the 'pks' CLI.

Note: Make sure you are using PKS CLI version 1.5.0 or later.

Confirm the CLI version, for example:

$ pks --version

PKS CLI version: 1.5.0-build.291
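
The version requirement can also be checked in a script. This is a rough sketch that assumes the output format shown above and a sort that supports -V (GNU coreutils); the function name pks_cli_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the version in a "pks --version" output line
# is at least the required minimum. Relies on "sort -V" (GNU coreutils).
pks_cli_at_least() {
  local version_line="$1" minimum="$2" version
  # "PKS CLI version: 1.5.0-build.291" -> "1.5.0-build.291" -> "1.5.0"
  version="${version_line##*: }"
  version="${version%%-*}"
  # The minimum sorts first (or ties) when the installed version is new enough.
  [ "$(printf '%s\n' "$minimum" "$version" | sort -V | head -n 1)" = "$minimum" ]
}

# Example using the sample output from the article.
if pks_cli_at_least "PKS CLI version: 1.5.0-build.291" "1.5.0"; then
  echo "CLI version OK"
fi
```

On a workstation with the CLI installed, the first argument could be replaced with "$(pks --version)".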

Then kick off the upgrade again for the failed cluster:

$ pks login -a ...

$ pks upgrade-cluster <failed-cluster-name>
