PKS 1.5 upgrade fails on Master VM ncp job - the ncp job fails due to "Failed nsxlocks request"

Article ID: 298600

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

A PKS 1.5 upgrade fails while upgrading the Master node of a cluster.

Failing job: ncp

In /var/vcap/sys/log/ncp/ncp.stdout.log on the Master VM, you will see a warning similar to the following:

1 2019-08-29T02:46:59.534Z a1b23fdb-9265-4797-85c6-b53d9b65f674 NSX 7539 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING" security="True"] nsx_ujo.ncp.election Get election configuration failed: Failed nsxlocks request: Failed to get nsxlocks : election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929, error:         {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nsxlocks.nsx.vmware.com \"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929\" is forbidden: User \"ncp\" cannot get resource \"nsxlocks\" in API group \"nsx.vmware.com\" at the cluster scope","reason":"Forbidden","details":{"name":"election-lock-pks-f42b0176-a592-4647-84a5-21ed36f3f929","group":"nsx.vmware.com","kind":"nsxlocks"},"code":403}
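
To confirm that a Master VM is hitting this specific failure, the log can be searched for the two markers visible in the excerpt above. The following is a minimal sketch, not an official diagnostic: the function name has_nsxlocks_error is hypothetical, and the default path is the one quoted in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PKS or NCP).
# Returns 0 if the given NCP log contains a "Failed nsxlocks request"
# warning whose embedded API response has reason "Forbidden".
has_nsxlocks_error() {
  local log_file="${1:-/var/vcap/sys/log/ncp/ncp.stdout.log}"
  # First grep keeps only the nsxlocks warnings; the second checks
  # those lines for the Forbidden reason and sets the exit status.
  grep -F 'Failed nsxlocks request' "$log_file" | grep -qF '"reason":"Forbidden"'
}
```

On the Master VM this could be called as: has_nsxlocks_error && echo "nsxlocks 403 found". If the function returns non-zero, the ncp job is probably failing for a different reason and this article may not apply.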

Before applying the workaround, collect the following logs:

  • The bosh logs for the service instance of the k8s cluster whose upgrade failed
  • The bosh logs for the Pivotal Container Service deployment
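
The two log bundles above can be gathered with the BOSH CLI's logs command. The following is a sketch under stated assumptions: the BOSH CLI is already targeted at and authenticated with the director, the function name collect_bosh_logs and the BOSH_CMD override are hypothetical, and the deployment names must be taken from your own "bosh deployments" output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `bosh -d <deployment> logs`.
# BOSH_CMD is an illustrative override for dry runs; it defaults to `bosh`.
collect_bosh_logs() {
  local deployment="$1" out_dir="${2:-.}"
  # Make sure the destination directory exists before downloading.
  mkdir -p "$out_dir" || return 1
  # Downloads a log tarball for every instance in the deployment.
  "${BOSH_CMD:-bosh}" -d "$deployment" logs --dir "$out_dir"
}
```

For example, collect_bosh_logs service-instance_xxxxx ./pks-logs would fetch the cluster logs, and the same call with the Pivotal Container Service deployment name would fetch the control plane logs.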

Then perform the workaround steps in the Resolution section below and report the issue by opening a case with Pivotal Support; remember to upload the collected logs. Use the workaround until the root cause is identified and resolved.


Environment

Product Version: 1.5

Resolution

Workaround

Run the following on the failing Master VM:

bosh ssh -d <cluster deployment service-instance_xxxxx> master/0
sudo su
cd /var/vcap/jobs/pks-nsx-t-ncp/bin
bash post-start
monit restart ncp
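
After the restart, it is worth checking that monit reports the ncp process as running before retrying the upgrade. The sketch below parses the output of "monit summary"; the function name ncp_is_running is hypothetical, and the parsing assumes lines of the form Process 'ncp' running, which may differ between monit versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `monit summary` output on stdin and
# returns 0 only if the ncp process line reports exactly "running".
ncp_is_running() {
  awk -v name="'ncp'" '
    # Match lines such as:  Process <quoted name>  running
    $1 == "Process" && $2 == name && $3 == "running" && NF == 3 { ok = 1 }
    END { exit !ok }
  '
}
```

On the Master VM, as root, this could be used as: monit summary | ncp_is_running && echo "ncp is running". Any other state (for example "not monitored") makes the function return non-zero.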

Then log in to the PKS control plane with the 'pks' CLI.

Note: make sure you are using CLI version 1.5.0 or later.

Confirm the CLI version, for example:

$ pks --version

PKS CLI version: 1.5.0-build.291

Then start the upgrade again for the failed cluster:

$ pks login -a ...

$ pks upgrade-cluster <failed-cluster-name>
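
Once the upgrade has been restarted, its progress can be followed with "pks cluster <failed-cluster-name>". The following sketch extracts the state from that output; the function name last_action_state is hypothetical, and it assumes the output contains a line beginning with "Last Action State:", which should be verified against your CLI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `pks cluster <name>` output on stdin and
# prints the value of the "Last Action State:" field, if present.
last_action_state() {
  sed -n 's/^Last Action State:[[:space:]]*//p'
}
```

For example: pks cluster <failed-cluster-name> | last_action_state. If the function prints nothing, the assumed field was not found in the output.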