PKS create-cluster fails and ncp pre-start script shows Service cluster is unavailable

Article ID: 298497


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This article helps operators identify the root cause of an NCP (NSX-T Container Plugin) job failure when NCP reports only that the NSX Manager service is unavailable.


Environment


Cause

If PKS cluster creation fails at the NCP job, the default NCP logs may not contain enough detail to explain the failure. Additional logging may need to be enabled for the NCP job to show what went wrong.

In this scenario:

-- The "pks create-cluster ..." command failed.

-- The cluster VMs were created successfully in BOSH, but the Kubernetes master node failed with an error during the ncp pre-start script.

-- The BOSH task shows that the ncp job failed while the other jobs succeeded.

  Example task output:

  L Error: Action Failed get_task: Task 65cbac71-8109-4a68-6a2a-79552eeaab28 result: 1 of 8 pre-start scripts failed. Failed Jobs: ncp. Successful Jobs: pks-nsx-t-ncp, etcd, bpm, bosh-dns-enable, syslog_forwarder, bosh-dns, pks-nsx-t-prepare-master-vm.


-- The file /var/vcap/sys/log/ncp/pre-start.stderr.log on the Kubernetes master VM shows:

No handlers could be found for logger "vmware_nsxlib.v3.cluster"
Traceback (most recent call last):
  File "/usr/local/bin/configure_nsx_resources", line 285, in <module>
    if not args.func(args):
  File "/usr/local/bin/configure_nsx_resources", line 40, in wrapper
    raise e
vmware_nsxlib.v3.exceptions.ServiceClusterUnavailable: Service cluster: https://<NSX-Manager-addr-redacted> is unavailable. Please, check NSX setup and/or configuration

-- NCP reports NSX Manager as "unavailable", but NSX Manager may in fact be reachable, so the underlying reason for the failure is not clear from this message alone.
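
To confirm that a failed master node matches the scenario above, the pre-start log can be checked for the exception name. The following is a minimal sketch, not an official tool; the function name has_service_cluster_unavailable is hypothetical, and it assumes the exception class name is logged exactly as in the traceback shown above.

```shell
# Sketch: succeed (exit 0) if the given log contains the
# ServiceClusterUnavailable exception from the traceback above.
# The function name is hypothetical and used only for illustration.
has_service_cluster_unavailable() {
  # $1: path to the log, e.g. /var/vcap/sys/log/ncp/pre-start.stderr.log
  # -F treats the pattern as a literal string, -q suppresses output.
  grep -qF 'vmware_nsxlib.v3.exceptions.ServiceClusterUnavailable' "$1"
}
```

For example, on the Kubernetes master VM: has_service_cluster_unavailable /var/vcap/sys/log/ncp/pre-start.stderr.log && echo "matches this article"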

Resolution

Because NCP is a process managed by BOSH, the operator can take the following steps to obtain more data from NCP (for example, exceptions and errors) and identify the root cause more quickly:

- bosh ssh <Kubernetes Master node>

- sudo -i

- Edit /var/vcap/jobs/ncp/config/ncp.ini

- Depending on what you are debugging, add or set the relevant parameter:

# for NCP operations
loglevel=DEBUG
# or for NSX API client operations
nsxlib_loglevel=DEBUG

 

- Then restart ncp:
monit restart ncp
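
The edit step above can also be scripted. The following is a minimal sketch under stated assumptions: the helper name set_ini_param is hypothetical, and it only rewrites a parameter that already appears in the file (active or commented out). Because this article does not say which section of ncp.ini a missing parameter belongs in, the helper does not add new lines. It relies on sed -i, which keeps a .bak copy of the original file.

```shell
# Sketch: set KEY=VALUE in an ini-style file when KEY is already present,
# either active or commented out with a leading '#'.
# The function name is hypothetical and used only for illustration.
set_ini_param() {
  # $1: ini file, $2: parameter name, $3: new value
  if grep -Eq "^[[:space:]]*#?[[:space:]]*$2[[:space:]]*=" "$1"; then
    # Rewrite every matching line; the original is saved as <file>.bak.
    sed -i.bak -E "s|^[[:space:]]*#?[[:space:]]*$2[[:space:]]*=.*|$2=$3|" "$1"
  else
    # Do not guess the section; leave the file untouched.
    echo "$2 not found in $1; add it manually" >&2
    return 1
  fi
}
```

For example, run as root on the master VM: set_ini_param /var/vcap/jobs/ncp/config/ncp.ini nsxlib_loglevel DEBUG && monit restart ncp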

After the restart, review the NCP events logged in the /var/vcap/sys/log/ncp/ directory, for example in ncp.stdout.log.
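
With DEBUG logging enabled, the log files can grow quickly, so it may help to filter them. The following is a rough sketch; the function name ncp_recent_errors is hypothetical, and the search patterns are assumptions about what failure lines look like, not a list of markers documented for NCP.

```shell
# Sketch: print the most recent lines of a log that look like failures,
# prefixed with their line numbers.
# The function name and the patterns are assumptions for illustration.
ncp_recent_errors() {
  # $1: log file, $2: maximum number of lines to print (default 50)
  # -i ignores case, so "exceptions" and "Exception" both match.
  grep -niE 'error|critical|traceback|exception' "$1" | tail -n "${2:-50}"
}
```

For example: ncp_recent_errors /var/vcap/sys/log/ncp/ncp.stdout.log 20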


Additional Information

Refer to the NCP documentation for more information on additional ncp.ini settings.