Supervisor cluster missing from the vCenter's Supervisor Management page after a Content Library sync operation

Article ID: 416011

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The Supervisor Management page displays the following error:
    • Notification service error - There was a problem with the notifications mechanism. The data that you see on the screen may be outdated. Please refresh the screen in a few minutes in order to see up to date data.
  • The Supervisor control plane node virtual machines are visible in the inventory, but the Supervisor Management page does not display the Supervisor configuration.
  • The wcp service in vCenter is stopped or crashing, and starting the wcp service fails.
  • /var/log/vmware/wcp/wcpsvc.log shows the following error messages:

error wcp [kubelifecycle/pman_client.go:397] [opID=XXXXXXXXX:Enable:domain-cXX] supervisor content is being processed
error wcp [content/catalog.go:863] [opID=XXXXXXXXX:Enable:domain-cXX] catalog is not ready
error wcp [kubelifecycle/kube_instance.go:3129] [opID=XXXXXXXXX:Enable:domain-cXX] unable to find desired version <version> image info. err: supervisor content is being processed
error wcp [kubelifecycle/kube_instance.go:3148] [opID=XXXXXXXXX:Enable:domain-cXX] Unable to find image info of desired version: <version> error: supervisor content is being processed
error wcp [kubelifecycle/pman_client.go:435] [opID=XXXXXXXXX:Enable:domain-cXX] unable to import depot and set solution err: supervisor content is being processed 
error wcp [kubelifecycle/pman_client.go:397] [opID=XXXXXXXXX:Enable:domain-cXX] supervisor content is being processed
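
To check whether a vCenter matches this signature, the log can be searched for the messages above. The following is a minimal sketch, assuming a Bash shell on the vCenter appliance. The function name count_wcp_content_errors is hypothetical and not part of any VMware tooling; the matched strings and the default path are taken from the log excerpt above.

```shell
#!/bin/bash
# Hypothetical helper (not a VMware tool): count the lines in wcpsvc.log
# that match the error signature described in this article.
count_wcp_content_errors() {
    # Default to the log path named in this article; a different file
    # can be passed as the first argument.
    local log_file="${1:-/var/log/vmware/wcp/wcpsvc.log}"
    # -c prints the number of matching lines; -E enables alternation.
    grep -cE 'supervisor content is being processed|catalog is not ready' "$log_file"
}

# Example, run on the vCenter appliance:
#   count_wcp_content_errors
#   vmon-cli --status wcp    # reports the current state of the wcp service
```

A non-zero count together with a stopped wcp service is consistent with the symptoms listed above, though it does not by itself confirm the cause.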

Environment

VMware Cloud Foundation 9.0.0 and 9.0.1

Cause

Assigning a local content library, or any other content library, to the Supervisor fails, and access to the Supervisor Management page is lost.

The Supervisor content library was not initialized because an unknown library was incorrectly associated with the WCP service. This association caused the service to enter an infinite loop while searching for Spherelet images, resulting in a WCP service crash.

 

Resolution

VMware engineering is aware of this issue, and it will be fixed in a future release.

 

Workaround:
 
As a workaround, follow the steps below to bring the WCP service back up. You will still not be able to assign a local content library to the Supervisor.
  • SSH into the vCenter Appliance
    • ssh root@<vc-ip>
  • Connect to the VCDB database:
    • sudo -u wcp /opt/vmware/vpostgres/current/bin/psql --username wcpuser --dbname VCDB --host /var/run/vpostgres
  • Run the following query to inspect the existing entries:
    • SELECT * FROM supervisor_content_source_configs;
  • Identify the problematic content library entry (typically the one most recently added)
    • Example output, where an entry similar to the following may appear:
      VCDB=> select * from supervisor_content_source_configs;
       id | library | status | state | phase | messages | create_timestamp | last_update_timestamp
      ----+---------+--------+-------+-------+----------+------------------+-----------------------
       XX | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | ERROR | APPLY | ERROR | [{"Message": {"Id": "vcenter.wcp.content.library.processing.error", "Args": ["XXXXXXXXXXXXXXXXXXXXXXXXXX", "malformed Supervisor OVF template name "], "Params": null, "Localized": null, "DefaultMessage": "Error processing Content Library XXXXXXXXXXXXX, Error: malformed Supervisor OVF template name "}, "Severity": "ERROR"}] | timestamp | timestamp
      (1 row)
  • Delete the problematic entry, replacing XX with its id:
    • DELETE FROM supervisor_content_source_configs WHERE id=XX;
  • Exit psql and restart WCP:
    • exit
    • vmon-cli --restart wcp

You should now be able to access the Supervisor Management page.
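
The database steps above can be sketched as two small helpers that only print SQL text and never touch VCDB themselves. This is a minimal sketch under stated assumptions: the function names list_error_rows_sql and delete_row_sql are hypothetical, the column names come from the example output above, and the id column is assumed to be numeric and the status column to compare as text, which the example suggests but does not confirm.

```shell
#!/bin/bash
# Hypothetical helpers: they only print SQL statements. Nothing here
# connects to VCDB or changes any data.

# Print a query listing rows in ERROR status, most recently created first.
# Assumption: status compares as text and create_timestamp is sortable.
list_error_rows_sql() {
    printf '%s\n' "SELECT id, library, status, create_timestamp FROM supervisor_content_source_configs WHERE status = 'ERROR' ORDER BY create_timestamp DESC;"
}

# Print a DELETE statement for exactly one row.
# Assumption: id is a numeric column, so anything but digits is rejected.
delete_row_sql() {
    local id="$1"
    case "$id" in
        ''|*[!0-9]*)
            echo "id must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'DELETE FROM supervisor_content_source_configs WHERE id=%s;\n' "$id"
}

# Example:
#   delete_row_sql 12
#   prints: DELETE FROM supervisor_content_source_configs WHERE id=12;
```

The printed statements can be pasted into the psql session opened in the steps above. Rejecting non-numeric input is a guard against a mistyped id turning into a broader delete; reviewing the rows returned by the first query before deleting anything remains the safer course.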