When upgrading vCenter Server to 8.0 U3, WCP service fails to start

Article ID: 411065

Products

VMware vCenter Server

Issue/Introduction

  • The WCP service fails to start after the data import during stage 2 of the vCenter Server upgrade.
  • Starting the WCP service manually also fails because the service pre-start command does not complete successfully.
  • /var/log/vmware/vmon/vmon.log contains errors similar to the following:

    YYYY-MM-DDTHH:MM:SS In(05) host-##### Received start request for wcp
    YYYY-MM-DDTHH:MM:SS In(05) host-##### <wcp-prestart> Constructed command: /usr/bin/python /usr/lib/vmware-wcp/wcpsvc-prestart.py
    YYYY-MM-DDTHH:MM:SS Wa(03) host-##### <wcp> Service pre-start command's stderr: Patching error: Traceback (most recent call last):
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/./day0_patching/post_rpm/wcp/patchingOrchestrator.py", line 98, in doIncrementalPatching
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     mod.doPatchingWithDependencies()
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/day0_patching/post_rpm/wcp/patches/roles_groups_users.py", line 438, in doPatchingWithDependencies
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     patch_authz(featureState)
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/day0_patching/post_rpm/wcp/patches/roles_groups_users.py", line 400, in patch_authz
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     authz_patch.setup_roles("/usr/lib/vmware-wcp/netoperator-roles.xml")
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/day0_patching/post_rpm/wcp/patches/roles_groups_users.py", line 240, in setup_roles
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     raise Exception("Role %s (id: %d) not found in VC." % (expected_role["name"], expected_role_id))
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-##### Exception: Role NetOperatorController (id: 1025) not found in VC.
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-##### During handling of the above exception, another exception occurred:
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-##### Traceback (most recent call last):
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/./day0_patching/post_rpm/wcp/patchingOrchestrator.py", line 332, in <module>
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     doPatching(patching_in_post_rpm)
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/./day0_patching/post_rpm/wcp/patchingOrchestrator.py", line 64, in doPatching
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     doIncrementalPatching(patching_in_post_rpm)
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####   File "/usr/lib/vmware-wcp/./day0_patching/post_rpm/wcp/patchingOrchestrator.py", line 102, in doIncrementalPatching
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####     raise Exception(err_msg % (module_path, str(e)))
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-##### Exception: Failed to patch roles_groups_users! Error: Role NetOperatorController (id: 1025) not found in VC.
    YYYY-MM-DDTHH:MM:SS Wa(03)+ host-#####
    YYYY-MM-DDTHH:MM:SS Er(02) host-##### <wcp> Service pre-start command failed with exit code 1.
    YYYY-MM-DDTHH:MM:SS Wa(03) host-##### [ReadSvcSubStartupData] No startup information from wcp.
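
To check whether a given failure matches this article, the missing role name and ID can be pulled out of vmon.log directly. The following is a minimal sketch, assuming the log lines follow the format shown in the excerpt above; the function name find_missing_roles is illustrative and is not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): list the roles that the WCP
# pre-start patching reported as missing, based on the message format
# shown in the log excerpt above.
find_missing_roles() {
    # $1: path to a vmon log file (defaults to the path used in this article)
    log_file="${1:-/var/log/vmware/vmon/vmon.log}"
    # Match only the "Exception: Role ... not found in VC." lines, keep the
    # "Role <name> (id: <n>)" part, and collapse repeated start attempts.
    grep -E 'Exception: Role .+ \(id: -?[0-9]+\) not found in VC\.' "$log_file" \
        | sed -E 's/.*Exception: (Role .+ \(id: -?[0-9]+\)) not found in VC\..*/\1/' \
        | sort -u
}
```

Calling find_missing_roles with no argument reads /var/log/vmware/vmon/vmon.log. For the excerpt above it would print: Role NetOperatorController (id: 1025)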

Environment

vCenter Server 8.x

Cause

This error can occur when one or more of the default roles in vCenter have been modified so that their role ID no longer matches the expected default value (in the log above, NetOperatorController is expected to have ID 1025).
The upgrade installer uses the role ID to locate the roles it needs when re-registering services during the upgrade, and fails when it cannot find a role under the expected ID.

Resolution

The following steps remove the invalid role and then publish the default role to VMDIR:

  1. Validate the role names and their corresponding role IDs in VMDIR using the command below:
    ldapsearch -o ldif-wrap=no -LLL -h localhost -b "cn=RoleModel,cn=VmwAuthz,cn=Services,dc=vsphere,dc=local" -s sub -D "cn=Administrator,cn=Users,dc=vsphere,dc=local" -W >/tmp/roles.ldif

    In /tmp/roles.ldif, the NetOperatorController role has an incorrect role ID (the expected ID is 1025).
    Sample output:
    dn: cn=-1565401839,cn=RoleModel,cn=VmwAuthz,cn=services,dc=vsphere,dc=local
    cn: -1565401839

  2. Remove the invalid role using its DN:
    ldapdelete -H ldap://localhost -D "cn=administrator,cn=users,dc=vsphere,dc=local" -W "cn=-1565401839,cn=RoleModel,cn=VmwAuthz,cn=services,dc=vsphere,dc=local"

  3. Create an LDIF file for the missing role on the vCenter Server and copy the contents below into it:

    vi /tmp/1025.ldif

    dn: cn=1025,cn=RoleModel,cn=VmwAuthz,cn=services,dc=vsphere,dc=local
    changetype: add
    vmwAuthzRolePrivilegeId: System.Anonymous
    vmwAuthzRolePrivilegeId: System.Read
    vmwAuthzRolePrivilegeId: System.View
    vmwAuthzRolePrivilegeId: Network.Assign
    vmwAuthzRoleVersion: 0
    vmwAuthzRoleName: NetOperatorController
    vmwAuthzRoleDescription: This role entitles the Netoperator controller to get and mutate network settings.
    objectClass: top
    objectClass: vmwAuthzRole
    cn: 1025

  4. Import the role into VMDIR using:
    /opt/likewise/bin/ldapmodify -f /tmp/1025.ldif -h localhost -D "cn=Administrator,cn=Users,dc=vsphere,dc=local" -W

  5. Start the WCP service:
    service-control --start wcp
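
When reviewing /tmp/roles.ldif in step 1, the entry for a role can be located by name instead of scanning the whole export. The following is a minimal sketch, assuming the export holds one blank-line-separated entry per role and that each entry carries plain-text vmwAuthzRoleName and cn attributes, as in the LDIF from step 3; role_id_for is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Hypothetical helper: print the cn (role ID) of every entry in an LDIF export
# whose vmwAuthzRoleName equals the given role name.
role_id_for() {
    # $1: role name, $2: LDIF file (defaults to the export from step 1)
    role_name="$1"
    ldif_file="${2:-/tmp/roles.ldif}"
    # RS= puts awk in paragraph mode, so each blank-line-separated LDIF entry
    # is one record; FS='\n' makes each attribute line one field.
    awk -v RS= -v FS='\n' -v name="$role_name" '
        {
            id = ""; match_found = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "vmwAuthzRoleName: " name) match_found = 1
                if ($i ~ /^cn: /) id = substr($i, 5)
            }
            if (match_found) print id
        }
    ' "$ldif_file"
}
```

In the situation described in this article, role_id_for NetOperatorController would be expected to print the unexpected ID (-1565401839 in the sample output) rather than 1025, which identifies the entry to remove in step 2. Running the ldapsearch from step 1 again after step 4 and repeating the check should then show 1025.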