Symptom:
VMware vCenter Server 7.0
VMware vCenter Server 8.0
The cmsso-util domain repoint failure is caused by a code issue during service registration and can be identified as follows.
This issue is usually seen after removing a vCenter from an Enhanced Linked Mode topology and then adding the same vCenter back.
1. In /var/log/vmware/cloudvm/cmsso_util.log, the domain repoint command fails with "lstool register services failed: 1".
Before that failure, a message shows that vCenter attempts to register a service endpoint but finds that the endpoint already exists:
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Registering endpoint with id 141cede2-30ec-49a0-890f-84f6c3d6db65
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Changing domain names in spec file: /storage/domain-data/service-phase-data/specs/141cede2-30ec-49a0-890f-84f6c3d6db65.spec
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util End point xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx already exists
......
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util lstool register services failed: 1
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Failed to register services during repointing
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Failed
[YYYY-MM-DDTHH:MM:SS] ERROR cmsso_util Failed to Re-install PSC services
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Embedded Domain Repoint Service Command Phase Failed. Please check logs
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Failed executing <cis.service_data.DcServicesCommand object at 0x7efdb80997d0>
[YYYY-MM-DDTHH:MM:SS] ERROR cmsso_util Re-pointing operation has failed during execution mode.
[YYYY-MM-DDTHH:MM:SS] INFO cmsso_util Repoint failed. Restore from backup
2. However, searching the destination vCenter for that endpoint finds no duplicate registration.
The only related entry is a registration whose ID contains the service ID but has additional data appended at the end:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx_com.vmware.cloud.provider.services.plugin
3. The service registration step of the cmsso-util domain-repoint workflow appears to perform a "contains" comparison instead of an "equal to" comparison when searching for the service ID on the destination vCenter.
As a result, it matches the registration above even though that is a different service registration.
This switches the logic from "register" to "reregister", but the reregister fails because no registration with that exact ID exists, which in turn causes the repoint to fail.
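To check a saved copy of cmsso_util.log for the signature described in step 1, a minimal Python sketch such as the following may help. It assumes the message texts match the excerpt above (wording can differ between builds); `find_repoint_signature` and `check_log_file` are hypothetical helper names, not part of any VMware tooling.

```python
import re

# Message texts copied from the cmsso_util.log excerpt above. Exact wording
# may differ between vCenter builds, so treat these patterns as assumptions.
ENDPOINT_EXISTS = re.compile(r"End point\s+(\S+)\s+already exists")
REGISTER_FAILED = "lstool register services failed: 1"


def find_repoint_signature(log_text):
    """Return endpoint IDs logged as 'already exists' when the log also shows
    the lstool registration failure; return an empty list otherwise."""
    if REGISTER_FAILED not in log_text:
        return []
    # \s+ also matches line breaks, in case the ID is wrapped onto its own line.
    return ENDPOINT_EXISTS.findall(log_text)


def check_log_file(path="/var/log/vmware/cloudvm/cmsso_util.log"):
    """Read a cmsso_util.log file and return any matching endpoint IDs."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_repoint_signature(handle.read())
```

Calling `check_log_file()` on the appliance returns the endpoint IDs to look up on the destination vCenter in step 2; an empty list means the log does not show this failure pattern.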
VMware Engineering is aware of the issue and is looking to improve the logic in this part of the code.
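The comparison problem described in step 3 can be modeled in a few lines of Python. This is an illustrative sketch of the reported behavior, not the cmsso-util source code; the service ID, the list of registrations, and the function names are hypothetical.

```python
# Illustrative model of the lookup described in step 3. This is NOT the
# cmsso-util source; the IDs and function names are hypothetical.
SERVICE_ID = "141cede2-30ec-49a0-890f-84f6c3d6db65"
REGISTERED_IDS = [
    # The only registration on the destination vCenter that mentions the ID.
    SERVICE_ID + "_com.vmware.cloud.provider.services.plugin",
]


def lookup_contains(service_id, registered_ids):
    """Substring comparison, as the workflow appears to do."""
    return [rid for rid in registered_ids if service_id in rid]


def lookup_equals(service_id, registered_ids):
    """Exact comparison: only an identical ID counts as a match."""
    return [rid for rid in registered_ids if rid == service_id]


def choose_action(matches):
    """Any match switches the workflow from register to reregister."""
    return "reregister" if matches else "register"


print(choose_action(lookup_contains(SERVICE_ID, REGISTERED_IDS)))  # reregister
print(choose_action(lookup_equals(SERVICE_ID, REGISTERED_IDS)))    # register
```

With the substring comparison, the plugin-suffixed entry is treated as the same service and the workflow chooses "reregister"; an exact comparison finds nothing and correctly falls back to "register".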
As a workaround, confirm that the service registration on the destination vCenter can be safely unregistered, then unregister it using the steps below.
Make sure a valid backup or snapshot of vCenter exists before carrying out the action plan. If the vCenter is part of Enhanced Linked Mode, take offline snapshots of all vCenters in the Enhanced Linked Mode topology.
/usr/lib/vmware-lookupsvc/tools/lstool.py unregister --url http://localhost:7090/lookupservice/sdk --id <Service-ID that contains additional data appended at the end> --user <administrator account> --password <password>
Example: /usr/lib/vmware-lookupsvc/tools/lstool.py unregister --url http://localhost:7090/lookupservice/sdk --id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx_com.vmware.cloud.provider.services.plugin --user <administrator account> --password <password>
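If several registrations need to be reviewed, the unregister command above can be assembled in Python so the exact ID is copied rather than retyped. This is a minimal sketch under stated assumptions: the list of registered IDs is obtained separately from the destination vCenter (this sketch does not query or parse anything), and `suffixed_registrations`, `build_unregister_argv`, and `display` are hypothetical helpers. The sketch only builds and prints the command; it does not run it.

```python
import shlex

# Script path and lookup service URL taken from the lstool.py command above.
LSTOOL = "/usr/lib/vmware-lookupsvc/tools/lstool.py"
LOOKUP_URL = "http://localhost:7090/lookupservice/sdk"


def suffixed_registrations(service_id, registered_ids):
    """Return IDs that start with service_id but have extra data appended."""
    return [rid for rid in registered_ids
            if rid.startswith(service_id) and rid != service_id]


def build_unregister_argv(registration_id, user, password):
    """Build the lstool.py unregister argument list for one registration ID."""
    return [LSTOOL, "unregister", "--url", LOOKUP_URL,
            "--id", registration_id, "--user", user, "--password", password]


def display(argv):
    """Render argv as a shell command with the password value masked."""
    shown = list(argv)
    if "--password" in shown:
        shown[shown.index("--password") + 1] = "<password>"
    return shlex.join(shown)
```

Review the printed command before running the real one, and only after confirming the registration can be safely removed and a backup or snapshot exists.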
Once the service registration is unregistered, running the domain repoint again should succeed.