"Vmware Directory Service failed to delete legacy schema" error while patching vCenter Server 7.x

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
While attempting to patch vCenter Server 7.0, the patch fails with the following messages in the /var/log/vmware/applmgmt/PatchRunner.log

YYYY-MM-DD HH:MM:SS vmdir:Patch ERROR LdapConnection Error deleting entry DN: cn=aggregate,cn=schemacontext
YYYY-MM-DD HH:MM:SS vmdir:Patch ERROR vmware_b2b.patching.executor.hook_executor Patch hook 'vmdir:Patch' failed.
Traceback (most recent call last):
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/executor/hook_executor.py", line 74, in executeHook
executionResult = systemExtension(args)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/libs/sdk/extensions.py", line 106, in __call__
result = self.extension(*args)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/libs/sdk/extensions.py", line 123, in _func
return func(*args)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/payload/components-script/vmdir/__init__.py", line 384, in patch
DoUpgrade(ctx)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/payload/components-script/vmdir/__init__.py", line 414, in DoUpgrade
delete_legacy_schema()
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/payload/components-script/vmdir/__init__.py", line 362, in delete_legacy_schema
raise InternalError("Vmware Directory Service failed to delete legacy schema")
patch_errors.InternalError: Vmware Directory Service failed to delete legacy schema
2022-11-01T05:37:09.100Z ERROR vmware_b2b.patching.phases.patcher Patch hook Patch got ComponentWrapperError.
Traceback (most recent call last):
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 203, in patch
_patchComponents(ctx, userData, statusAggregator.reportingQueue)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/phases/patcher.py", line 85, in _patchComponents
executeComponentHook(Hook.Patch, ctx, c, userData, reportingQueue)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/executor/execution_facade.py", line 98, in executeComponentHook
reportQueue, identifier, expectedResultType)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/executor/execution_facade.py", line 53, in executeHook
result = executor.executeHook(scriptFile, hook, args, reportQueue, reportIdentifier)
File "/storage/seat/software-updateofgeunkl/stage/scripts/patches/py/vmware_b2b/patching/executor/hook_executor_process.py", line 119, in executeHook
raise ex
patch_errors.ComponentError
YYYY-MM-DD HH:MM:SS WARNING root stopping status aggregation...
YYYY-MM-DD HH:MM:SS ERROR __main__ Patch vCSA failed

When reviewing the /var/log/vmware/vmdird/vmdird-syslog.log file, you see the following log pattern:

YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: VmDirInitDbCopyThread: database snapshot reg keys: CopyDbWritesMin 1 CopyDbIntervalInSec 60 CopyDbBlockWriteInSec 30
YYYY-MM-DD HH:MM:SS info vmdird t@140176877655808: vdirRaiseThreadSchedPriority: old_sch_policy=0 old_sch_priority=0 new_sch_policy=2 new_sch_priority=1
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Tombstone reaping thread started
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Check DB Size thread started
YYYY-MM-DD HH:MM:SS info vmdird t@140176768616192: SetupLdapPort: start listening on port 389 for ipv4
YYYY-MM-DD HH:MM:SS info vmdird t@140176760223488: SetupLdapPort: start listening on port 636 for ipv4
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: VmDirInit: all LDAP ports are ready for accepting services.
YYYY-MM-DD HH:MM:SS info vmdird t@140176793794304: Taking snapshot, setting backend state to MDB_RDONLY <---- Set to read only (Expected during DB snapshot)
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Modifying gid from 0 to 3914
YYYY-MM-DD HH:MM:SS info vmdird t@140176793794304: _VmDirCpMdbFile: making database snapshot with file size 2315Mb; will take approximate 46 seconds; 0 updates occurred since last snapshot.
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Modifying uid from 0 to 9899
YYYY-MM-DD HH:MM:SS info vmdird t@140176760223488: VmDirUpdateDCNameToLocalNode - PNID: (vcsa.domain.com)
YYYY-MM-DD HH:MM:SS info vmdird t@140176768616192: VmDirUpdateDCNameToLocalNode - PNID: (vcsa.domain.com)
YYYY-MM-DD HH:MM:SS info vmdird t@140176768616192: VmDirUpdateDCNameToLocalNode - pszDCName: (vcsa.domain.com)
YYYY-MM-DD HH:MM:SS info vmdird t@140176768616192: Successfully notified VMAFD to update DC Name to local node
YYYY-MM-DD HH:MM:SS info vmdird t@140176760223488: VmDirUpdateDCNameToLocalNode - pszDCName: (vcsa.domain.com)
YYYY-MM-DD HH:MM:SS info vmdird t@140176760223488: Successfully notified VMAFD to update DC Name to local node
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Config MaxLdapOpThrs (1023)
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: The federation health mgmt feature switch is disabled
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: The simplified replication feature switch is disabled
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: VmDir State (8)
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Lotus Vmdird: running... state (8)
YYYY-MM-DD HH:MM:SS info vmdird t@140198797887296: Lotus Vmdird: running in FIPS mode.
YYYY-MM-DD HH:MM:SS info Starting VMware Directory Servicedone
YYYY-MM-DD HH:MM:SS info vmdird t@140176751830784: Init Sid cache (dc=vsphere,dc=local) RID (1038)
YYYY-MM-DD HH:MM:SS err vmdird t@140176751830784: VmDirMDBGetNextUSN: failed with error (13),(Permission denied) <---- Incoming requests from upgrade
YYYY-MM-DD HH:MM:SS err vmdird t@140176751830784: InternalDeleteEntry: VdirExecutePostDeleteCommitPlugins - code(9114) <---- Incoming requests from upgrade
YYYY-MM-DD HH:MM:SS err vmdird t@140176751830784: VmDirSendLdapResult: Request (Delete), Error (LDAP_UNWILLING_TO_PERFORM(53)), Message (Get next USN), (0) socket (127.0.0.1) <---- Incoming requests from upgrade fail (expected as we are in read-only mode)
YYYY-MM-DD HH:MM:SS info vmdird t@140176793794304: Backend state set to NORMAL <------- set to normal again (Expected, after db snapshot is done)
YYYY-MM-DD HH:MM:SS info vmdird t@140176793794304: _VmDirCpMdbFile: completed making snapshot with file size 2315Mb in 14 seconds; data transfer rate: 165.3MB/sec, db last tid: 5378100

NOTE: If you do not see current logs in the vmdird-syslog.log file, you may be experiencing another issue with rsyslog, outlined in the following KB: rsyslog stops logging to vmdird and messages in vCenter Server 7.0

To work around missing logs, run the following command and search for "vmdird" to find relevant logs:

journalctl -b

Environment

VMware vCenter Server 7.0.x

Cause

The patch fails because vmdir is in read-only mode when it attempts to remove old data as part of the normal patching process. The behavior is expected to happen on vmdird startup, but if the database is excessively large it may take longer to complete than the patch expects.

Resolution

At this time there is only a workaround available.

Workaround:
In order to work around this problem, you can take the following steps:

1. First, ensure that you have powered-off snapshots of ALL nodes in the affected environment, or recent backups of all nodes. This should already be in place before patching.

2. Run the following command, which should stop the database snapshot from occurring during startup of the service:

/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "0"

3. Restart the vmdird service:

service-control --restart vmdird

4. Re-apply the patch.

5. Once patching is complete, revert the changes made:

/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"

6. Restart the vmdird service:

service-control --restart vmdird

If other nodes require patching, you may want to repeat this procedure to prevent it causing another failure.