Fix PSC/vmdir inconsistencies using fixpsc python script

Article ID: 316566

Products

VMware vCenter Server

Issue/Introduction

Impact/Risks:
This script will generate changes in the VMDIR database.
Offline snapshots of all vCenters in the environment are mandatory.


To determine whether this process is required, use vmdir_tool.py as outlined in the article Using vmdir_tool.py to identify vmdir/ELM replication inconsistencies.
 
Fixpsc.py is a tool that addresses issues with data stored in the PSC database (embedded or external) and data local to a vCenter Server. This tool can detect and correct problems that could cause failures in topology changes (converge, repoint, cross-domain repoint, etc.), upgrades, or failures incurred as a result of maintenance (e.g., incorrectly applying new SSL certificates). This article will outline its functions and use.

Symptoms:
  • When querying the vmdir state using vdcadmintool, the state is reported as 'Null' or 'Read Only'
  • When logging into a vCenter participating in an Enhanced Linked Mode (ELM) group, you may see a banner warning that "your environment is now in mixed mode"
  • Querying replication status shows 'Status available: No'
  • Other symptoms, such as: 
    • User/Group changes are not propagating to other vCenters in the SSO domain
    • Cannot repoint, upgrade, converge
  • Cannot register other vSphere products
  • You may see the following errors in /var/log/vmware/vmdird/vmdird-syslog.log
err vmdird  t@140609914369792: UpdateServerObject: InternalModifyEntry failed. Error code: 53, Error string: Server in read-only mode
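
The log symptom above can be checked for with a short script. The following is a minimal sketch, assuming the log lines have the shape of the example shown; the function name find_read_only_errors is hypothetical and is not part of fixpsc or any VMware tool.

```python
import re

# Matches the vmdird message quoted above: LDAP error 53 with the
# "Server in read-only mode" error string on the same line.
READ_ONLY_PATTERN = re.compile(r"Error code:\s*53\b.*Server in read-only mode")


def find_read_only_errors(log_text):
    """Return every line of log_text that reports vmdir in read-only mode."""
    return [line for line in log_text.splitlines()
            if READ_ONLY_PATTERN.search(line)]
```

To use it, read /var/log/vmware/vmdird/vmdird-syslog.log into a string and pass it to the function; a non-empty result indicates the symptom is present.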

NOTE: If the issue was caused by a failed major upgrade (e.g., 7.0 to 8.0), fixpsc may be destructive. To determine whether the ELM group is in a bad state, run the following command to check for consistent version information across all nodes in the ELM group:
 
/usr/lib/vmware-lookupsvc/tools/lstool.py list --url https://$HOSTNAME/lookupservice/sdk --type vcenterserver | grep Version -A4

Example of a version mismatch: vc2.example.com has already been upgraded to version 8, but running the above command from vc1 shows the following:

root@vc1 [ ~ ]# /usr/lib/vmware-lookupsvc/tools/lstool.py list --url https://$HOSTNAME/lookupservice/sdk --type vcenterserver | grep Version -A4
        Version: 8.0
        Endpoints:
                Type: com.vmware.vim
                Protocol: vmomi
                URL: https://vc1.example.com:443/sdk
--
        Version: 7.0 <-----------
        Endpoints:
                Type: com.vmware.cdc.provider
                Protocol: vmomi
                URL: https://vc2.example.com:443/sdk

If the same command on another node in the ELM group shows the correct information, use that vCenter as the 'healthy' vCenter. If no node in the ELM group has the correct information for all other nodes, fixpsc cannot fix the problem. Instead, use the cross-domain repoint process to fix the issue.
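
The version comparison can also be scripted rather than read by eye. The following is a minimal sketch, assuming the lstool.py output has the shape shown in the example (a Version: line followed a few lines later by a URL: line). The function names parse_versions and find_mismatches are hypothetical, and the expected versions are supplied by the operator; none of this is part of lstool.py or fixpsc.

```python
import re
from urllib.parse import urlparse


def parse_versions(lstool_output):
    """Return (host, version) pairs from 'grep Version -A4' style output."""
    pairs = []
    current_version = None
    for raw in lstool_output.splitlines():
        line = raw.strip()
        match = re.match(r"Version:\s*(\S+)", line)
        if match:
            # Remember the version until the URL of the same entry appears.
            current_version = match.group(1)
            continue
        match = re.match(r"URL:\s*(\S+)", line)
        if match and current_version is not None:
            pairs.append((urlparse(match.group(1)).hostname, current_version))
            current_version = None
    return pairs


def find_mismatches(pairs, expected):
    """Return (host, registered, expected) for hosts whose registered
    version differs from the version the operator knows to be installed."""
    return [(host, version, expected[host])
            for host, version in pairs
            if host in expected and version != expected[host]]
```

Running both functions against the output collected on each node shows which nodes hold correct registration data for every other node, and therefore which one is a candidate for the 'healthy' vCenter.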

Environment

VMware vCenter Server 8.0.x
VMware vCenter Server 7.0.x
VMware vCenter Server 6.7.x

Cause

In some cases, restoring a vCenter Server with embedded PSC, or external PSCs, from snapshots without restoring the rest of the SSO domain nodes can introduce inconsistencies in the VMDIR database. These range from USN (Update Sequence Number) mismatches to missing data.

If the inconsistent replication goes unnoticed and major changes are made to the ELM group (such as converging, updating, upgrading, or adding new vCenters), this issue can result in an outage.
For this script to work, at least one node must hold the correct information, which the tool uses to fix the remaining nodes.

Resolution

Note: Before proceeding, ensure valid snapshots have been taken on all PSC/vCenter nodes in the SSO domain.


Fixpsc.py has the following functions:
  • help
  • rebuild
  • data
  • MDB Cleanup
 
Help
For help, run: ./fixpsc

Rebuild
For rebuild options help:
 

usage: fixpsc.py rebuild [-h] --healthy-psc-fqdn REPL_PARTNER_FQDN
                         [--machine-id MACHINE_ID] [--ldu-id LDU_ID]

optional arguments:
  -h, --help            show this help message and exit
  --healthy-psc-fqdn REPL_PARTNER_FQDN
                        Healthy PSC FQDN
  --machine-id MACHINE_ID
                        Machine GUID of Localhost
  --ldu-id LDU_ID       LDU GUID of Localhost

Use the rebuild option on an affected, unhealthy node. The rebuild option removes the local node from the SSO domain and then promotes vmdir on the local node again, with a healthy PSC as its replication partner. When the operation completes, the local node contains all the data of the healthy PSC.

  • To rebuild vmdir from a healthy PSC, run the following command on the unhealthy node, specifying the healthy node with --healthy-psc-fqdn:
./fixpsc rebuild --healthy-psc-fqdn <FQDN>
  • To rebuild with custom machine and LDU GUIDs/IDs (use when there are different values in the registry), run:
./fixpsc rebuild --healthy-psc-fqdn <FQDN> --machine-id <MACHINE_ID> --ldu-id <LDU_ID>

Data
For data option help:
usage: fixpsc.py data [-h] --mode DATA_ACTION --healthy-psc-fqdn REPL_PARTNER_FQDN
                      [--healthy-psc-rhttps-port DEST_PSC_RHTTPS] [--no-check-certs]

optional arguments:
  -h, --help            show this help message and exit
  --mode DATA_ACTION    [pre-check|export|import]
  --healthy-psc-fqdn REPL_PARTNER_FQDN
                        Healthy PSC FQDN
  --healthy-psc-rhttps-port DEST_PSC_RHTTPS
                        Healthy PSC rhttp port
  --no-check-certs      Ignore certificate validations.

Data Import/Export/Pre-check

During the rebuild process, if some data (Authz/Tagging/License) is not present on the Healthy PSC but is required for the proper functioning of VC, use the export operation before the rebuild option and then use the import operation after rebuild. Exported data will be present in /storage/domain-data/ and will be cleaned up if the import is successful.

  • For Authz/Tagging/License/CEIP data pre-check, run: 
./fixpsc data --mode pre-check --healthy-psc-fqdn <FQDN>
  • For Authz/Tagging/License/CEIP data export, run:
./fixpsc data --mode export --healthy-psc-fqdn <FQDN>
  • For Authz/Tagging/License/CEIP data import, run:
./fixpsc data --mode import --healthy-psc-fqdn <FQDN>
  • To import using a custom rhttps port on the healthy PSC and no certificate check, run:
./fixpsc data --mode import --healthy-psc-fqdn <FQDN> --healthy-psc-rhttps-port <PORT> --no-check-certs
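
The ordering described above (export before the rebuild, import after it) can be expressed as a small helper that assembles the command lines. This is a minimal sketch using only the options documented in this article; the function name build_rebuild_sequence is hypothetical, and the helper only builds the argument lists without running anything.

```python
def build_rebuild_sequence(healthy_fqdn, machine_id=None, ldu_id=None):
    """Return the fixpsc commands, in order, for a rebuild that preserves
    local Authz/Tagging/License data: export, rebuild, then import."""

    def data_command(mode):
        return ["./fixpsc", "data", "--mode", mode,
                "--healthy-psc-fqdn", healthy_fqdn]

    rebuild = ["./fixpsc", "rebuild", "--healthy-psc-fqdn", healthy_fqdn]
    # The custom GUIDs are optional and are only added when supplied.
    if machine_id is not None:
        rebuild += ["--machine-id", machine_id]
    if ldu_id is not None:
        rebuild += ["--ldu-id", ldu_id]

    return [data_command("export"), rebuild, data_command("import")]
```

Each returned list can be printed for review or, once the snapshots are in place, passed to subprocess.run one at a time, stopping at the first failure so that the import is never attempted after a failed rebuild.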

MDB Cleanup

vmdir data is stored in a file in the folder /storage/db/vmware-vmdir/. In some environments, this file grows over time. Running this utility reduces the file size without losing any data. The tool also automates the removal of tombstone entries, replacing the manual procedure described in the following article:

Troubleshooting and addressing accumulation of tombstones in a Platform Services Controller

The operation requires the mdb_copy binary. A precompiled copy of the binary is included with the tool.

  • Basic usage

    The cleanup utility takes two parameters:

    temp-location: The location where the compressed mdb file is placed. This parameter is optional; if it is omitted, “/tmp” is used as the default. If the folder does not have enough free space to store the data.mdb file, the utility reports an error.

    tombstone-threshold: Any tombstone entry older than the specified number of days is removed. The default value is 30. In an ELM setup, a value of zero is not allowed because it might create stale entries. On a standalone system, a value of zero is allowed and removes all tombstone entries.

    The tombstone cleanup operation can only be performed on VC 8.0+ environments because it requires additional support from the VMDir service. Setting the tombstone-threshold value to -1 skips the tombstone removal step, which allows the utility to run in a VC 7.0 environment.

    ./fixpsc mdbcleanup --temp-location /tmp/mdbloc --tombstone-threshold 15
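
The parameter rules above can be checked before running the cleanup. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the operator supplies whether the node is part of an ELM group and its major version, and the free-space check conservatively assumes the compressed copy is no larger than the original data.mdb. It does not reproduce the checks fixpsc itself performs.

```python
import os
import shutil


def check_tombstone_threshold(threshold, is_elm, vc_major):
    """Return a list of problems with a tombstone-threshold value;
    an empty list means the value is acceptable under the rules above."""
    problems = []
    if threshold == -1:
        # -1 skips tombstone removal entirely, so no further rules apply.
        return problems
    if vc_major < 8:
        problems.append("tombstone cleanup requires VC 8.0+; use -1 to skip it")
    if threshold == 0 and is_elm:
        problems.append("a threshold of 0 is not allowed in an ELM setup")
    return problems


def has_room_for_copy(db_file, temp_location):
    """Return True when temp_location has at least as much free space
    as the current size of db_file (a conservative upper bound)."""
    return os.path.getsize(db_file) <= shutil.disk_usage(temp_location).free
```

For example, check_tombstone_threshold(15, is_elm=True, vc_major=8) returns an empty list, while a threshold of 0 on an ELM node returns one problem. The path passed as db_file would be the data.mdb file under /storage/db/vmware-vmdir/, which is an assumption based on the folder and file name mentioned above.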

 



Attachments

fix-psc-master