Users experience SSO login failures and severe performance degradation after reverting to a snapshot following a failed disk expansion in the GUI. The Analytics service repeatedly crashes because of the very large number of objects and database entries.
Additionally, after a snapshot revert, all cloud proxies may show a status of "coming online" but never fully come online, even after the common known causes have been investigated and ruled out.
Symptoms:
The disk space increase performed in the GUI fails. After the snapshot is reverted, SSO is broken, users cannot log in, and multiple other issues appear.
All cloud proxies are stuck in a "coming online" state.
The UI is extremely slow or unresponsive.
Editing any vCenter adapter instance and clicking "Validate Connection" gives the error: Could not find any up collector in the collector group
Investigation reveals that a vCenter adapter instance has duplicated itself many thousands of times, creating over a million redundant resources in the database.
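One way to confirm this duplication is to export the list of adapter instances and count how many share the same name and adapter kind. The sketch below assumes the list has already been retrieved (for example, from the VCF Operations REST API or a UI export) and reduced to simple `(name, adapter_kind)` pairs. The function name `find_duplicated_adapters`, its `threshold` parameter, and the sample data are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Iterable


def find_duplicated_adapters(
    instances: Iterable[tuple[str, str]], threshold: int = 2
) -> dict[tuple[str, str], int]:
    """Return (name, adapter_kind) pairs that occur at least `threshold` times.

    `instances` is a hypothetical, pre-fetched list of adapter instance
    identities; how it is obtained is outside the scope of this sketch.
    """
    counts = Counter(instances)
    # Keep only identities at or above the threshold, most duplicated first.
    return {key: n for key, n in counts.most_common() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical sample: one vCenter adapter duplicated 5000 times.
    sample = [("vcenter-01", "VMWARE")] * 5000 + [("vcenter-02", "VMWARE")]
    for (name, kind), n in find_duplicated_adapters(sample).items():
        print(f"{kind}/{name}: {n} instances")
```

A healthy environment should return an empty result. Any identity with thousands of entries matches the condition described in this article.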
The following message is also observed in /storage/log/vcops/log/analytics-*.log within VCF Operations:
WARN analytics [Suspended Alert Recovery Thread] [com.vmware.statsplatform.persistence.impl.AlertSqlQueryBuilder.createQueryCondition] - resourceIdSet is too large (1798023) , return only first 100000 resource ids
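A quick way to check whether a node is affected is to search the Analytics logs for this warning and extract the reported resource count. The following Python sketch assumes it runs on a VCF Operations node with read access to the log path above. The regular expression is written only against the single message shown here and may need adjusting for other log formats. The function names are hypothetical.

```python
import glob
import re

# Matches the warning shown above, e.g. "resourceIdSet is too large (1798023)".
PATTERN = re.compile(r"resourceIdSet is too large \((\d+)\)")


def max_resource_id_set(lines):
    """Return the largest resourceIdSet size reported in `lines`, or None."""
    sizes = [int(m.group(1)) for line in lines if (m := PATTERN.search(line))]
    return max(sizes) if sizes else None


def scan_analytics_logs(path_glob="/storage/log/vcops/log/analytics-*.log"):
    """Scan every matching log file and return {file: largest reported size}."""
    results = {}
    for path in sorted(glob.glob(path_glob)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            size = max_resource_id_set(handle)  # iterates the file line by line
        if size is not None:
            results[path] = size
    return results


if __name__ == "__main__":
    for path, size in scan_analytics_logs().items():
        print(f"{path}: resourceIdSet size {size}")
```

Sizes far above the 100000 cap mentioned in the warning indicate the volume of redundant resources described in this article.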
VCF Operations 9.0.x
In one particular environment, instability following the snapshot revert caused the vCenter adapter instance to duplicate itself many thousands of times, generating over a million redundant resources in the database. The resulting database load exhausts application and database resources and leads directly to the Analytics service instability. The exact root cause could not be determined. However, the cloud proxies were neither backed up nor snapshotted, so the database and the collectors were left in an inconsistent state.
If you encounter a situation where an adapter instance duplicates itself thousands of times, contact Broadcom Support and reference this KB article.
Follow these steps to resolve the issue:
Identify a known stable backup of the VCF Operations environment taken prior to the failed disk expansion and snapshot revert attempt.
Perform a full restoration of the environment using the stable backup.
Verify that the vCenter adapter instances are no longer duplicated, the Analytics service is stable, the cloud proxies come online successfully, and SSO logins succeed.
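The first step above amounts to choosing the most recent backup taken before the failed disk expansion. The sketch below shows that selection logic on a hypothetical list of backup records. `Backup`, `name`, `taken_at`, and `latest_backup_before` are illustrative names, not part of any backup product's API. In practice the list would come from whatever catalog your backup solution provides. The sketch selects by time only and cannot tell whether a backup is actually stable, so confirm the chosen restore point manually.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backup:
    name: str           # hypothetical identifier for the backup job or restore point
    taken_at: datetime  # time the image-based backup was captured


def latest_backup_before(
    backups: Sequence[Backup], incident_start: datetime
) -> Optional[Backup]:
    """Return the newest backup taken strictly before `incident_start`, if any."""
    candidates = [b for b in backups if b.taken_at < incident_start]
    return max(candidates, key=lambda b: b.taken_at, default=None)


if __name__ == "__main__":
    backups = [
        Backup("nightly-01", datetime(2025, 6, 1, 2, 0)),
        Backup("nightly-02", datetime(2025, 6, 2, 2, 0)),
        Backup("nightly-03", datetime(2025, 6, 3, 2, 0)),
    ]
    # Hypothetical time at which the failed disk expansion was attempted.
    expansion_attempt = datetime(2025, 6, 2, 14, 30)
    chosen = latest_backup_before(backups, expansion_attempt)
    print(chosen.name if chosen else "no suitable backup found")
```

With the sample data, the sketch selects `nightly-02`, the last backup captured before the expansion attempt.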
To back up VCF Operations, create full, image-based virtual machine backup jobs by using a backup solution that is compatible with VMware vSphere Storage APIs - Data Protection. For more details, see the Broadcom TechDocs article: Configure VMware Cloud Foundation Operations VM Level Backup.