VMware Identity Manager pgpool or OpenSearch service fails with unassigned shards

Article ID: 430772

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

You may encounter an Error 500 when accessing the VMware Identity Manager (vIDM) admin page or observe that the vIDM pool is down in the load balancer. This often prevents vRealize Lifecycle Manager (vRLCM) remediation or upgrades. Key symptoms include:

  • OpenSearch service failing to start or showing a high number of unassigned shards.
  • Log errors in analytics-service.log such as: "this action would add [x] total shards, but this cluster currently has [1000]/[1000] maximum shards open".
  • vIDM system diagnostic dashboard showing Yellow or Red status for OpenSearch.
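
One way to confirm the log symptom from a shell is to count the lines that carry the shard-limit message. This is a minimal sketch, not a vendor-supplied tool: the function name count_shard_limit_errors is hypothetical, and the log file path is passed as an argument because its location on the appliance is not stated in this article.

```shell
#!/bin/sh
# Count log lines that report the shard-limit validation error.
# Usage: count_shard_limit_errors /path/to/analytics-service.log
# (the path is an assumption; supply the real location on your node)
count_shard_limit_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function safe under "set -e".
    grep -c 'maximum shards open' "$1" || true
}
```

A result greater than 0 suggests the node has hit the shard limit described under Cause.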

Environment

  • VMware Identity Manager 3.3.7
  • vRealize Lifecycle Manager 8.x
  • NSX Load Balancer

Cause

In VMware Identity Manager (vIDM) 3.3.7, Elasticsearch was replaced by OpenSearch. By default, OpenSearch allows only 1,000 open shards per node (the cluster.max_shards_per_node setting). If audit records or search data push the cluster past this limit, validation fails and OpenSearch rejects any action that would add shards.
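
To see how close a node is to the limit, the shard list can be tallied. The following is a rough sketch under two assumptions: the function name summarize_shards is hypothetical, and the input is the default _cat/shards output, where the shard state is the fourth column.

```shell
#!/bin/sh
# Summarise "_cat/shards" output read from stdin.
# Prints one line: total=<n> unassigned=<n>
summarize_shards() {
    # NF skips blank lines; column 4 is the shard state in the default layout.
    awk 'NF { total++; if ($4 == "UNASSIGNED") unassigned++ }
         END { printf "total=%d unassigned=%d\n", total, unassigned }'
}
```

On a vIDM node this could be fed with: curl -s http://localhost:9200/_cat/shards | summarize_shards. A total at or near 1000 is consistent with the error message quoted in the symptoms.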

Resolution

  1. Check Load Balancer Health: Ensure the vIDM pool is up. If necessary, temporarily change the health check to ICMP to restore connectivity.
  2. Take Snapshots: Ensure a valid backup/snapshot of each vIDM node exists before proceeding.
  3. Increase Shard Limit: SSH into each vIDM node and run the following command to raise cluster.max_shards_per_node to 6500:
    curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "6500" } }'
  4. Monitor Health: Check the status using:
    watch "curl -s 'http://localhost:9200/_cluster/health?pretty=true'"

    Wait 5–10 minutes for re-allocation.

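  5. Clear Unassigned Shards: If the status remains Red or Yellow with unassigned shards > 0 (on clustered deployments), run the command below. It deletes every index that has at least one unassigned shard, so confirm that the snapshots from step 2 exist first:
    curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -I{} curl -XDELETE "http://localhost:9200/{}"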
  6. Restart Services:
    1. Force release database locks: /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
    2. Restart vIDM: service horizon-workspace restart
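
Steps 4 and 5 can be made less manual with a few helper functions. This is a sketch only: the names fetch_health, health_status, wait_for_green and unassigned_indices are hypothetical, and the default of 20 attempts at 30-second intervals is an assumption chosen to cover the 10-minute wait from step 4. Nothing here deletes data; unassigned_indices only lists the indices that the step 5 command would remove.

```shell
#!/bin/sh
# Fetch the cluster health document (same endpoint as step 4).
fetch_health() {
    curl -s 'http://localhost:9200/_cluster/health'
}

# Read health JSON on stdin and print the value of "status".
health_status() {
    grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' | head -n 1 | cut -d'"' -f4
}

# Poll until the status is green.
# $1 = maximum attempts (default 20), $2 = seconds between attempts (default 30)
wait_for_green() {
    attempts=${1:-20}
    delay=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(fetch_health | health_status)
        echo "attempt $((i + 1)): status=${status:-unknown}"
        [ "$status" = "green" ] && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Read "_cat/shards" output on stdin and print each index that has at least
# one unassigned shard, once. Column 4 is the state in the default layout.
unassigned_indices() {
    awk '$4 == "UNASSIGNED" { print $1 }' | sort -u
}
```

For example, wait_for_green reports progress and returns non-zero if the cluster never turns green, and curl -s http://localhost:9200/_cat/shards | unassigned_indices gives a list to review before running the delete command in step 5.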

Additional Information

If you continue to see shard limit errors after increasing to 6500, you may need to increase the limit further to 8200.
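
Before raising the limit again, it can help to confirm which value is currently stored. The sketch below is an illustration rather than a documented vIDM procedure: the function name max_shards_setting is hypothetical, and it assumes the settings are requested with flat_settings=true so that the key appears as a single dotted name.

```shell
#!/bin/sh
# Read "_cluster/settings?flat_settings=true" JSON on stdin and print the
# first cluster.max_shards_per_node value found (empty output if unset).
max_shards_setting() {
    grep -o '"cluster\.max_shards_per_node"[[:space:]]*:[[:space:]]*"\{0,1\}[0-9][0-9]*' \
        | head -n 1 \
        | grep -o '[0-9][0-9]*$'
}
```

A possible invocation is: curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true' | max_shards_setting. Empty output would indicate that no override is stored and the default still applies.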