When running the get cluster status command from the NSX CLI, the cluster status is reported as DEGRADED. Multiple management services, including SEARCH, APPLIANCE_PROXY, and SHA, show an UNKNOWN status on one or more nodes.
Investigation of /var/log/syslog reveals the following error signatures:
ShaMetricStatsServiceImpl ... Got exception when querying metric data, detail 404 Not Found .LmMetricRpcStub Onboard fails for APH [UUID].
NSX 9.0
The root cause is a race condition in the Service Health Agent (SHA) onboarding process.
This issue is resolved in VCF 9.1, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.