Error: "Failed to Start" in UI After Placing Primary Cell in Maintenance Mode

Article ID: 402885

Products

VMware Cloud Director

Issue/Introduction

Placing the primary VMware Cloud Director cell into Maintenance Mode makes the UI and API unresponsive at the public FQDN, and the UI reports a “Failed to Start” error.
The environment remains unavailable until the primary cell is restarted, even though the other cells are healthy.

Environment

VMware Cloud Director 10.6.1

Cause

This issue can occur when the Load Balancer health check does not detect that a VMware Cloud Director cell is in Maintenance Mode. If the health check probes an endpoint such as /api/versions, that endpoint may still return HTTP 200 even though the cell's Cloud Director service is no longer serving UI/API traffic. The Load Balancer therefore keeps routing requests to a cell that cannot serve them, and the UI reports “Failed to Start” errors.

Resolution

Review Load Balancer Backend Configuration
Identify the backend pool configuration for VMware Cloud Director in your Load Balancer (e.g., HAProxy, NSX-ALB).

Update Health Check to Detect Maintenance Mode
Modify the health check for the VCD backend nodes so that it probes the /cloud endpoint and expects HTTP 200. For example, in the HAProxy backend section:

option httpchk GET /cloud
http-check expect status 200
The /cloud endpoint is a lightweight VCD-native check.
It returns 200 OK when the Cloud Director service is running and accepting UI/API requests.
It returns 503 Service Unavailable when the cell is in Maintenance Mode or the service is stopped.
This allows the Load Balancer to detect that the node is unavailable and remove it from the active pool automatically.
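Before relying on the Load Balancer, it can help to see what /cloud returns on each cell directly. The Bash sketch below is a minimal example, not a VMware-provided tool: it queries https://<cell>/cloud on every host you pass in and labels the result using the 200/503 behavior described above. The function names and the example hostnames are hypothetical, and any other status code (for example, a redirect) is reported as unexpected rather than interpreted.

```bash
#!/usr/bin/env bash
# Minimal sketch: probe /cloud on each VCD cell directly, bypassing the Load Balancer.
# Assumption: each cell answers HTTPS on port 443 at the hostname you pass in.
# Function names are hypothetical helpers, not part of any VMware tooling.

# Translate a /cloud status code into a readable state, following this article:
# 200 = serving UI/API requests, 503 = Maintenance Mode or service stopped.
classify_cloud_status() {
  case "$1" in
    200) echo "serving" ;;
    503) echo "maintenance-or-stopped" ;;
    000) echo "unreachable" ;;   # curl reports 000 when no HTTP response was received
    *)   echo "unexpected-$1" ;;
  esac
}

# Print "<host> <code> <state>" for https://<host>/cloud.
# -k skips certificate verification (as in this article's curl example),
# -s hides progress output, -o /dev/null discards the body,
# --max-time caps the wait, -w prints only the HTTP status code.
probe_cell() {
  local host="$1" code
  code=$(curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' "https://${host}/cloud")
  printf '%s %s %s\n' "$host" "$code" "$(classify_cloud_status "$code")"
}

# Probe every host given as an argument, one line per cell.
probe_cells() {
  local host
  for host in "$@"; do
    probe_cell "$host"
  done
}
```

Example usage, with hypothetical cell hostnames:

probe_cells vcd-cell-01.example.com vcd-cell-02.example.com vcd-cell-03.example.com

With the primary cell in Maintenance Mode, its line should show 503 if the cell behaves as described in this article.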

Apply the Configuration
Reload or restart your Load Balancer to apply the changes.

For example (on HAProxy):

systemctl reload haproxy
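To catch a mistake in the edited health check before the Load Balancer is reloaded, you may prefer to validate the file first. The following Bash sketch assumes the configuration lives at /etc/haproxy/haproxy.cfg (a common default, but an assumption here) and runs haproxy -c, which only checks the configuration, before issuing the same reload command shown above. The function name reload_haproxy_safely is hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: validate the HAProxy configuration, then reload only if it is valid.
# Assumption: default config path /etc/haproxy/haproxy.cfg; pass another path as $1.
# reload_haproxy_safely is a hypothetical helper name.

reload_haproxy_safely() {
  local cfg="${1:-/etc/haproxy/haproxy.cfg}"
  # "haproxy -c" parses and checks the configuration without starting a proxy.
  if ! haproxy -c -f "$cfg"; then
    echo "configuration check failed for $cfg; not reloading" >&2
    return 1
  fi
  # Reload (as in this article) rather than restart the service.
  systemctl reload haproxy
}
```

Usage: run reload_haproxy_safely, or reload_haproxy_safely /path/to/haproxy.cfg if your configuration file is elsewhere.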
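
Validate Health Check Behavior
While the primary cell is in Maintenance Mode, run the following command from any system that can reach the public FQDN (-i prints the response status line and headers; --http1.1 keeps the status line in the HTTP/1.1 form shown below):

curl -k -i --http1.1 https://<public-fqdn>/cloud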
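Expected behavior (repeat the command several times, because a single request may be balanced to a healthy cell even while the cell in Maintenance Mode is still in the pool):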

If traffic is being routed to a healthy node:
HTTP/1.1 200 OK

If traffic is still being routed to the node in Maintenance Mode:
HTTP/1.1 503 Service Unavailable
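Because the Load Balancer distributes requests across cells, one curl request can land on a healthy cell by chance. The Bash sketch below sends several requests to /cloud through the public FQDN and counts the status codes returned. The function names are hypothetical and 20 requests is an arbitrary default; like the command above, it uses -k to skip certificate verification.

```bash
#!/usr/bin/env bash
# Minimal sketch: sample /cloud through the public FQDN several times and tally the
# status codes. Function names are hypothetical; 20 requests is an arbitrary default.

# Read one status code per line on stdin and print "<code> <count>", sorted by code.
tally_status_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Request https://<fqdn>/cloud N times (default 20) and tally the returned codes.
# curl prints 000 for a request that received no HTTP response.
sample_public_fqdn() {
  local fqdn="$1" count="${2:-20}" i
  for ((i = 0; i < count; i++)); do
    curl -k -s -o /dev/null --max-time 5 -w '%{http_code}\n' "https://${fqdn}/cloud"
  done | tally_status_codes
}
```

Example usage: sample_public_fqdn <public-fqdn> 20. If the health check is working, every request should return 200 while the primary cell is in Maintenance Mode; any 503 entries indicate that some traffic still reaches that cell.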

Confirm UI Availability
Open the VCD portal via the public FQDN.

Ensure the login UI loads without the “Failed to Start” error.

Additional Information

This issue occurs only when the Load Balancer continues to route requests to a cell whose VCD service has been intentionally paused, for example by placing the cell in Maintenance Mode.