vCenter HA (VCHA) may perform repeated failover operations every 5 to 10 minutes

Article ID: 323226

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • vCenter HA may fail over repeatedly every 5 to 10 minutes.
  • The vCenter HA pair fails back and forth, and vmon logs "pschealth exited unexpectedly" messages similar to:

    17-05-24T12:17:46.009929-04:00 warning vmon Service pschealth exited. Exit code 1
    17-05-24T12:17:46.010162-04:00 warning vmon Service pschealth exited unexpectedly. Crash count 0. Taking configured recovery action.
     
  • In /var/log/messages, you see entries similar to:

    2017-05-24T13:26:23.758145-04:00 HST-EXP-VC01 su[33644]: pam_unix(su:session): session closed for user vpostgres
    2017-05-24T13:25:16.086607-04:00 HST-EXP-VC01 ntpd[1244]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    2017-05-24T13:25:16.087228-04:00 HST-EXP-VC01 ntpd[1244]: frequency error -119309 PPM exceeds tolerance 500 PPM
    2017-05-24T13:25:16.087401-04:00 HST-EXP-VC01 systemd[1]: Time has been changed

Note: The preceding log excerpts are only examples. Dates, times, and host names vary depending on your environment.
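As a rough aid for spotting these symptoms, the following Python sketch scans log lines for the two ntpd indicators shown above ("Clock Unsynchronized" and "frequency error ... exceeds tolerance"). The function names scan_time_errors and drift_seconds are hypothetical helpers introduced here, and the pattern is written against the sample lines in this article only; other ntpd versions may format these messages differently.

```python
import re

# Matches the ntpd line format shown in this article, for example:
# "ntpd[1244]: frequency error -119309 PPM exceeds tolerance 500 PPM"
# Assumption: your log uses the same wording as the sample above.
FREQ_ERROR = re.compile(
    r"ntpd\[\d+\]: frequency error (-?\d+) PPM exceeds tolerance (\d+) PPM"
)
UNSYNC_MARKER = "Clock Unsynchronized"


def scan_time_errors(lines):
    """Return (unsynchronized_count, [(error_ppm, tolerance_ppm), ...])."""
    unsync_count = 0
    freq_errors = []
    for line in lines:
        if UNSYNC_MARKER in line:
            unsync_count += 1
        match = FREQ_ERROR.search(line)
        if match:
            freq_errors.append((int(match.group(1)), int(match.group(2))))
    return unsync_count, freq_errors


def drift_seconds(error_ppm, interval_seconds):
    """Approximate drift over an interval for a frequency error in PPM.

    PPM is parts per million, so 1 PPM is about 1 microsecond of drift
    per second of elapsed time.
    """
    return abs(error_ppm) * 1e-6 * interval_seconds
```

To use it, pass an iterable of lines, for example an open file handle for /var/log/messages on the appliance. For the sample entry above, a frequency error of -119309 PPM works out to roughly 0.12 seconds of drift per elapsed second, far beyond the 500 PPM tolerance that ntpd reports.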



Environment

VMware vCenter Server 6.7.x
VMware vCenter Server 6.5.x

Cause

This issue occurs when the ESXi hosts on which the Active, Passive, and Witness nodes reside are out of time sync. The Photon OS guest of the vCenter Server appliance periodically syncs its time with the ESXi host, so if NTP is not configured correctly on the hosts, the time inside the guest OS can drift. If the time drifts too far, this can trigger a failover between the Active and Passive VCHA appliances.

Resolution

By design, the Active, Passive, and Witness nodes reside on three different ESXi hosts.

To resolve this issue, ensure that all three ESXi hosts that run the Active, Passive, and Witness nodes of VCHA have NTP configured correctly. Additionally, ensure that the NTP service is running on each of these hosts and that all of them use a consistent NTP server.
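The three checks above (NTP configured, NTP service running, consistent NTP server) can be sketched as a small Python helper. This is only an illustration: HostNtpState and check_ntp_consistency are hypothetical names, the host and server names in the usage note are placeholders, and the sketch assumes you have already collected each host's NTP server list and service state by your own means (for example, from the host's time configuration page).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HostNtpState:
    """NTP state of one ESXi host (hypothetical record, filled in by you)."""
    ntp_servers: Tuple[str, ...]
    ntpd_running: bool


def check_ntp_consistency(hosts: Dict[str, HostNtpState]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for name, state in sorted(hosts.items()):
        if not state.ntp_servers:
            problems.append(f"{name}: no NTP server configured")
        if not state.ntpd_running:
            problems.append(f"{name}: NTP service is not running")
    # Compare the configured server sets across hosts, ignoring order.
    distinct = {frozenset(state.ntp_servers) for state in hosts.values()}
    if len(distinct) > 1:
        problems.append("NTP server lists differ between hosts")
    return problems
```

For example, calling check_ntp_consistency with three entries keyed "esx-active", "esx-passive", and "esx-witness", each using the same placeholder server and with the service running, returns an empty list. This only checks configuration consistency; it does not measure the actual clock offset between hosts.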

For more information on ESX/ESXi timekeeping best practices, refer to KB 2004453.