We patched the RHEL OS on our CAPM stack today and found that the stack remained in "System Health::failed" for an extended period afterward. During this time, the DA showed as "Unable to Connect" and the Collectors link was missing from the Administration menu. When the DA did show as available, the collectors were red and multiple syncs failed. After restarting the whole stack, we decided to bring each component up individually, but the stack still did not stabilize in a timely manner. We are creating this ticket to have the CAPM stack restart process investigated, to determine whether this behavior was out of the norm, and to find out whether the restart can be shortened to minimize downtime.
One node left the cluster during startup, which slowed things down considerably.
Restarting the services fixed the problem, but allowing more time for the startup to finish would likely have resolved it as well.
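
If allowing more time is the preferred approach, one way to avoid restarting services prematurely is to poll a health check until it passes or a deadline expires. The following is a minimal sketch only, not part of CAPM: the `check` callable is a hypothetical placeholder for whatever health test fits your environment, and the 30-minute timeout and 30-second interval are arbitrary assumptions, not recommended values.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 1800.0,   # assumed placeholder: 30 minutes
    interval_s: float = 30.0,    # assumed placeholder: 30 seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    `check` is a hypothetical, user-supplied function that returns True
    once the stack is considered healthy. Returns True if the check
    passed before the deadline, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A restart would then be attempted only when `wait_until_healthy(...)` returns `False`, which gives the startup the extra time before any services are cycled.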