User unable to log in to the GUI of VCF Operations for Networks

Products

VCF Operations for Networks

Issue/Introduction

Users are unable to log in to VCF Operations for Networks.

Refer to the screenshot below:
This is a 3-node clustered deployment of VCF Operations for Networks.
VCF Operations for Networks was failed over from one availability zone to another when this issue was seen.
Multiple services are reported as unhealthy or not running when the service health check script is executed.

The VCF Operations for Networks database reports that it cannot communicate with a quorum of coordination servers:

ubuntu@platform1 :~ $ fdbcli --exec "status details"
Using cluster file '/etc/foundationdb/fdb.cluster'.

Could not communicate with a quorum of coordination servers:
##.#.##.##:4500
##.#.##.##:4501
##.#.##.##:4500
ubuntu@platform1 :~ $
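
The symptom above can also be checked from a script. The following is a minimal sketch, not part of the product tooling: the function name `fdb_quorum_lost` is hypothetical, and it simply searches the `fdbcli` output for the error text shown above.

```shell
# Hypothetical helper: exits 0 when the text on stdin contains the
# quorum error message shown in the output above.
fdb_quorum_lost() {
  grep -q "Could not communicate with a quorum of coordination servers"
}

# Run the check only where fdbcli is available (for example, on a platform node).
if command -v fdbcli >/dev/null 2>&1; then
  if fdbcli --exec "status details" 2>&1 | fdb_quorum_lost; then
    echo "FoundationDB: quorum of coordination servers not reachable"
  else
    echo "FoundationDB: quorum error not reported"
  fi
fi
```

The absence of the message only means this particular error was not printed; it does not by itself confirm that the database is healthy.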

Environment

VCF Operations for Networks 9.0.1

Cause

The VCF Operations for Networks cluster appliances were powered off in an incorrect order via VCF Operations > Fleet Management > Lifecycle, which resulted in this issue.

Resolution

To fix this issue, perform the steps below:

Access the platform 1 node via PuTTY/SSH and log in with the username support.
Type ub to enter Ubuntu mode, then run the service health check command to identify the state of the services. See the commands below:
```
ub
./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d 
```
Identify the unhealthy services and note that a few services are not listed.
These services do not show up because they have been masked.
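
To see which units are masked on a node, the output of `systemctl list-unit-files` can be filtered. This is a rough sketch under the assumption that the second column of that output holds the unit state; the function name `masked_units` is hypothetical, and the check runs on the local node only.

```shell
# Hypothetical helper: reads `systemctl list-unit-files` output on stdin
# and prints the names of units whose state column is "masked".
masked_units() {
  awk '$2 == "masked" { print $1 }'
}

# List masked service units on this node, if systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
  systemctl list-unit-files --type=service --no-legend 2>/dev/null | masked_units
fi
```

Any unit printed here would need to be unmasked before it can be started.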

Execute the commands below to unmask the FoundationDB service, check its status, and start it.

 ./run_all.sh sudo systemctl unmask fdb.service
 ./run_all.sh sudo systemctl status fdb.service
 ./run_all.sh sudo systemctl start fdb.service

Execute the commands below to unmask the Elasticsearch service, check its status, and start it.

 ./run_all.sh sudo systemctl unmask elasticsearch
 ./run_all.sh sudo systemctl status elasticsearch
 ./run_all.sh sudo systemctl start elasticsearch

Wait 10-15 minutes for all cluster services to become healthy, then run the service health check command again to confirm the state of the services. See the commands below:
```
ub
./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d 
```
All services should come up running and healthy.
At this point, logging in to the GUI should work as expected.
All errors seen on platform nodes should be cleared by now.
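
Instead of waiting a fixed 10-15 minutes, the wait could be approximated by polling a service until it reports active. The sketch below is an illustration only: `wait_until` is a hypothetical helper, the 900-second timeout and 30-second interval are assumed values, and an active unit does not replace the full service health check.

```shell
# Hypothetical helper: wait_until TIMEOUT_SECS INTERVAL_SECS COMMAND...
# Re-runs COMMAND until it succeeds (returns 0) or the timeout expires (returns 1).
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_deadline=$(( $(date +%s) + wu_timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      return 1
    fi
    sleep "$wu_interval"
  done
}

# Example (not executed here): poll the local fdb.service for up to 15 minutes.
# wait_until 900 30 systemctl is-active --quiet fdb.service
```

Once the polled unit is active, run the service health check script as described above to confirm the state of the whole cluster.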
If the collectors were also powered off via VCF Operations/Lifecycle Manager, power on the collector node from vCenter.
After the collector is powered on, the GUI shows the message "No data being received".

Refer to the screenshot below:
After 10-15 minutes, the message "No data being received" should disappear from the UI.

Refer to the screenshot below:
The GUI will also show a high processing lag. This lag should settle gradually over 1-2 hours, after which the GUI should show both the processing lag and system health in green.

Refer to the screenshot below: