Newly deployed VCF Operations Orchestrator/Aria Automation Orchestrator appliances are experiencing pod startup failures.

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Pods in the Kubernetes prelude namespace fail to start after a new Aria Automation Orchestrator appliance is deployed.

After deploying a new VCF Operations Orchestrator/Aria Automation Orchestrator appliance, running kubectl get pods -n prelude from the appliance CLI returns the following response:
root@ [ /var/log/vmware/prelude ]# kubectl get pods -n prelude
Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The vracli-service-status.log file shows the following errors:
root@ [ /var/log/vmware/prelude ]# tail -f vracli-service-status.log

YYYY-MM-DD HH:MM:SS service - INFO - Service kube-dns. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service etcd-service. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kube-apiserver. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kube-flannel-ds. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service health-reporting-service. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kube-controller-manager. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kube-proxy. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kube-scheduler. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service predictable-pod-scheduler. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service - INFO - Service kubelet-rubber-stamp. There is no information about nodes.
YYYY-MM-DD HH:MM:SS service.kube - ERROR - Error get status for all kubernetes items:  Config not found:  /etc/kubernetes/admin.conf\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?\n'
YYYY-MM-DD HH:MM:SS service.kube - ERROR - Error get status for all pods in all namespaces:  Config not found:  /etc/kubernetes/admin.conf\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?\n'
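
The error signature in the excerpt above can be checked for without reading the whole log. The following is a minimal sketch, assuming a POSIX shell with grep is available on the appliance. The function name has_kubeconfig_error is hypothetical and is not part of any VMware tooling.

```shell
# Hypothetical helper: exits 0 when the text on stdin contains the
# missing-kubeconfig error shown in the log excerpt, and non-zero otherwise.
has_kubeconfig_error() {
  # kubectl prints one space after "Config not found:", while the log
  # excerpt shows two, so accept one or more.
  grep -Eq 'Config not found: +/etc/kubernetes/admin\.conf'
}
```

One possible use is to feed it the log from the excerpt: has_kubeconfig_error < /var/log/vmware/prelude/vracli-service-status.log && echo "admin.conf is missing"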

Attempting to access the Aria Automation Orchestrator Control Center via https://<vrofqdn>/vco fails, and the page does not load.
Note: Control Center is not available in Aria Automation Orchestrator or VCF Operations Orchestrator, because it was deprecated starting with Aria Automation Orchestrator 8.18.1. For details, see the article "Unable to access Aria Automation Orchestrator Control Center to setup the authentication provider".

Environment

Aria Automation Orchestrator 8.x
VCF Operations Orchestrator 9.0

Cause

This is a known issue with the OVFs listed below: setting NTP during deployment or before first boot prevents the Kubernetes pods from being created.

Affected VCF Operations Orchestrator/Aria Automation Orchestrator appliance OVFs:

O11N_VA-8.12.0.30728-21620161_OVF10.ova
O11N_VA-8.16.2.34719-23466433_OVF10.ova
O11N_VA-8.17.0.35210-23787547_OVF10.ova
O11N_VA-8.18.0.35770-24024334_OVF10.ova
O11N_VA-9.0.0.024674408.ova
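
Before deploying, a downloaded image name can be compared against the list above. This is a minimal sketch for a POSIX shell. The function name is_affected_ova is hypothetical, and the file names are copied verbatim from the list rather than taken from any official manifest.

```shell
# Hypothetical helper: exits 0 when the given file name (with or without a
# leading path) matches one of the affected OVAs listed in this article.
is_affected_ova() {
  ova_name=$(basename "$1")
  for affected in \
    O11N_VA-8.12.0.30728-21620161_OVF10.ova \
    O11N_VA-8.16.2.34719-23466433_OVF10.ova \
    O11N_VA-8.17.0.35210-23787547_OVF10.ova \
    O11N_VA-8.18.0.35770-24024334_OVF10.ova \
    O11N_VA-9.0.0.024674408.ova
  do
    if [ "$ova_name" = "$affected" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, is_affected_ova /path/to/image.ova && echo "Do not set NTP at deployment time" prints the warning only for a listed image.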

This may also be observed when the Aria Automation Orchestrator / VCF Operations Orchestrator appliance IP address resolves to multiple FQDNs.

Resolution

For Aria Automation Orchestrator, this known issue is fixed in the 8.18.1 OVA, O11N_VA-8.18.1.36791-24281602_OVF10.ova.
For VCF Operations Orchestrator, it is fixed in O11N_VA-9.0.1.0.24923009.ova.

Workaround:

Scenario 1:

NOTE: Do not assign an NTP server when deploying 8.10.x or later, because it prevents the Kubernetes pods from being created on first boot. Instead, add the NTP servers in the OVA properties after the initial containers have been created, or run the following command from the CLI once first boot has completed and the pods exist (this step is also covered in the appliance configuration process):

vracli ntp systemd --set <IP_Addr_1>,<IP_Addr_2>

To check whether you are running into this issue, tail the first boot log after the initial power-on: /var/log/bootstrap/firstboot.log

Scenario 2:

The Aria Automation Orchestrator IP address must resolve to a unique FQDN.
To verify the DNS lookup, execute the following commands:

nslookup <Aria Automation Orchestrator IP address>
nslookup <Aria Automation Orchestrator FQDN>
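
The uniqueness check can be made less error-prone by counting the names returned by the reverse lookup. This is a minimal sketch, assuming a POSIX shell with awk and an nslookup that prints PTR records in the BIND style ("... name = host.domain."). The function name count_ptr_names is hypothetical, and other nslookup variants may format their output differently.

```shell
# Hypothetical helper: reads reverse-lookup output from nslookup on stdin and
# prints how many distinct host names it contains (case-insensitive).
count_ptr_names() {
  awk '
    # BIND-style PTR lines look like: "<reversed-ip>.in-addr.arpa  name = host."
    /[ \t]name = / { seen[tolower($NF)] = 1 }
    END {
      n = 0
      for (k in seen) n++
      print n
    }
  '
}
```

Piping the reverse lookup into it, as in nslookup <Aria Automation Orchestrator IP address> | count_ptr_names, should print 1 when the address resolves to a single FQDN. A larger number suggests the condition described in Scenario 2.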

For more details, refer to the Network Requirements for Automation Orchestrator in the following document:

Automation Orchestrator system requirements