Troubleshooting VMware Aria Automation cloud proxies and On-Premises appliance deployments

Article ID: 326107

Products

VMware Aria Suite

Issue/Introduction

This article describes the most common errors encountered during the deployment of Cloud Automation appliances and how to identify them.

Symptoms:

The following services and cloud proxies follow a similar troubleshooting methodology:

  • VMware Aria Automation appliances (On-Premises)
  • VMware Aria Orchestrator appliances (On-Premises)
  • Cloud Proxy
  • VMware Aria Extensibility Proxy (formerly vRealize Automation Extensibility Proxy)
  • Cloud Extensibility Proxy

Most common misconfigurations:

  • NTP
  • DNS
  • Proxy initially deployed with a shortname instead of an FQDN.

These errors can be found in the following logs:

  • /var/log/bootstrap/firstboot.log
  • /var/log/bootstrap/everyboot.log

NTP issues

  • NTP error in /var/log/bootstrap/firstboot.log
2022-09-xx 17:42:15Z /etc/bootstrap/firstboot.d/00-apply-ntp-servers.sh starting...
+ set -e
++ ovfenv -q --key ntp-servers
+ ovf_ntpServers=<IP_Address>
+ '[' '!' -z <IP_Address> ']'
+ /usr/local/bin/vracli ntp systemd --set <IP_Address> --local
Couldn't reach NTP server <IP_Address>: No response received from <IP_Address>.
No reachable NTP server found
2022-09-xx 17:42:21Z Script /etc/bootstrap/firstboot.d/00-apply-ntp-servers.sh failed, error status 1

DNS issues

  • DNS error in /var/log/bootstrap/firstboot.log

+ '[' '!' -e /etc/bootstrap/firstboot.d/02-setup-kubernetes ']'
+ '[' '!' -x /etc/bootstrap/firstboot.d/02-setup-kubernetes ']'
+ log '/etc/bootstrap/firstboot.d/02-setup-kubernetes starting...'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2022-06-xx 14:40:19 /etc/bootstrap/firstboot.d/02-setup-kubernetes starting...'
2022-06-xx 14:40:19 /etc/bootstrap/firstboot.d/02-setup-kubernetes starting...
+ /etc/bootstrap/firstboot.d/02-setup-kubernetes
+ export -f wait_health
+ timeout 300s bash -c wait_health
Running check eth0-ip

Running check non-default-hostname

Running check single-aptr
make: *** [/opt/health/Makefile:38: single-aptr] Error 1
make: Target 'firstboot' not remade because of errors.
Failed to get peer URLs
Running check eth0-ip

Shortname issues

  • Shortname error in /var/log/bootstrap/firstboot.log
+ kubeadm init phase preflight --config /tmp/kubeadm.config
Failed to get peer URLs
W0629 01:29:36.553417 4882 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /etc/resolv.conf
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Hostname]: hostname "<SHORTNAME>" could not be reached
[WARNING Hostname]: hostname "<SHORTNAME>": lookup <FQDN>1 on <IP_Address>:53: server misbehaving
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
  • Shortname error in /var/log/bootstrap/everyboot.log
-- Logs begin at Fri 2022-08-19 18:09:16 UTC, end at Mon 2022-08-22 19:47:53 UTC. --
Aug 22 00:00:00 <SHORTNAME> kubelet[3429978]: E0822 00:00:00.076279 3429978 kubelet.go:2263] node "<SHORTNAME> " not found
Aug 22 00:00:00 <SHORTNAME>  kubelet[3429978]: E0822 00:00:00.177177 3429978 kubelet.go:2263] node "<SHORTNAME> " not found
Aug 22 00:00:00 <SHORTNAME>  kubelet[3429978]: E0822 00:00:00.277715 3429978 kubelet.go:2263] node "<SHORTNAME> " not found
Aug 22 00:00:00 <SHORTNAME>  kubelet[3429978]: E0822 00:00:00.285714 3429978 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"<SHORTNAME> ", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"<SHORTNAME> ", UID:"<SHORTNAME>", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"<SHORTNAME> "}, FirstTimestamp:time.Date(2022, time.August, 21, 23, 58, 32, 375534888, time.Local), LastTimestamp:time.Date(2022, time.August, 21, 23, 58, 32, 375534888, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://vra-k8s.local:6443/api/v1/namespaces/default/events": dial tcp <SHORTNAME> :6443: connect: connection refused'(may retry after sleeping)
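
The signatures shown in the excerpts above can be searched for in a single pass. The following is a minimal sketch, assuming a POSIX shell with grep available on the appliance. The function name classify_bootstrap_log is hypothetical, and the patterns are taken only from the excerpts in this article, so other failure messages will not be matched.

```shell
# Hypothetical helper: report which known failure signatures appear in a
# bootstrap log. Patterns come from the log excerpts in this article only.
classify_bootstrap_log() {
    log_file="$1"
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }

    found=0
    # NTP: firstboot could not reach any configured NTP server.
    if grep -q 'No reachable NTP server found' "$log_file"; then
        echo "NTP"; found=1
    fi
    # DNS: the single-aptr health check failed.
    if grep -q 'single-aptr] Error' "$log_file"; then
        echo "DNS"; found=1
    fi
    # Shortname: kubeadm or kubelet could not resolve or find the node name.
    if grep -Eq 'hostname ".*" could not be reached|node ".*" not found' "$log_file"; then
        echo "SHORTNAME"; found=1
    fi
    [ "$found" -eq 1 ] || echo "NONE"
}
```

For example, classify_bootstrap_log /var/log/bootstrap/firstboot.log prints one line per matched category, or NONE. A match is only a hint about where to look first; the surrounding log lines should still be reviewed.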


Environment

VMware vRealize Automation 8.x
VMware vRealize Orchestrator 8.x

Cause

Prerequisite infrastructure services may be misconfigured or encountering an issue.

Note: This includes how the proxy appliance deployment itself is configured to use these infrastructure services. Check both the external service configuration and the appliance configuration.

Resolution

Review the NTP, DNS, and Shortname sections below. They apply to all appliance and proxy variations:

  • Cloud Automation Appliances for VMware Aria Automation
  • VMware Aria Orchestrator
  • Cloud Proxy
  • VMware Aria Extensibility Proxy (formerly vRealize Automation Extensibility Proxy)
  • Cloud Extensibility Proxy

Step 1: NTP issues

  1. Validate the NTP server IP address and hostname.
  2. Validate connectivity to the NTP server.
  3. Validate that the NTP port (UDP 123) is open.
  4. Delete the VMs of the failed attempt and retry the deployment.
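
To confirm which server the failed deployment actually tried to reach, the address can be pulled from the log before validating it. This is a minimal sketch, assuming the "Couldn't reach NTP server" line format shown in the NTP excerpt in this article and an IPv4 address or hostname (an IPv6 address would be cut at its first colon). The function name failed_ntp_servers is hypothetical.

```shell
# Hypothetical helper: list the NTP servers that firstboot reported as
# unreachable. Relies on the "Couldn't reach NTP server <addr>: ..." format.
failed_ntp_servers() {
    log_file="${1:-/var/log/bootstrap/firstboot.log}"
    # Print only the address captured between "server " and the first colon.
    sed -n "s/^Couldn't reach NTP server \([^:]*\):.*/\1/p" "$log_file" | sort -u
}
```

Each address printed can then be compared with the intended NTP server and checked for reachability from the appliance network.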

Step 2: DNS issues

  1. Validate that the DNS server is reachable and that the DNS port (53) is open.
  2. DNS records must use fully qualified domain names (FQDNs), not shortnames.
  3. A single A record and a single PTR record are required. CNAMEs are not supported (only multitenancy supports CNAME records; for more information, refer to Set up multi-organization tenancy for VMware Aria Automation).
  4. Revalidate the forward and reverse lookups using the nslookup <FQDN> and nslookup <IP address> commands.

Validating lookup for Name resolution

Note: Fully Qualified Domain Names are required. Do not use shortnames. There should be a single A record for each appliance and VIP.

root@appliance [ ~ ]# nslookup <vra.example.com>
Server:         192.168.xx.xx
Address:        192.168.xx.xx#53

Name:   <vra.example.com>
Address: 192.168.20.xxx

Validating the reverse lookup

Note: There should be a single PTR record. CNAMEs are not supported (with the exception of multitenant environments), and a duplicated record causes issues.

root@appliance [ ~ ]# nslookup 192.168.20.xxx
xxx.20.168.192.in-addr.arpa     name = <vra.example.com>
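
The forward and reverse checks above can be combined into one pass that also enforces the single A record, single PTR record, and no-CNAME rules. The following is a minimal sketch, assuming dig is on the PATH and the records are IPv4. The function name check_single_a_ptr is hypothetical, and the name comparison is case-sensitive.

```shell
# Hypothetical helper: verify that an FQDN has exactly one A record, no CNAME,
# and that its address has exactly one PTR record pointing back to the FQDN.
check_single_a_ptr() {
    fqdn="$1"
    answers=$(dig +short A "$fqdn")
    if [ -z "$answers" ]; then
        echo "FAIL: no A record for $fqdn"; return 1
    fi
    # With +short, a CNAME chain shows up as hostnames ahead of the address.
    ip_count=$(printf '%s\n' "$answers" | grep -Ec '^[0-9]+(\.[0-9]+){3}$' || true)
    all_count=$(printf '%s\n' "$answers" | grep -c . || true)
    if [ "$all_count" -ne "$ip_count" ]; then
        echo "FAIL: CNAME in answer for $fqdn"; return 1
    fi
    if [ "$ip_count" -ne 1 ]; then
        echo "FAIL: expected 1 A record for $fqdn, found $ip_count"; return 1
    fi
    ip="$answers"
    ptrs=$(dig +short -x "$ip")
    ptr_count=$(printf '%s\n' "$ptrs" | grep -c . || true)
    if [ "$ptr_count" -ne 1 ]; then
        echo "FAIL: expected 1 PTR record for $ip, found $ptr_count"; return 1
    fi
    # dig prints PTR targets with a trailing dot; strip it before comparing.
    if [ "${ptrs%.}" != "$fqdn" ]; then
        echo "FAIL: PTR for $ip is ${ptrs%.}, not $fqdn"; return 1
    fi
    echo "OK: $fqdn <-> $ip"
}
```

Running check_single_a_ptr with the appliance FQDN, and again with the VIP FQDN where one exists, covers each name that needs a record.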

  5. VMware Aria Automation and VMware Aria Automation Orchestrator 8.7 and later use the dig command instead of host to validate the DNS service in the "single-aptr" target of the script /opt/health/Makefile.

Version 8.6.2

single-aptr: eth0-ip
	$(begin_check)
	echo Check the ip address if eth0 resolves only to a single hostname
	[ 1 -eq $$( host $$( host $$( iface-ip eth0 ) | wc -l ) ]
	$(end_check)

Version 8.7 onwards

single-aptr: eth0-ip
	$(begin_check)
	echo Check the ip address if eth0 resolves only to a single hostname
	[ 1 -eq $$(/usr/bin/dig +noall +answer -x $$( iface-ip eth0 ) | grep "PTR" | wc -l ) ]
	$(end_check)
    Therefore, for these versions it is recommended to run the following commands:
    /usr/bin/dig +noall +answer +nocookie -x $( iface-ip eth0 )
    /usr/bin/dig +noall +answer +noedns -x $( iface-ip eth0 )
    /usr/bin/dig +noall +answer -x $( iface-ip eth0 )

    Scenarios:

  6. After fixing any DNS issues, delete the VMs of the failed attempt and retry the deployment.
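
The three dig variants recommended for 8.7 and later can be run side by side so that a difference between them stands out. This is a minimal sketch, assuming dig is on the PATH. The function name compare_ptr_lookups is hypothetical, and the PTR counting mirrors the grep "PTR" filter used by the single-aptr check.

```shell
# Hypothetical helper: count PTR answers for an address under each dig variant
# listed in this article (default, +nocookie, +noedns).
compare_ptr_lookups() {
    ip="$1"
    for opts in "" "+nocookie" "+noedns"; do
        # $opts is intentionally unquoted so an empty value adds no argument.
        count=$(dig +noall +answer $opts -x "$ip" | grep -c 'PTR' || true)
        echo "dig ${opts:-(default)}: $count PTR record(s)"
    done
}
```

On an appliance this could be called as compare_ptr_lookups "$(iface-ip eth0)". The health check expects exactly one PTR record from the default variant. Because +nocookie disables DNS cookies and +noedns disables EDNS, a count that changes only when one of those options is set suggests the DNS server handles that feature differently, which is an inference to confirm against the DNS server itself.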

Step 3: Shortname issues

  1. Delete the VMs of the failed attempt and retry the deployment using FQDNs. If a shortname was used within the DNS record configuration, update the DNS records before retrying the deployment.
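
Before retrying, the hostname intended for the deployment can be checked to confirm it is not a shortname. This is a minimal sketch that only inspects the shape of the name (at least two dot-separated labels) and does not query DNS. The function name is_fqdn is hypothetical.

```shell
# Hypothetical helper: succeed only if the name looks like an FQDN, that is,
# it has at least two dot-separated labels and no empty labels.
is_fqdn() {
    name="${1%.}"                 # ignore a single trailing root dot
    case "$name" in
        ""|.*|*..*) return 1 ;;   # empty, leading dot, or empty label
        *.*) return 0 ;;          # at least one dot between labels
        *) return 1 ;;            # bare shortname
    esac
}
```

For example, is_fqdn vra.example.com succeeds, while is_fqdn vra fails and indicates that the deployment would be attempted with a shortname.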

Step 4: Additional validations for Cloud Extensibility Proxy and Cloud Proxy

Once Steps 1 through 3 are complete, consider the following:

  • The one-time key (OTK) expires after 24 hours.
  • An OTK cannot be reused for several proxies.
  • Validate that there is internet connectivity.
  • The OVA must be deployed to a vCenter. Deployment directly to an ESXi server is NOT supported.
  • For the cloud proxy, a network proxy that performs TLS termination is NOT supported.
  • Run the following command to validate that there is connectivity to the required URLs:
    sh /data-collector-status --traceroute
    Note: The required URLs can change based on the proxy location, as explained in:

Step 5: Additional validation for VMware Aria Automation Orchestrator (formerly known as VMware vRealize Orchestrator)

  1. Please check the following article: After deploying VMware Aria Automation Orchestrator (formerly known vRealize Orchestrator) the UI spins and never loads.

Step 6: Network Load Balancer

  1. For clustered (three-node) deployments, deploy a load balancer as described in the VMware Aria Automation Load Balancing Guide. This is a strict requirement, except for the following products:
    • VMware Aria Automation with VMware Aria Lifecycle in VCF mode.
    • VMware Aria Operations with VMware Aria Lifecycle in VCF mode.
    • VMware Identity Manager with VMware Aria Lifecycle in VCF mode.

Note: VMware Aria Automation Orchestrator (formerly vRealize Orchestrator) requires manual creation of the load balancer.


Additional Information