Load balancing requirements for VMware vRealize Automation 6.x-7.x

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

This article provides baseline requirements for load balancing in vRealize Automation (formerly known as VMware vCloud Automation Center).

Environment

VMware vRealize Automation 7.x
VMware vRealize Automation 6.x

Resolution

These are the baseline requirements to ensure that the Virtual Appliance (VA), IaaS Web, and Manager component servers function properly when a network load balancer (NLB) is configured for vRealize Automation 7.x:

Persistent state (sticky sessions) must be configured on the NLB, or you must use a session state database as described in the vRealize Automation 7.x Load Balancing guide.
All vRealize Automation configurations must point to the NLB for repository / manager service access. VMware recommends against mixing NLB and direct references, and against pointing directly to the names of the load-balanced servers.
Certificates must be trusted when accessing https://LoadBalancer/Repository/ and https://LoadBalancer/VMPS2 (if the manager service is behind the NLB) from all vRealize Automation servers / service accounts.
Microsoft loopback protection must be disabled on the servers, or loopback exceptions must be entered for the vRealize Automation server FQDNs.
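
The certificate requirement above can be spot-checked from a server with a short script. The following is a minimal Python sketch, not a VMware-provided tool: the function names `urls_to_trust` and `certificate_is_trusted` are hypothetical, and the host name you pass in stands for the `LoadBalancer` placeholder. It validates against the trust store visible to the Python process, which may differ from what a given service account sees, so treat a passing result as a hint rather than proof.

```python
import socket
import ssl


def urls_to_trust(load_balancer, manager_behind_nlb=True):
    """Return the load-balanced URLs whose certificates must be trusted."""
    urls = [f"https://{load_balancer}/Repository/"]
    if manager_behind_nlb:
        # VMPS2 only applies when the manager service sits behind the NLB.
        urls.append(f"https://{load_balancer}/VMPS2")
    return urls


def certificate_is_trusted(host, port=443, timeout=10.0):
    """Return (True, "") if the TLS handshake verifies, else (False, reason)."""
    # The default context verifies both the certificate chain and the host name.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, ""
    except ssl.SSLCertVerificationError as exc:
        return False, exc.verify_message
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, other TLS errors.
        return False, str(exc)
```

Calling `certificate_is_trusted("LoadBalancer")` with the real NLB host name, once on each vRealize Automation server, shows whether the handshake verifies there.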

DNS Redirection / DNS Load Balancing:

DNS redirection is not a supported form of NLB for vRealize Automation 7.x components.
- A DNS alias can be used for the initial installation of a vRealize Automation HA environment if all VIP DNS A records and hosts file entries resolve / point to the leading nodes ONLY.
  - Do this only when a supported network load balancer is expected to be available shortly after installation, because VMware cannot guarantee product stability or functionality in this configuration.
    - The following is an example of how product functionality breaks down over time when the Manager Service is set to automatic failover but the alias points to the first node:
      - When DNS CNAME aliases are used in a clustered environment with vRealize Automation 7.3 and above, the automatic failover functionality of the Manager Service can lead to 503 errors when the service moves to the secondary node, which the DNS alias does not point to.

        Symptoms include the virtual appliance management interface (VAMI) reporting a "FAILED" status for "IaaS-Server" on the Services tab:

        "Ping Failure: There was no endpoint listening at https://managementvip/VMPS2Proxy that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. Inner Exception: Unable to connect to the remote server."

        Provisioning also appears halted: cloning events never fire in managed vSphere endpoints.
Keep in mind that automatic failover functionality will be limited and will require manual intervention until an NLB is set up and configured for the virtual IP (VIP) addresses defined for the appliance, manager, and web components.
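
The condition that every VIP name resolves to the leading node only can be checked before installation. The Python sketch below is illustrative and rests on assumptions: `resolve_ipv4` and `check_vip_aliases` are hypothetical helper names, and the VIP names and addresses are supplied by you. It uses the operating system resolver, so hosts file entries are taken into account along with DNS records.

```python
import socket


def resolve_ipv4(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that name resolves to."""
    infos = resolver(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def check_vip_aliases(expected, resolver=socket.getaddrinfo):
    """Compare each VIP name with the leading-node address it should resolve to.

    expected maps VIP name -> leading-node IPv4 address.
    Returns {vip_name: (ok, resolved_addresses)}; ok is True only when the
    name resolves to exactly that one address.
    """
    results = {}
    for vip_name, leading_ip in expected.items():
        try:
            addresses = resolve_ipv4(vip_name, resolver)
        except socket.gaierror:
            addresses = []  # the name did not resolve at all
        results[vip_name] = (addresses == [leading_ip], addresses)
    return results
```

A name that resolves to more than one address, or to a non-leading node, is reported with `ok` set to `False` together with the addresses that were actually returned.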

Workaround:

Common health check URLs used to troubleshoot and validate load balancing configurations:

IaaS Web: https://FQDN-IaaS-Web/WAPI/api/status/web
Repository: https://FQDN-IaaS-Web/Repository/Data/MetaModel.svc
IaaS Manager: https://FQDN-IaaS-Manager/VMPSProvision
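
The health check URLs above can be probed in one pass with a small script. This is a minimal Python sketch under stated assumptions: the helper names `build_health_urls` and `probe` and the script name `check_health.py` are hypothetical, and the FQDNs are supplied by you. It only reports the HTTP status code or connection error for each URL; it does not assume which response indicates a healthy service, and it does not send credentials.

```python
import ssl
import sys
import urllib.error
import urllib.request

# Paths come from the health check list above; host names are supplied by the caller.
HEALTH_PATHS = {
    "IaaS Web": "/WAPI/api/status/web",
    "Repository": "/Repository/Data/MetaModel.svc",
    "IaaS Manager": "/VMPSProvision",
}


def build_health_urls(web_fqdn, manager_fqdn):
    """Return {component: url} for the three health checks."""
    hosts = {
        "IaaS Web": web_fqdn,
        "Repository": web_fqdn,
        "IaaS Manager": manager_fqdn,
    }
    return {name: f"https://{hosts[name]}{path}" for name, path in HEALTH_PATHS.items()}


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or a short error string."""
    context = ssl.create_default_context()  # certificate verification stays on
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError as exc:  # URLError, timeouts, and TLS failures are OSErrors
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_health.py <iaas-web-fqdn> <iaas-manager-fqdn>
    for name, url in build_health_urls(sys.argv[1], sys.argv[2]).items():
        print(f"{name}: {url} -> {probe(url)}")
```

Running it once against the VIP names and once against each individual node name can help tell a load balancer problem apart from a problem on a single node.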