Telco Cloud Automation 2.x - HA-based cloud native TCA/TCA-CP appliance deployment Day-0 using scripts

Products

VMware VMware Telco Cloud Automation

Issue/Introduction

This Script helps to Create/Delete cloud native TCA 2.x and/or TCA-CP 2.x appliance in HA mode.

Prerequisite

- Install Bootstrapper VM on a vC (and optionally vRLI) in a regular or airgap scenario using VMware-Telco-Cloud-Automation-<version>.ova and selecting appliance role as bootstrapper.

- Make sure to have latest photon VM Template uploaded on your vC for consumption by the script to create management and workload clusters. Example: photon-3-kube-v1.21.2+vmware.1 for Tanzu 1.4.0

- Create bootstrapper.json file using bootstrapper_template.json on the Bootstrapper VM by updating following sections:

[ Location on script and json template on Bootstrapper VM: /opt/vmware/setup_ha/bootstrapper_template.json ]

Note: Sample of bootstrapper_template.json attached to kb

Sizing/System Requirements

Cloud Native TCA and TCA-CP on same cluster (different namespaces - dev only)

- Deployment of both TCA and TCA-CP in same cluster (different namespaces) will need at least 3 worker VMs/nodes in management cluster with following config each: CPU=8 and MEM=16384MB

APPLIANCE → namespace

TCA → tca-mgr

TCA-CP → tca-system

One appliance TCA or TCA-CP in a cluster

Only TCA or TCA-CP installation can work with 3 worker VMs/ node in management cluster with config of CPU=4 and MEM=8192MB each.

Note: Detailed information on sizing/system requirements engage VMware GSS Team

Environment

VMware Telco Cloud Automation 2.0

Resolution

The Bootstrapper virtual machine contains scripts and required templates for setting up a cloud native VMware Telco Cloud Automation appliance in HA mode.

The following table lists their file locations.

Setup Script Path : /opt/vmware/setup_ha/setup_ha.py
Configuration File Template : /opt/vmware/setup_ha/bootstrapper_template.json
File Location in : Bootstrapper

$ python3 /opt/vmware/setup_ha/setup_ha

.py  --config_file ~/bootstrapper.json --skipConfirmations

Note: 

- For detailed steps and usage on setup_ha.py please refer the attached file - setup_ha_usage.txt
- For detailed information on example stages script executed in detailed in the attached sample- script-stage.txt

A. Troubleshooting in case of Failed Cluster creation :

The cluster creation can fail based on several reasons:

- etcdserver leader change
- etcdserver timeout
- Unable to connect to Bootstrapper Service issue
- Cluster creation timeout issues

Note: Testbed issues (As observed):

Internet connectivity problem (DNS configuration is incorrect in the bootstrapper VM), due to which the images for the containers cannot be downloaded from the internet
For airgap configuration - not able to connect to the airgap server and the airgap server cannot reach out to the cluster nodes
First delete the cluster, then proceed fix the testbed issues if there are any and then re-trigger the cluster creation.

Workaround:

Delete Cluster and rerun the script to try again a recreate of the cluster.

NOTE:

1. If Bootstrapper cluster is already created, then before MANAGEMENT cluster, we should always delete the Bootstrapper cluster first.
2. To Delete TCA Cluster, use clusterType=MANAGEMENT
3. To Delete Bootstrapper Cluster, use clusterType=WORKLOAD

Steps to Delete a TCA cluster:

Delete the cluster:

use the GET API of the appliance manager to fetch the cluster details
Use the id from the API and pass it to the DELETE API
Use the status API to monitor the delete cluster
Use the force delete option to remove from the local DB to reuse the controlPlaneEndpointIp

B. Troubleshooting in case of Failed Service installation:

The Service installation can fail based on several reasons:

Few known issues as represented and seen in Infrastructure Automation logs and UI extracts:

Helm API failed: release: <service_name> not found
Helm API failed: release name <service_name> in namespace <namespace_name> is already in use
Release "service_name" failed: etcdserver: leader changed
Failed to deploy <service_name>-readiness

Workaround:

Uninstall the failed service manually.

Then retry the script to 'deploy specific service'.

Use 'deploy specific service' option (–deploy) for setup_ha.py to retry installing the service. Script always ensure to avoid installing service if it is already installed in a related namespaces.

e.g. to install mongodb command reference:

/opt/vmware/setup_ha/setup_ha.py --deploy mongodb

Attachments

script-stage get_app

setup_ha_usage get_app

Sample-bootstrapper_template_json get_app