Fix: Cronjob failure in TCA 3.1.1 airgap environment

Article ID: 376734


Products

VMware Telco Cloud Automation

Issue/Introduction

  1. Cronjob cleanup pods named "tca-pod-log-cleanup" and "tca-retained-log-cleanup" fail and show a status of "Error".

    admin [ ~ ]$ kubectl get pods -A | grep log
    tca-cp-cn   tca-pod-log-cleanup-xxxxxx        0/1     Error    0     113m
    tca-cp-cn   tca-retained-log-cleanup-xxxxxxxx 0/1     Error    0     7h53m

     

  2. The /logs/retained-logs/ directory fills up, since the cleanup no longer runs.

  3. The cronjob pod logs look like this:

    admin [ ~ ]$ kubectl logs tca-pod-log-cleanup-28745640-wcfln -n tca-cp-cn
    Defaulting to user installation because normal site-packages is not writeable
    Requirement already satisfied: PyYAML>=5.4.1 in /usr/lib/python3.11/site-packages (from -r /scripts/requirements.txt (line 1)) (6.0.1)
    WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f58990ec4d0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/kubernetes/
    WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f5897502e90>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/kubernetes/
    WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f5897503990>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/kubernetes/
    WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f5897510450>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/kubernetes/
    WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f5897510e50>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/kubernetes/
    ERROR: Could not find a version that satisfies the requirement kubernetes>=26.1.0 (from versions: none)
    
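Because the cleanup job never completes, /logs/retained-logs/ keeps growing. As an illustrative interim measure only (assuming retained log files are safe to prune once collected; the patch in the Resolution section below is the supported fix), an age-based cleanup can be sketched as follows, demonstrated here on a temporary directory rather than the real path:

```shell
# Self-contained demo of age-based log cleanup in a temp directory.
# On the appliance, the single `find` line would target /logs/retained-logs/
# instead; the 7-day cutoff is an assumption, not a product default.
dir=$(mktemp -d)
touch -d '10 days ago' "$dir/old.log"   # simulate a stale retained log
touch "$dir/new.log"                    # a recent log that should survive
# Remove files older than 7 days, printing each path as it is deleted
find "$dir" -type f -mtime +7 -print -delete
ls "$dir"    # prints: new.log
```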

Environment

VMware Telco Cloud Automation 3.1.1 GA (build ob-23912982)

Resolution

Note: This procedure is the recommended permanent fix.

Apply the patch to VMware TCA to ensure the problem does not recur in the environment.
The patch changes VMware TCA behaviour in the following manner:

Note: Patch 3.1.1.02 (ob-24258124) is built on top of patch 3.1.1.01 (ob-24136019), so it also contains those fixes, as shown in the table below.

 

| TCA 3.1.1 GA (without patch) | TCA 3.1.1.02 (with patch) |
|---|---|
| Cronjob to clean up retained logs & pod logs fails in an airgap environment | Cleanup of retained logs happens daily & pod log cleanup happens every 2 hours |
| AD configuration happens with periodic sync for AD objects | AD configuration happens with no periodic sync (already fixed in 3.1.1.01) |
| Users are synced to TCA | Users are not synced to TCA; they are queried on demand (already fixed in 3.1.1.01) |
| Groups are synced to TCA | Groups are not synced to TCA; they are queried on demand (already fixed in 3.1.1.01) |
| Validation ensures that the Admin User Group exists | No such validation exists; the user must ensure that the Admin User Group exists (already fixed in 3.1.1.01) |


How to apply the patch

Note: Consider taking a backup/snapshot of the TCA VM; this helps revert to the previous state if any issues are encountered during the patching operation.

Services that will be upgraded via the patch are:

  1. tca-api (web-engine)
  2. tca-platform-manager (appliance management)

The patch needs to be applied on both the TCA-Manager and TCA-CP appliances.

Follow these steps to apply the patch and upgrade the two services above:

  1. Download the patch tar file attached at the bottom of this KB. The file name is patch-changes.tar.

  2. SSH into the TCA appliance (TCA-M and TCA-CP) and switch the user to root.

  3. Copy the patch-changes.tar patch bundle to the /tmp directory of the TCA appliance.

  4. Extract the patch-changes.tar file:

    tar -xvf patch-changes.tar 

     

  5. Change to the patch-changes folder:
     
    cd patch-changes

     

  6. Execute the patch-tca.sh script. Ensure no CaaS / CNF LCM operations are in progress before running it.

    ./patch-tca.sh

     

  7. Monitor the patch status until completion. You can also review tca-patch.log in the directory from which the script was run.

  8. Run "watch kubectl get tcxproduct" and wait until the READY status is True for all entries (on both TCA-M & TCA-CP).

    # watch kubectl get tcxproduct
                                                                                                                                               
    NAME         STATUS            READY   MESSAGE                               AGE
    tca-common   updateCompleted   True    All App CRs reconciled successfully   1h
    tca-cp-cn    updateCompleted   True    All App CRs reconciled successfully   1h
    

     

  9. Validate that the tca-api and tca-platform-manager pods are up and running after the patch script completes.

    # Commands for querying pods within TCA-M
    $ kubectl get pods -n tca-mgr | grep tca-api
    tca-api-9cd796ddb-dsszs                              1/1     Running     0             34m
     
    $ kubectl get pods -n tca-mgr | grep tca-platform
    tca-platform-manager-96bcf4c9d-n4p7n                 1/1     Running     0             34m
     
    # Commands for querying pods within TCA-CP
    $ kubectl get pods -n tca-cp-cn | grep tca-api
    tca-api-5c6ff96f6d-glpcc                             1/1     Running     0             38m
     
    $ kubectl get pods -n tca-cp-cn | grep tca-platform
    tca-platform-manager-b458f8b47-b98ml                 1/1     Running     0             38m
    
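Steps 4 through 6 can be run as one sequence. The sketch below demonstrates that sequence with a stand-in bundle containing a dummy patch-tca.sh, since the real patch-changes.tar comes from this KB's attachment:

```shell
# Demo of steps 4-6 using a stand-in bundle in a scratch directory.
work=$(mktemp -d) && cd "$work"
mkdir patch-changes
printf '#!/bin/sh\necho patch applied\n' > patch-changes/patch-tca.sh
chmod +x patch-changes/patch-tca.sh
tar -cf patch-changes.tar patch-changes && rm -r patch-changes  # stand-in for the downloaded bundle
tar -xvf patch-changes.tar   # step 4: extract the bundle
cd patch-changes             # step 5: enter the extracted folder
./patch-tca.sh               # step 6: run the patch script
```

With the real bundle, only the last three commands apply, run from the /tmp directory where patch-changes.tar was copied.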

Notes:

  1. How do you apply the patch if TCA is in an air-gapped environment?
    Use the same steps as above.

  2. Given a TCA with multiple TCA-CPs, do you need to apply the patch to each TCA-CP?
    Yes, the patch needs to be applied to each TCA-CP.
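For environments with several TCA-CPs, the copy-and-run steps can be scripted. The sketch below only echoes the commands (a dry run) and uses hypothetical hostnames; remove the echo prefixes and substitute real appliance addresses to execute:

```shell
# Dry run: print the per-appliance patch commands instead of executing them.
# Hostnames are hypothetical placeholders, not product defaults.
HOSTS="tca-m.example.com tca-cp-1.example.com tca-cp-2.example.com"
for host in $HOSTS; do
  echo scp patch-changes.tar "root@$host:/tmp/"
  echo ssh "root@$host" "cd /tmp && tar -xvf patch-changes.tar && cd patch-changes && ./patch-tca.sh"
done
```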

Attachments

patch-changes.tar