DU Instantiation Fails with Internal Server Error: unable to validate K8s infra requirements.



Article ID: 325409


Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

The workaround in this article allows the Kubernetes inventory service to keep up with the rate at which resource changes are processed, preventing DU instantiation failures.

Symptoms:
DU instantiation fails with an Internal Server Error stating that it is unable to validate K8s infra requirements. This can happen when a node pool is deleted and recreated with the same name; the deleted node may still appear in the Telco Cloud Automation (TCA) user interface.

Environment

VMware Telco Cloud Automation 2.0.1
VMware Telco Cloud Automation 2.1
VMware Telco Cloud Automation 2.0

Cause

The resource-change-monitor component of TCA uses a cache to hold resource changes until the Kubernetes inventory service fetches them. This cache maintains, among other keys and values, the latest resource changes and the timestamps of those changes. If a resource change arrives and the resource is already in the cache, the cached value is replaced with the new change and the time it was last updated.

If a new resource change arrives and the resource does not exist in the cache, the first check is whether the cache is full. If it is, the entry with the oldest update time is deleted and the new entry is inserted into the cache. The Kubernetes inventory service fetches resource changes every 4 minutes.
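The caching behavior described above can be sketched as a small bounded map keyed by resource, where an existing resource's entry is replaced in place and, when the cache is full, the entry with the oldest update time is evicted. This is a hypothetical illustration of the described logic; the class and method names are assumptions, not TCA source code:

```python
import time


class ResourceChangeCache:
    """Bounded cache: latest change per resource; evict oldest-updated entry when full."""

    def __init__(self, capacity=1024):  # 1024 mirrors the default CACHE_SIZE
        self.capacity = capacity
        self.entries = {}  # resource_id -> (change, last_updated)

    def put(self, resource_id, change, now=None):
        now = time.time() if now is None else now
        if resource_id in self.entries:
            # Resource already cached: replace the value and refresh the timestamp.
            self.entries[resource_id] = (change, now)
            return
        if len(self.entries) >= self.capacity:
            # Cache full: drop the entry with the oldest update time.
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]
        self.entries[resource_id] = (change, now)

    def drain(self):
        """Model the inventory service fetching (and clearing) the cached changes."""
        fetched, self.entries = self.entries, {}
        return fetched
```

With a full cache, every new resource evicts whichever cached change was updated least recently, which is how changes can be lost before the inventory service's next fetch.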

In TCA 2.1, several bug fixes and improvements to the resource-change-monitor made this component more efficient, allowing it to process events at a higher rate than in previous TCA versions. The resource-change-monitor can now process close to 1000 resource changes per minute, which can exceed the rate at which the Kubernetes inventory service fetches resource changes.

Resolution

To be addressed in Telco Cloud Automation 2.1.1.


Workaround:
The workaround is to increase the resource-change-monitor cache size to 4500. The Kubernetes inventory service fetches cache contents every 4 minutes, and the resource-change-monitor processes roughly 1000 resource changes per minute.
 
A value of 4500 allows for 4 minutes of resource changes (4000) plus an additional 500 to account for any delays in job execution.
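The sizing follows directly from the rates given in this article: at roughly 1000 changes per minute over a 4-minute fetch interval, the default cache of 1024 overflows well before a fetch occurs, while 4500 covers a full interval plus headroom. A quick check of that arithmetic:

```python
PROCESS_RATE = 1000    # resource changes per minute (rate from this article)
FETCH_INTERVAL = 4     # minutes between inventory fetches
DEFAULT_CACHE = 1024   # default CACHE_SIZE
HEADROOM = 500         # buffer for delays in job execution

# Changes accumulated between fetches, and how many the default cache cannot hold.
changes_per_interval = PROCESS_RATE * FETCH_INTERVAL              # 4000
evicted_with_default = max(0, changes_per_interval - DEFAULT_CACHE)  # 2976
recommended = changes_per_interval + HEADROOM                     # 4500
```

Under the default size, up to ~2976 resource changes per interval can be evicted before the inventory service fetches them, which is why stale entries linger in the TCA UI.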
 
Procedure
1. SSH into the TCA-CP appliance as admin. Once logged in, switch user to root.
2. Locate the resource-change-monitor.env configuration file in the /etc/hybridity/ directory.
3. Update the CACHE_SIZE value from the default 1024 to 4500.
vi /etc/hybridity/resource-change-monitor.env
# address used in the Restful API service (default "127.0.0.1")
# ADDRESS=127.0.0.1
# cache size - maximum events being cached (default 1024)
CACHE_SIZE=4500
4. Restart the resource-change-monitor as well as the app-engine:
systemctl restart resource-change-monitor
systemctl restart app-engine


Additional Information

Impact/Risks:
Impacts Telco Cloud Automation 2.0, 2.0.2, and 2.1.