Telco Cloud Automation (TCA) versions 2.3 and earlier stored the kubeconfig of the management cluster both on the TCA-CP filesystem and in the database. This kubeconfig is used for Lifecycle Management (LCM) of the workload clusters, as well as for network function instantiation on top of the workload clusters. If the kubeconfig stored in TCA-CP expires, you may see the following symptoms:
NF instantiation or workload cluster LCM operations fail with the error "Pre-interface script execution failed." Additionally, an "HTTP status 401 Unauthorized" error is reported. In the TCA-CP app-engine logs, the following errors can be seen:
[ClusterAutomationService_SvcThread-xxx, Ent: HybridityAdmin, Usr: xyz@abc , TxId: ####-####-####-####-###############] WARN CaaS.Flow- error quering flow status
io.kubernetes.client.openapi.ApiException:
at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:973)
at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:885)
Based on the error above, the app-engine is encountering an issue while trying to access the management cluster using the k8s-bootstrapperd service on TCA-CP.
The Bootstrapper logs, located at /common/logs/k8s-bootstrapper/bootstrapperd.log, contain the following:
Mar 21 13:09:23 apiserverd[14919] : [Warning-controller] : Failed to reach to management cluster [#######-####-####-####-##########], err: Unauthorized
Mar 21 13:09:23 apiserverd[14919] : [Warning-controller] : Failed to reach to management cluster [#######-####-####-####-##########], err: Unauthorized
Mar 21 13:09:23 apiserverd[14919] : [Warning-controller] : Failed to reach to management cluster [#######-####-####-####-##########], err: Unauthorized
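To confirm the failure pattern, the bootstrapper log can be inspected directly; a simple check, assuming shell access to the TCA-CP appliance:
# tail -n 100 /common/logs/k8s-bootstrapper/bootstrapperd.log | grep -i "Failed to reach to management cluster"
Repeated Unauthorized entries against the same management cluster ID indicate that the credentials used by the bootstrapper are no longer accepted.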
The management cluster kubeconfig is usually valid for one year and can be renewed either automatically or manually by the user. Once renewed, the kubeconfig must be updated both on the file system and in the database. In TCA 2.3 and earlier releases, the automated poller updates the renewed kubeconfig only in the database, not on the file system. When the kubeconfig on the file system is out of sync with the endpoint, users encounter the symptoms described above.
The issue occurs because the kubeconfig used to access the management cluster (located at /opt/vmware/k8s-bootstrapper/<mgmt-cluster-id>/kubeconfig) has expired.
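To confirm the expiry on the file system, the client certificate embedded in that kubeconfig can be decoded, and the kubeconfig can be tested directly against the cluster; a minimal check, assuming the certificate is stored inline as client-certificate-data (the usual layout for TKG-generated kubeconfigs) and that kubectl is available on the appliance:
# grep 'client-certificate-data' /opt/vmware/k8s-bootstrapper/<mgmt-cluster-id>/kubeconfig | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate
# kubectl --kubeconfig /opt/vmware/k8s-bootstrapper/<mgmt-cluster-id>/kubeconfig get nodes
If the notAfter date printed by openssl is in the past, or the kubectl call returns an Unauthorized error, the file-system copy of the kubeconfig has expired.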
The issue is resolved in 3.x and later versions.
The following workaround can be applied to TCA 2.3 or earlier versions. First, identify the management cluster ID by running the following command on the TCA-CP appliance:
# kbsctl show managementclusters
Count: 1
----------------------------------------
ID: 0######a-b7a2-####-####-a###########22
Name: tca1-mgmt-cluster1234
Status: unknown
TKG ID: #######-1##d-####-####-3###########a
The management cluster ID in this case is 0######a-b7a2-####-####-a###########22.
The kubeconfig is located at /opt/vmware/k8s-bootstrapper/0######a-b7a2-####-####-a###########22/kubeconfig. Replace the expired kubeconfig at this location with the renewed kubeconfig so that the file-system copy is valid again.
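A minimal sketch of the replacement, assuming the renewed kubeconfig has already been obtained and saved as /tmp/renewed-kubeconfig (the source path and the backup file name are illustrative):
# cp /opt/vmware/k8s-bootstrapper/0######a-b7a2-####-####-a###########22/kubeconfig /opt/vmware/k8s-bootstrapper/0######a-b7a2-####-####-a###########22/kubeconfig.bak
# cp /tmp/renewed-kubeconfig /opt/vmware/k8s-bootstrapper/0######a-b7a2-####-####-a###########22/kubeconfig
# kubectl --kubeconfig /opt/vmware/k8s-bootstrapper/0######a-b7a2-####-####-a###########22/kubeconfig get nodes
The first command backs up the expired file, the second overwrites it with the renewed kubeconfig, and the last confirms the management cluster is reachable again.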
Run the command again to confirm that the management cluster status has recovered:
# kbsctl show managementclusters
Count: 1
----------------------------------------
ID: 0######a-b7a2-####-####-a###########22
Name: tca1-mgmt-cluster1234
Status: Running
TKG ID: #######-1##d-####-####-3###########a
The management cluster status should return to Running.
Restart the app-engine on TCA-CP:
# systemctl restart app-engine
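To confirm the service restarted cleanly, its state can be checked with standard systemd tooling; a quick check, assuming the unit name app-engine used above:
# systemctl is-active app-engine
# systemctl status app-engine --no-pager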
After this, NF instantiation should work properly, and any LCM operations on the associated workload cluster should succeed without errors.