Task container creation failure on TAS with NSX-T

Article ID: 297484

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

 

On Tanzu Application Service (TAS) with the NSX Container Plugin (NCP) as the Container Network Interface (CNI), when the creation of a task container fails, the retry mechanism can also fail to create the task container. The error messages associated with this problem appear in the following logs.

Diego Database logs, bbs/bbs.stdout.log:

"rejection-reason": "failed to create container: external networker up: exit status 1"

APP/Task logs:

2023-xx-0xT18:29:04.695+08:00 [CELL/0] [ERR] Cell xxx-xx-xx-xxx-xx failed to create container for instance xxxxx-xxxx-xxxx: external networker up: exit status 1


Diego Cell logs, nsx-node-agent/nsx-node-agent.stdout.log:

Unable to retrieve network info for container xxxxx, network interface for it will not be configured
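
When triaging, it can help to check which of the three log excerpts above a given line corresponds to. The following is a minimal Python sketch, not a supported tool: the labels (`bbs`, `app_task`, `nsx_node_agent`) and the function name `classify_line` are hypothetical, and the match strings are simply copied from the excerpts in this article.

```python
# Substrings copied from the log excerpts in this article.
# The labels on the left are hypothetical names chosen for illustration.
SIGNATURES = [
    ("bbs", "failed to create container: external networker up: exit status 1"),
    ("app_task", "failed to create container for instance"),
    ("nsx_node_agent", "Unable to retrieve network info for container"),
]


def classify_line(line):
    """Return the label of the first signature found in the line, or None."""
    for label, needle in SIGNATURES:
        if needle in line:
            return label
    return None


def scan(lines):
    """Yield (label, line) pairs for every line that matches a signature."""
    for line in lines:
        label = classify_line(line)
        if label is not None:
            yield label, line.rstrip("\n")
```

For example, `list(scan(open("bbs.stdout.log")))` would return the matching lines of a locally copied log file; the file name here is only a placeholder.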


After the initial task container creation fails, if the "max_task_retries" parameter is set to a value greater than 0 (the default is 3), the system attempts to recreate the task on a different Diego Cell. NSX Container Plugin (NCP) has no retry mechanism, so it does not attempt to create a port for the new cell ID. Consequently, if the new Diego Cell runs on a different ESXi host, the hyperbus lacks a logical port for the task container, which results in the "external networker up" error.


Resolution

As a workaround, set the max_task_retries parameter to 0 in the file /var/vcap/jobs/bbs/config/bbs.json (the default is 3). Apply this change on all Diego Database instances. Note that the modification does not persist through VM recreation.
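
The edit described above can be sketched in Python as follows. This is an illustration under stated assumptions, not an official procedure: it assumes `max_task_retries` is a top-level key in bbs.json, and the function name `set_max_task_retries` is hypothetical. The article does not say whether the BBS job must be restarted afterward, so that step is outside this sketch.

```python
import json
from pathlib import Path

# Path taken from the article.
BBS_CONFIG = Path("/var/vcap/jobs/bbs/config/bbs.json")


def set_max_task_retries(config_path, value=0):
    """Set max_task_retries in a BBS JSON config file.

    Assumes the key lives at the top level of the JSON document.
    Returns the previous value (None if the key was absent).
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    previous = config.get("max_task_retries")
    config["max_task_retries"] = value
    # Rewrite the whole file; all other keys are preserved as parsed.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous


# Example call, to be run on each Diego Database instance:
#   set_max_task_retries(BBS_CONFIG, 0)
```

Returning the previous value makes it easy to record what to restore later. Taking a copy of bbs.json before changing it is a sensible precaution.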

However, this workaround only disables the retry mechanism. It does not resolve the underlying problem that caused the initial task container creation to fail.

NSX-NCP v4.1.2 provides a fix for the TAS max_task_retries feature, so the value can be set back to 3 (the default).