Deploying application via CF CLI fails with error "external networker up: exit status 1" and "Failed to setup veth pair for container" in NSX Node Agent logs

Article ID: 382040


Products

  • VMware Tanzu Application Service
  • VMware NSX

Issue/Introduction

Prerequisites / Assumptions:

  • TAS tile is being used in this environment
  • NSX tile is being used in this environment
  • NSX-T is being used in this environment

This KB article covers the scenario in which an application push via the CF CLI fails with an "external networker up: exit status 1" error, while the NSX Node Agent logs show a corresponding "Failed to setup veth pair for container" error.

We may encounter an error similar to the following in the TAS application logs (these can be gathered with the command cf logs <App Name> --recent):

 

2024-10-04T07:44:35.791-04:00 [CELL/0] [ERR] Cell xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx failed to create container for instance xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: external networker up: exit status 1

2024-10-04T07:44:35.853-04:00 [CELL/0] [OUT] Cell xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx destroying container for instance xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

2024-10-04T07:44:35.858-04:00 [CELL/0] [OUT] Cell xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx successfully destroyed container for instance xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

2024-10-04T07:44:35.873-04:00 [API/1] [OUT] Process has crashed with type: "web"

2024-10-04T07:44:35.910-04:00 [API/1] [OUT] App instance exited with guid xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx payload: {"instance"=>"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "index"=>0, "cell_id"=>"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "reason"=>"CRASHED", "exit_description"=>"failed to create container: external networker up: exit status 1", "crash_count"=>40, "crash_timestamp"=>1728042275835530222, "version"=>"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
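If the recent log stream is large, the failure signature can be confirmed quickly from the CF CLI. This is a minimal sketch, assuming an application named my-app (substitute your own application name):

# Pull the recent application logs and keep only the container-creation failures
cf logs my-app --recent | grep -i "external networker up"

# The crash events are also recorded by Cloud Controller and can be listed with
cf events my-app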

 

 
Additionally, if we check the NSX Node Agent logs (nsx-node-agent.stdout.log), located in the Diego Cell log bundle, and correlate them with the timestamps of the application logs above, we may see errors similar to the following:
 

2024-10-04T13:59:05.276Z xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx NSX 43044 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher_lin Adding container xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx in namespace /var/vcap/data/garden-cni/container-netns/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (IP: 10.x.x.121/27, MAC: 04:50:56:00:f8:04, gateway: 10.x.x.1, VLAN: 191, dev: eth0)

2024-10-04T13:59:05.389Z xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx NSX 44637 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="ERROR" errorCode="NCP01005"] nsx_ujo.agent.interfaces Failed to setup veth pair for container xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: (101, 'Network is unreachable')

2024-10-04T13:59:05.397Z xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx NSX 43044 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="WARNING"] nsx_ujo.agent.cni_watcher_lin Failed to add cni for container xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx because of unexpected exception: Unexpected error from nsx_node_agent: Failed to setup veth pair for container xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: (101, 'Network is unreachable').

2024-10-04T13:59:05.555Z xxxxxxxxxxxxxxxxxxxxxxx NSX 43044 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="INFO"] nsx_ujo.agent.cni_watcher Received CNI request message: {"version": "2.0.0", "config": {"container_key": "xxxxxxxxxxxxxxxxxxxxxxx", "container_id": "xxxxxxxxxxxxxxxxxxxxxxx", "netns_path": "/var/vcap/data/garden-cni/container-netns/xxxxxxxxxxxxxxxxxxxxxxx", "dev": "eth0", "runtime_config": {}}, "op": "DEL"}
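The same errors can also be searched for directly in the Diego Cell logs. The sketch below assumes BOSH CLI access and a TAS deployment named <tas-deployment>; the exact log path may differ depending on the NSX/NCP tile version:

# Download the Diego Cell logs (the bundle includes nsx-node-agent.stdout.log)
bosh -d <tas-deployment> logs diego_cell

# Alternatively, SSH to the affected Diego Cell and grep the log in place
bosh -d <tas-deployment> ssh diego_cell/<instance-guid>
sudo grep -i "Failed to setup veth pair" /var/vcap/sys/log/nsx-node-agent/nsx-node-agent.stdout.log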

 

Cause

One identified cause of this issue is the presence of an extra network subnet in the Logical Switches (visible in the NSX Manager UI) associated with the TAS org to which you are having trouble pushing applications. The extra subnet interferes with container creation in TAS because the container's IP gateway becomes unreachable, and it can also lead to overlapping IP allocations. There should be only one subnet rule per Logical Switch for a given TAS org.

We can check for the presence of this extra subnet as follows:

1. Log in to the NSX Manager UI per this documentation.

2. In the Logical Switches section, search for the TAS org name and expand the subnet rules. In the example below, there are 6 Logical Switches associated with a single TAS org, and we can see 2 subnet rules: 10.x.x.136/29 and 10.x.x.32/27. There should be only a single subnet rule, which should be a /27 range. (An API-based way to review these subnet rules is sketched after these steps.)

3. Take note of every Logical Switch that has an extra subnet rule. In this case there are 6 Logical Switches associated with our TAS org, so each of the 6 must be checked for an extra subnet.
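If UI access is limited, the subnet rules can also be reviewed through the NSX Manager API. The following is a minimal sketch, assuming an NSX Manager reachable at nsx-manager.example.com and basic authentication; exact field names may vary between NSX-T versions, so treat it as illustrative rather than authoritative:

# List all IP pools; each pool's "subnets" array contains its subnet rules (CIDR, gateway, allocation ranges)
curl -k -u admin "https://nsx-manager.example.com/api/v1/pools/ip-pools"

# With jq installed, narrow the output to pool names and configured CIDRs
curl -k -s -u admin "https://nsx-manager.example.com/api/v1/pools/ip-pools" | jq '.results[] | {display_name, cidrs: [.subnets[]?.cidr]}'

A pool associated with the affected TAS org that shows more than one CIDR matches the condition described in step 2 above.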

To resolve this issue, we will need to delete the extra /29 subnet (or any other extra subnets) so that it no longer interferes with pushing applications via the CF CLI.

Resolution

1. To resolve this issue, delete the extra subnets in NSX Manager: go to the "IP Address Pools" tab > locate the affected Logical Switch (TAS org) > edit the subnet rule, and press the "Delete" button.

However, the deletion may fail with an error similar to what's seen in the screenshot below:

If you are unable to delete the extra subnet, it is usually because IP addresses from that subnet are still allocated and in use.

We can browse back to the IP Pools section and click on the IP allocation to see how many IP addresses are in use. As we can see below, 2 of the Logical Switches in NSX Manager associated with the problematic TAS org have a total of 5 IPs in use (2 IPs in the first Logical Switch, and 3 IPs in the second).

2. We can release these IP addresses (in our case 10.x.x.138, 10.x.x.137, 10.x.x.134, 10.x.x.132, and 10.x.x.129) by following Method 2 in this KB article: https://knowledge.broadcom.com/external/article/322584/tep-ip-addresses-not-released-after-forc.html
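If scripting the release is preferred, the allocations held by the affected pool can also be listed and released through the NSX Manager API. This is a hedged sketch with placeholder values (<pool-id> and the IP address); confirm the exact procedure against the KB article linked above before releasing any address:

# List the IP addresses currently allocated from the pool
curl -k -u admin "https://nsx-manager.example.com/api/v1/pools/ip-pools/<pool-id>/allocations"

# Release a single allocated address back to the pool (repeat for each in-use IP)
curl -k -u admin -X POST "https://nsx-manager.example.com/api/v1/pools/ip-pools/<pool-id>?action=RELEASE" \
  -H "Content-Type: application/json" \
  -d '{"allocation_id": "10.x.x.138"}'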

After releasing the allocated IP addresses, we can retry removing the offending extra subnets as described in step 1.

 

3. Next, we can attempt to push applications in the affected TAS org to verify that removing the extra subnet has resolved the issue.
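A quick verification sketch, assuming an application named my-app in the affected org and space (substitute your own names):

cf target -o <affected-org> -s <affected-space>
cf push my-app

# A successful push should start the app without the "external networker up: exit status 1" error
cf logs my-app --recent | grep -i "external networker"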