BOSH Director is not assigned an IP address during Apply Changes when Tanzu Platform runs on vSphere and NSX-T

Article ID: 428305

Products

VMware Tanzu Application Service

Issue/Introduction

The customer's Tanzu Platform runs on vSphere with NSX-T. Apply Changes on the BOSH Director failed with the following error message:

Deploying:
  Creating instance 'bosh/0':
    Waiting until instance is ready:
      Post "https://vcap:<redacted>@##.##.##.##:6868/agent": dial tcp ##.##.##.##:6868: i/o timeout
Exit code 1

Searching for the BOSH Director VM in the vSphere Console showed that no IP address had been assigned to it.
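
The timeout in the error above means that nothing answered on the BOSH agent port (6868 in the message). The following is a minimal Python sketch for confirming that symptom from a machine that should be able to reach the Director. It uses only the standard library; the function name `agent_port_reachable` is hypothetical and is not part of any BOSH or Tanzu tooling.

```python
import socket


def agent_port_reachable(host: str, port: int = 6868, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For example, `agent_port_reachable("192.0.2.10")` (a placeholder address) returns False while the Director VM has no IP address or its interface is down.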

Environment

Tanzu Platform running on vSphere with NSX 9.0.0

Cause

To troubleshoot this kind of issue:

1. Search for the BOSH Director IP address in the vSphere Console to verify whether it is already in use by another VM. If no other VM is using this IP address, an IP conflict can be ruled out.

2. To narrow down the issue, change the network adapter of the affected BOSH Director VM to a non-NSX port group or distributed switch.

3. If the BOSH Director VM is assigned an IP address after the network adapter is changed, the issue is in the NSX networking layer. Advise the customer to investigate further on the network side.
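
Step 1 can be sketched as a check over an exported VM inventory. This is a minimal sketch, assuming the inventory has already been exported from vSphere as (VM name, IP address) pairs; the function name, VM names, and addresses below are hypothetical placeholders, and no vSphere API is called.

```python
from typing import Iterable, List, Tuple


def vms_using_ip(inventory: Iterable[Tuple[str, str]], ip: str, exclude: str = "") -> List[str]:
    """Return the names of VMs (other than `exclude`) that report the address `ip`."""
    return [name for name, addr in inventory if addr == ip and name != exclude]


# Hypothetical inventory export; replace with real data from the environment.
inventory = [
    ("vm-director", ""),          # the Director VM currently has no IP address
    ("vm-other", "192.0.2.11"),
]

conflicts = vms_using_ip(inventory, "192.0.2.10", exclude="vm-director")
print(conflicts or "no other VM holds this IP")
```

An empty result rules out an IP conflict, so the investigation can move on to step 2.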

In this customer's case, the BOSH Director did not get an IP address after a reboot, and its interface remained down only when it was attached to an NSX segment. The cause was identified as a JDK bug that brought down the link between the ESXi host and the control plane. This blocked the port, which in turn left the interface down.

Resolution

This issue could be related to an NSX bug: NSX is impacted by JDK-8330017: ForkJoinPool Stops Executing Tasks Due to ctl Field Release Count (RC) Overflow.

For environments running affected versions (VMware NSX 4.2.0.x, VMware NSX 4.2.1.0, 4.2.1.1, 4.2.1.2, 4.2.1.3, VMware NSX 9.0.0.0), implement a preventative monthly rolling reboot schedule:

  1. Reboot the first NSX Manager.
  2. SSH to a Manager as the admin user and check cluster health: get cluster status
  3. When all services report up on all 3 NSX Manager nodes, reboot the next Manager.
  4. Repeat steps 2-3 for the third Manager.
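
The rolling reboot above can be sketched as a loop that never reboots the next Manager until the cluster reports healthy. This is a minimal sketch, not an NSX integration: `reboot` and `cluster_healthy` are hypothetical callables supplied by the operator (for example, wrappers around whatever tooling is used to reboot a node and to interpret the output of `get cluster status`), and the polling values are assumptions.

```python
import time
from typing import Callable, Sequence


def rolling_reboot(
    managers: Sequence[str],
    reboot: Callable[[str], None],
    cluster_healthy: Callable[[], bool],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> None:
    """Reboot managers one at a time, waiting for cluster health after each reboot."""
    for node in managers:
        reboot(node)
        # Poll until all services are up before touching the next node.
        for _ in range(max_polls):
            if cluster_healthy():
                break
            time.sleep(poll_seconds)
        else:
            # The loop ended without a healthy report: stop instead of rebooting further.
            raise TimeoutError(f"cluster not healthy after rebooting {node}")
```

Stopping with an error when the cluster does not recover keeps the remaining Managers untouched, so the cluster is not degraded further.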

This issue is resolved in VMware NSX 4.2.1.4, 4.2.2.0, 9.0.1.0 and above, available at Broadcom downloads.
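
To decide whether the workaround applies, a version string can be compared against the affected versions listed above. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the version is assumed to be a dotted numeric string, and only the first four components are compared. A result of False means only that the version is not listed as affected in this article.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.2.1.3' into (4, 2, 1, 3), padded or truncated to four components."""
    parts = tuple(int(part) for part in version.split("."))
    return (parts + (0, 0, 0, 0))[:4]


def is_listed_affected(version: str) -> bool:
    """Return True if `version` is among the affected versions listed in this article."""
    v = parse_version(version)
    if v[:3] == (4, 2, 0):
        return True               # 4.2.0.x
    if v[:3] == (4, 2, 1):
        return v[3] <= 3          # 4.2.1.0 to 4.2.1.3; resolved in 4.2.1.4
    return v == (9, 0, 0, 0)      # 9.0.0.0; resolved in 9.0.1.0
```

For example, `is_listed_affected("4.2.1.3")` is True, while `is_listed_affected("4.2.1.4")` and `is_listed_affected("9.0.1.0")` are False.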