This issue is fixed in vSphere 7.0 Update 3q, as documented in the release notes:
https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3q-release-notes/index.html
"PR 3282224: VXLAN traffic generated from a guest VM to port 4789 fails"
If an environment needs a workaround before the ESXi upgrade can take place, temporarily disabling checksum offloading will restore communication. The following command does this on the BOSH-deployed VMs:
/usr/sbin/ethtool -K eth0 tx-checksum-ip-generic off
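To confirm that the setting took effect, the current offload state can be read back with the lowercase ethtool -k option, which lists the features of an interface. The following is a minimal sketch, not part of the original procedure; the helper name feature_state is hypothetical, and the interface name eth0 is assumed from the command above.

```shell
#!/bin/bash
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# state ("on" or "off") of the named offload feature.
feature_state() {
  local feature="$1"
  # Feature lines look like "tx-checksum-ip-generic: off"; match on the
  # first field and print the second.
  awk -v f="${feature}:" '$1 == f { print $2; exit }'
}

# Usage on a VM (eth0 is an assumption taken from the command above):
#   /usr/sbin/ethtool -k eth0 | feature_state tx-checksum-ip-generic
```

After the workaround has been applied, the helper is expected to print off for tx-checksum-ip-generic.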
Workaround 1
This method restores communication, but it does not persist if the VM is recreated for any reason. It uses the BOSH CLI to SSH into the instances and run the command:
bosh -d <CF-GUID> ssh diego_cell -c "sudo /usr/sbin/ethtool -K eth0 tx-checksum-ip-generic off"
Substitute your cf deployment name for <CF-GUID>, and adjust the command as needed for your BOSH CLI usage. If the issue also occurs in isolation segments, run the same command there:
bosh -d <p-isolation-segment-GUID> ssh isolated_diego_cell -c "sudo /usr/sbin/ethtool -K eth0 tx-checksum-ip-generic off"
Substitute names that match the environment's naming convention for the isolation segment.
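When several deployments need the workaround, it can help to generate the commands first and review them before running anything. The following is a minimal sketch under stated assumptions: the helper name build_offload_cmd and the deployment names are hypothetical placeholders, and the real names should be taken from the output of bosh deployments.

```shell
#!/bin/bash
# Hypothetical helper: prints (does not run) the bosh ssh command for one
# deployment and instance group, so it can be reviewed before use.
build_offload_cmd() {
  local deployment="$1" instance_group="$2" iface="${3:-eth0}"
  printf 'bosh -d %s ssh %s -c "sudo /usr/sbin/ethtool -K %s tx-checksum-ip-generic off"\n' \
    "$deployment" "$instance_group" "$iface"
}

# Placeholder deployment names; replace with the names in your environment.
build_offload_cmd cf-EXAMPLE-GUID diego_cell
build_offload_cmd p-isolation-segment-EXAMPLE-GUID isolated_diego_cell
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each printed line can then be pasted into a shell that has the BOSH CLI configured.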
Workaround 2
This method restores communication and persists if the VM is recreated or restarted for any reason. It uses the os-conf release through a BOSH runtime config.
Create a file called os-conf-c2c.yml with the following content:
releases:
- name: os-conf
  version: 22.2.1
addons:
- name: os-configuration
  include:
    jobs:
    - name: rep
      release: diego
    deployments:
    - cf-GUID
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        /usr/sbin/ethtool -K eth0 tx-checksum-ip-generic off
        echo "ACTION==\"add|change\", SUBSYSTEM==\"net\", KERNEL==\"eth*|en*\", RUN+=\"/usr/sbin/ethtool -K \$name tx-checksum-ip-generic off\"" > /etc/udev/rules.d/61-net.tx-checksum-ip-generic.rules
Substitute any values specific to the environment where this is applied, for example the os-conf release version and the cf-GUID deployment name. If isolation segments also need the change, modify the placement rules accordingly. For example, removing the deployments block applies the addon to all VMs that run the rep job from the diego release.
Once the file is saved, it can be uploaded to BOSH:
bosh update-config --type=runtime --name=os-conf-c2c os-conf-c2c.yml
Then run Apply Changes on all applicable tiles (tiles that include diego_cells).
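Once the changes are applied, one way to confirm that the pre-start script ran on a VM is to check for the udev rules file it writes. The following is a minimal sketch, not part of the original procedure; the helper name rule_present is hypothetical, and the default path is the one used in the runtime config above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) if the udev rules file
# exists and contains the ethtool command that disables the offload.
rule_present() {
  local rules_file="${1:-/etc/udev/rules.d/61-net.tx-checksum-ip-generic.rules}"
  [ -f "$rules_file" ] && grep -q 'tx-checksum-ip-generic off' "$rules_file"
}

# Usage on a diego cell VM:
#   rule_present && echo "udev rule is in place"
```

The check only shows that the rule was written; the live state of the interface can be inspected separately with ethtool -k.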