Apply Changes fails with an error similar to the following:
Task xxxx | 09:59:48 | Error: Action Failed get_task: Task xxx-xxx-xxx result: 1 of 15 pre-start scripts failed. Failed Jobs: vxlan-policy-agent. Successful Jobs: loggregator_agent, silk-cni, cfdot, bpm, garden-cni, smbdriver, nfsv3driver, bosh-dns, syslog_forwarder, garden, mapfs, silk-daemon, cflinuxfs3-rootfs-setup, cflinuxfs4-rootfs-setup.
In the isolation_diego_cell logs for the vxlan-policy-agent job, pre-start.stderr.log shows the following failure:
pre-start error: lock: open lock file: open /var/vcap/data/garden-cni/iptables.lock: no such file or directory
VMware Tanzu Platform for Cloud Foundry 4.x
Tanzu Isolation Segment 4.x
This issue is caused by a race condition between the silk-release and garden-cni jobs while they are starting.
Silk-release:
Garden-cni job
If the garden-cni job deletes the directory that holds the lock file between steps 1 and 2 of the silk-release pre-start job, the lock file can no longer be opened and the error above occurs.
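The shape of this race can be reproduced outside the platform. The sketch below is an illustration only, not the actual pre-start scripts: it uses a temporary directory as a hypothetical stand-in for /var/vcap/data/garden-cni, and the variable names are assumptions. It shows that when a directory is removed after one process has ensured it exists, a later attempt to open a file inside it fails.

```shell
#!/usr/bin/env bash
# Illustration only: mimics the ordering problem in a throwaway temp directory.
workdir="$(mktemp -d)"
lock_dir="$workdir/garden-cni"      # hypothetical stand-in for /var/vcap/data/garden-cni
lock_file="$lock_dir/iptables.lock"

# First job: make sure the directory exists.
mkdir -p "$lock_dir"

# Second job: removes the directory before the first job continues.
rm -rf "$lock_dir"

# First job: tries to open the lock file; the parent directory is gone.
if ( : > "$lock_file" ) 2>/dev/null; then
  result="lock opened"
else
  result="open lock file failed"
fi
echo "$result"

rm -rf "$workdir"
```

Because the removal falls between the two steps of the first job, the open fails even though the directory was created moments earlier, which matches the "no such file or directory" message in the log.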
This issue is fixed in version 3.47.0 of the cf-networking and silk-release components, which is included in the following TAS, IST, and TASW tile versions:
4.0.26
5.0.16
6.0.6
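As a quick way to compare an installed tile version against the list above, a small helper along these lines may be useful. It is a sketch under assumptions: the function name has_fix is hypothetical, it assumes each listed version is the first fixed patch of its own release line (4.0.x, 5.0.x, 6.0.x), and it relies on a sort that supports -V (for example GNU coreutils).

```shell
#!/usr/bin/env bash
# has_fix VERSION
# Returns 0 if VERSION is at or above the fixed patch of its release line,
# 1 if it is below, and 2 if the release line is not listed in this article.
has_fix() {
  local version="$1" fixed
  case "$version" in
    4.0.*) fixed="4.0.26" ;;
    5.0.*) fixed="5.0.16" ;;
    6.0.*) fixed="6.0.6" ;;
    *) return 2 ;;
  esac
  # sort -V orders version strings; the fix is present when the fixed
  # version sorts first (or is equal to the installed version).
  [ "$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n1)" = "$fixed" ]
}
```

For example, `has_fix 5.0.9 || echo "upgrade or use the workaround"` prints the message, because 5.0.9 is below 5.0.16.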
As a workaround, manually recreate the affected Diego cell by running the following bosh command:
bosh -d deployment-guid recreate iso_cell_guid --no-converge --fix