Following a power cycle or reboot of the VMware SD-WAN Edge (VCE), the L7 Health Check of the Cloud Security Service (CSS) stops sending probes to the endpoint service, such as Zscaler.
As a result, the CSS tunnels stay operationally down and traffic is dropped.
This issue is caused by defect ID 74149.
When the Edge reboots, edged initializes the l7_health_check process.
Once the l7_health_check process starts, it waits 10 seconds before reading the list of Zscaler tunnels from edged. If edged takes more than 10 seconds to start the link FSM thread, Zscaler tunnel creation is delayed and the l7_health_check process fails to obtain the correct WAN link details.
This happens if the WAN link stays down while the Edge is rebooting.
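The timing problem described above can be pictured with a small model. This is only an illustrative sketch, not VMware code: the function names are hypothetical, the 10-second wait is the only value taken from this article, and the assumption that toggling the feature makes the health check read the tunnel list again is inferred from the workaround rather than confirmed.

```python
# Illustrative model only; all names are hypothetical, not VMware code.
HEALTH_CHECK_WAIT_S = 10.0  # the article states l7_health_check waits 10 seconds


def health_check_sees_tunnel(link_fsm_start_s: float,
                             wait_s: float = HEALTH_CHECK_WAIT_S) -> bool:
    """Return True if the link FSM thread has started (so the tunnel can exist)
    by the time the health check reads the tunnel list."""
    return link_fsm_start_s <= wait_s


def probes_sent(link_fsm_start_s: float, toggled: bool = False) -> bool:
    """Probes run if the tunnel was seen at startup, or (assumption) if the
    health check was toggled after the tunnel came up and read the list again."""
    return toggled or health_check_sees_tunnel(link_fsm_start_s)


if __name__ == "__main__":
    for delay in (3.0, 12.0):
        print(delay, probes_sent(delay), probes_sent(delay, toggled=True))
```

In this model, a link FSM start delay of 12 seconds means the single read at 10 seconds finds no usable tunnel, so no probes are sent until the health check is restarted.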
Defect 74149 is resolved in R432-20221115-GA, R5002-20220506-GA, and later. For information on how to upgrade, see the following article: VMware SD-WAN Software Upgrade FAQs
Workaround:
Toggle L7 Health Check (turn off, save changes, and then turn back on and save changes).
Steps:
Go to Configure -> Network Services -> Cloud Security Service.
Select the correct CSS name.
Uncheck the L7 Health Check box and click Add.
Wait a few seconds, then check the L7 Health Check box and click Add.
For more information, see the following topic in the Administration Guide:
Configure a Cloud Security Service
Apply this workaround as a recovery action after the issue has occurred. It disables the L7 Health Check feature momentarily, but this has no additional impact.