When a customer activates an Edge, the CSS tunnels on a different Edge flap, disrupting the environment. This document discusses the cause and the resolution.
When the customer activates an Edge in the profile, the activation triggers a configuration push for the NVS, after which the NVS goes down, as shown in the following logs:
2024-10-02T09:07:39.105 MSG [NVS] parse_css_config:3863 [S] NVS Configuration (Test) validation code=0
2024-10-02T09:07:39.105 MSG [NVS] update_nvs:735 [S] NVS=Test config changed for segment 0
2024-10-02T09:07:39.849 MSG [NVS] update_nvs_path_mutable_props:1021 [S] Updating NVS=Test path 8591215B7A3BEBAD for 852a6514-d594-445a-bb61-1494135142ff on link 00000005-33eb-471d-a612-6b2ba3380291, segment 0, destination:Primary:160.0.0.1 l7:0
2024-10-02T09:07:39.849 MSG [NVS] release_ike_id:874 [S] Freeing IKE_ID
2024-10-02T09:07:39.849 MSG [NVS] update_nvs_path_mutable_props:1021 [S] Updating NVS=Test D291C0160 for 852a6514-d594-445a-bb61-1494135142ff on link 00000005-33eb-471d-a612-6b2ba3380291, segment 0, destination:Backup:160.0.0.2 l7:0
2024-10-02T09:07:59.878 ERR [VPN] gre_keepalive_time_check:2151 [S] GRE tunnel 1 down - no keepalives received for 20028 ms
2024-10-02T09:07:59.878 ERR [VPN] gre_keepalive_time_check:2151 [S] GRE tunnel 2 down - no keepalives received for 20028 ms
2024-10-02T09:08:03.880 INFO [server (6334:MgdServer:6954)] Received event ALL_CSS_DOWN from edged
2024-10-02T09:08:03.881 INFO [server (6334:MgdServer:6954)] Received health stats from edged
After a few minutes, the tunnels come back up on their own. This can be observed in the VCO events.
The CSS tunnels flap only when another Edge that has the same CSS profile is activated.
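To check whether an Edge log shows this pattern, the log lines above can be scanned for a GRE tunnel-down message that follows an NVS configuration change. The following is a minimal sketch, not a supported tool: the function name `find_flaps` and the 60-second correlation window are assumptions made for illustration, and the regular expressions are based only on the log format shown in the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Patterns derived from the log excerpt above; other releases may log differently.
CONFIG_CHANGE = re.compile(
    r"^(\S+) MSG \[NVS\] update_nvs:\d+ \[S\] NVS=(\S+) config changed"
)
TUNNEL_DOWN = re.compile(
    r"^(\S+) ERR \[VPN\] gre_keepalive_time_check:\d+ \[S\] GRE tunnel (\d+) down"
)


def parse_ts(text):
    """Parse a timestamp such as 2024-10-02T09:07:39.105."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def find_flaps(lines, window_seconds=60):
    """Return (nvs_name, tunnel_id, seconds_after_change) for every GRE
    tunnel-down message logged within window_seconds (an assumed value)
    of the most recent NVS configuration change."""
    window = timedelta(seconds=window_seconds)
    last_change = None  # (timestamp, nvs_name) of the latest config change
    flaps = []
    for line in lines:
        match = CONFIG_CHANGE.match(line)
        if match:
            last_change = (parse_ts(match.group(1)), match.group(2))
            continue
        match = TUNNEL_DOWN.match(line)
        if match and last_change is not None:
            delta = parse_ts(match.group(1)) - last_change[0]
            if timedelta(0) <= delta <= window:
                flaps.append(
                    (last_change[1], int(match.group(2)), delta.total_seconds())
                )
    return flaps


# Three lines copied from the excerpt above.
SAMPLE = [
    "2024-10-02T09:07:39.105 MSG [NVS] update_nvs:735 [S] NVS=Test config changed for segment 0",
    "2024-10-02T09:07:59.878 ERR [VPN] gre_keepalive_time_check:2151 [S] GRE tunnel 1 down - no keepalives received for 20028 ms",
    "2024-10-02T09:07:59.878 ERR [VPN] gre_keepalive_time_check:2151 [S] GRE tunnel 2 down - no keepalives received for 20028 ms",
]

for nvs_name, tunnel_id, seconds in find_flaps(SAMPLE):
    print(f"NVS={nvs_name}: GRE tunnel {tunnel_id} down {seconds:.3f}s after config change")
```

A match only indicates that the timing is consistent with this issue. Confirming it still requires checking that the configuration push coincided with the activation of another Edge in the same profile.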
This issue is related to a bug tracked under reference 130495, which is listed as fixed in the following release notes:
https://docs.vmware.com/en/VMware-SASE/5.2.3/rn/vmware-sase-523-release-notes/index.html
Fixed Issue 130495: For a customer enterprise using a Cloud Security Service (CSS) with GRE tunnels, if the customer activates a new Edge that is associated with a configuration profile shared by other Edges also using this CSS, the client users at those other locations may observe that traffic using the CSS drops.
Upon receiving a control plane update for either CSS or Non SD-WAN Destination (NSD) via Edge, GRE tunnels may fluctuate. This is because the tunnel configuration is assumed to change, causing it to be torn down and recreated. The fix ensures that if no changes are detected, the GRE tunnel remains operational.
On an Edge without a fix for this issue, the workaround is to activate the Edge to an isolated configuration profile and, once the Edge is up, only then transfer it to its proper profile.
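The behavior change described in the release note can be pictured with a small model. This is only an illustrative sketch of the idea "do not rebuild the tunnel when nothing changed", not the Edge's actual implementation: the `GreTunnelConfig` fields, the `apply_update` function, and the `skip_if_unchanged` flag are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreTunnelConfig:
    """Hypothetical subset of a CSS GRE tunnel configuration."""
    nvs_name: str
    primary: str
    backup: str
    segment: int


def apply_update(active, new_cfg, skip_if_unchanged):
    """Apply a control plane update to the `active` dict of tunnels.

    Returns True if the tunnel was torn down and recreated, False if it
    was left running. With skip_if_unchanged=False the update is always
    treated as a change (the behavior before the fix, as described in the
    release note). With skip_if_unchanged=True an identical configuration
    leaves the tunnel untouched (the behavior after the fix).
    """
    current = active.get(new_cfg.nvs_name)
    if skip_if_unchanged and current == new_cfg:
        return False  # nothing changed, tunnel stays operational
    active[new_cfg.nvs_name] = new_cfg  # tear down and recreate
    return True


cfg = GreTunnelConfig("Test", "160.0.0.1", "160.0.0.2", 0)

tunnels = {}
apply_update(tunnels, cfg, skip_if_unchanged=False)  # initial creation
# The same configuration is pushed again when another Edge is activated.
print("rebuilt without fix:", apply_update(tunnels, cfg, skip_if_unchanged=False))
print("rebuilt with fix:", apply_update(tunnels, cfg, skip_if_unchanged=True))
```

In this model, an identical configuration push rebuilds the tunnel only when the unchanged-configuration check is absent, which mirrors the flap that other Edges in the shared profile experience during an activation.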
The fix is to upgrade to one of the following versions: 5.2.3.0, 5.2.3.3, 5.2.4.0, or 6.0.0.0.