Long-lived traffic from GVMs is not redirected to Partner service VMs
Article ID: 376376
Updated On:
Products
VMware vDefend Firewall
Issue/Introduction
Long-lived traffic from GVMs is not redirected to Partner service VMs.
Symptoms:
East-West (E-W) Service Insertion is enabled.
Stateful SI firewall policies are configured.
The SI failure policy is set to "Allow".
The environment uses applications or protocols that maintain long-running sessions (for example, a client and server that keep a long-lived connection open, or protocols such as SCTP).
Traffic that is subject to the stateful SI rules is not redirected to the Partner SVMs.
Dfwpkt.log does not show rule hit logs for this traffic.
An existing flow for the affected source and destination IPs is present in the flow table ("vsipioctl getflows -f <slot 12 filter>").
Intermittent issues on the Partner SVMs, such as the SVM not responding to liveness packets, SVM interface up/down events, or SVM reboots, cause "Service Endpoint Down" alerts in vmkernel.log.
Example logs:
2024-08-21T10:56:48.439Z cpu18:2110632)NetX Proxy: Service Endpoint with MAC: 00:xx:xx:xx:xx:xx is down
2024-08-21T10:56:48.439Z cpu18:2110632)vif id for switch port xxxxx is xxxxxxxx-xxxx-xxx-xxxxxxxxxx
2024-08-21T10:56:48.439Z cpu18:2110632)NetX Proxy: Sent message to LCP for to_Failure
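To check whether a host has logged these alerts, the log lines can be filtered for the "Service Endpoint with MAC: ... is down" message. The sketch below is a hypothetical helper, not a VMware tool: the pattern is modeled only on the example lines above, and it assumes the timestamp is the first whitespace-separated token of each line, which may not hold for every ESXi release.

```python
import re
from typing import Iterable, List, Tuple

# Pattern based on the example vmkernel.log lines in this article (assumption:
# the message text is the same on your ESXi/NSX release).
ENDPOINT_DOWN = re.compile(
    r"NetX Proxy: Service Endpoint with MAC: (?P<mac>\S+) is down"
)


def find_endpoint_down_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, MAC) pairs for each 'Service Endpoint ... is down' line."""
    events = []
    for line in lines:
        match = ENDPOINT_DOWN.search(line)
        if match:
            # Assumes the timestamp is the first token on the line.
            timestamp = line.split(maxsplit=1)[0]
            events.append((timestamp, match.group("mac")))
    return events


if __name__ == "__main__":
    sample = [
        "2024-08-21T10:56:48.439Z cpu18:2110632)NetX Proxy: Service Endpoint with MAC: 00:xx:xx:xx:xx:xx is down",
        "2024-08-21T10:56:48.439Z cpu18:2110632)NetX Proxy: Sent message to LCP for to_Failure",
    ]
    for ts, mac in find_endpoint_down_events(sample):
        print(ts, mac)
```

On a host, the same function could be fed the lines of /var/log/vmkernel.log; matching on the message text rather than the full line prefix keeps it tolerant of small differences in the log prefix.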
Environment
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center 4.x
Cause
This issue occurs only with long-lived sessions, when the failure policy is set to 'Allow' and stateful SI policies are configured. If the ESXi host marks the SVM as down for any reason, the failure policy takes effect and existing traffic is passed through without being redirected. This is expected behavior.
When the SVM recovers, all new sessions and connections hit the stateful SI rules and are redirected to the SVM as normal. Long-running sessions, however, still have their old entries in the flow table, so those connections remain in passthrough. This traffic does not reach the SVM until its entry is cleared from the flow table.
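The sequence described above can be illustrated with a toy model of a stateful flow table. This is a conceptual sketch only, not how NSX implements its datapath: the class, method names, verdict strings and 5-tuple keys are all hypothetical. It shows that a flow keeps the verdict it was given, so a flow switched to passthrough while the SVM was down stays that way after recovery until its entry is flushed.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]

REDIRECT = "redirect"  # traffic is sent to the Partner SVM
PASSTHRU = "passthru"  # failure policy 'Allow': forwarded without the SVM


@dataclass
class ToyStatefulSIFilter:
    """Conceptual model of stateful SI with failure policy 'Allow' (not NSX code)."""
    svm_up: bool = True
    flows: Dict[FlowKey, str] = field(default_factory=dict)

    def process(self, key: FlowKey) -> str:
        # Existing flow: reuse the stored verdict; rules are not re-evaluated.
        if key in self.flows:
            return self.flows[key]
        # New flow: redirect if the SVM is up, otherwise pass through.
        verdict = REDIRECT if self.svm_up else PASSTHRU
        self.flows[key] = verdict
        return verdict

    def mark_svm_down(self) -> None:
        # Failure policy kicks in: existing flows are switched to passthrough.
        self.svm_up = False
        for key in self.flows:
            self.flows[key] = PASSTHRU

    def mark_svm_up(self) -> None:
        # SVM recovered: only new flows are affected; stored verdicts stay as-is.
        self.svm_up = True

    def flush(self) -> None:
        # Stands in for the workaround (SI Exclude List add/remove or GVM reboot).
        self.flows.clear()


if __name__ == "__main__":
    fw = ToyStatefulSIFilter()
    long_lived = ("192.0.2.10", "198.51.100.20", 40000, 5000, "SCTP")
    print(fw.process(long_lived))  # redirect
    fw.mark_svm_down()
    fw.mark_svm_up()
    print(fw.process(long_lived))  # passthru (stale flow entry)
    print(fw.process(("192.0.2.10", "198.51.100.20", 40001, 5000, "TCP")))  # redirect (new flow)
    fw.flush()
    print(fw.process(long_lived))  # redirect
```

The model deliberately keeps the verdict per flow rather than per packet, which is what makes the stale passthrough entry outlive the SVM outage.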
Resolution
Workaround:
1st:
Resolve the issue that causes the SVM "Service Endpoint Down" alert on the ESXi host (work with the partner service provider to recover from the issue).
Once the SVM has recovered, flush the old connection entries from the flow table by adding the impacted GVM to the SI Exclude List and then removing it.
Alternatively, restart the GVM to flush its flow table entries.
2nd:
Configure stateless rules for the traffic that is not redirected to the service VM (typically the long-running sessions).
Then flush the old connection entries from the flow table by adding the impacted GVM to the SI Exclude List and then removing it, or by rebooting the GVM.