OpenShift runs its own overlay network with NSX-T as the underlay. Overlay traffic destined for the internet is source NATed (SNAT) to an egress IP associated with a VM connected to an NSX-T segment.
The egress IP can move between VMs as needed, but this movement is not visible in either NSX or OpenShift. Outbound traffic is seen leaving the correct VM/host and being allowed by the Distributed Firewall (DFW). The return traffic, however, arrives at a different host, where it is dropped because no rule allows it inbound.
Checking the ARP tables on both hosts shows the egress IP in each host's ARP entries for the segment. One host (the source of the traffic) holds the correct ARP entry. Another host holds an entry that ties the IP to a VM that does not hold the egress IP, and this is the host where the return traffic is dropped.
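To illustrate the comparison described above, here is a minimal Python sketch. It assumes the ARP entries for the segment have already been collected by hand from each host and written down as (host, IP, MAC) tuples. The host names, IP addresses, and MAC addresses are hypothetical and do not reflect any NSX command output. The sketch only flags IPs that different hosts resolve to different MACs, which is the symptom seen here. It does not correct anything.

```python
from collections import defaultdict


def find_conflicting_arp(entries):
    """Return {ip: {mac: [hosts]}} for every IP that resolves to more than one MAC.

    entries: iterable of (host, ip, mac) tuples gathered manually from each host
    (hypothetical input format, not a real NSX export).
    """
    by_ip = defaultdict(lambda: defaultdict(list))
    for host, ip, mac in entries:
        # Normalise MAC case so "AA:..." and "aa:..." count as the same address.
        by_ip[ip][mac.lower()].append(host)
    # Keep only IPs that different hosts map to different MACs.
    return {
        ip: {mac: sorted(hosts) for mac, hosts in macs.items()}
        for ip, macs in by_ip.items()
        if len(macs) > 1
    }


# Hypothetical sample: egress IP 10.0.0.50 is learned with two different MACs.
sample = [
    ("esxi-01", "10.0.0.50", "00:50:56:aa:aa:01"),  # VM that currently holds the egress IP
    ("esxi-02", "10.0.0.50", "00:50:56:bb:bb:02"),  # stale entry for a VM without it
    ("esxi-01", "10.0.0.10", "00:50:56:cc:cc:03"),  # consistent entry on both hosts
    ("esxi-02", "10.0.0.10", "00:50:56:cc:cc:03"),
]

for ip, macs in find_conflicting_arp(sample).items():
    print(ip, macs)
```

An empty result would mean every host agrees on each IP's MAC. In the situation described in this article, the egress IP would be reported with two MACs, one per host.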
No issues are visible in OpenShift. Disabling trust on first use (TOFU) for the affected segments and removing any incorrect bindings does not help. The duplicate ARP entries remain, and return traffic sent to the incorrect host/VM is still dropped.
NSX 4.2.0.1
A bug affecting egress IP traffic has been reported in OpenShift.
Contact the vendor (Red Hat) for further assistance.