Several services (dataplane, local-controller, nds, dispatcher, dhcp, router) stop at the same time, and a failover is triggered because the dataplane service has stopped.
A kernel defect causes "bad frame in rt_sigreturn" messages to be logged.
The following is an example of the syslog output from an affected edge node.
<TIMESTAMP> <EDGE_HOSTNAME> containerd 2745 - - fatal error: unexpected signal during runtime execution
<TIMESTAMP> <EDGE_HOSTNAME> kernel - - - [10743.190381] containerd[4193] bad frame in rt_sigreturn frame:000075c000b89378 ip:141bb02305b0 sp:755e1f788d28 orax:ffffffffffffffff in containerd[141bb020f000+12ae000]
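To check a collected syslog for this signature, a small script can search for the kernel message shown above. The following is a minimal sketch, not a supported tool: the function name `find_sigreturn_events` and the regular expression are illustrative assumptions derived only from the sample line, and the message layout may differ on other builds.

```python
import re

# Pattern derived from the sample kernel line above (an assumption:
# other builds may format the message differently).
SIGRETURN_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\] bad frame in rt_sigreturn "
    r"frame:(?P<frame>[0-9a-f]+) ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+)"
)


def find_sigreturn_events(lines):
    """Return one dict per log line that contains the rt_sigreturn signature."""
    events = []
    for line in lines:
        match = SIGRETURN_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events
```

The function accepts any iterable of strings, so an open file object for the exported syslog can be passed directly. A match on a `containerd` process, together with the "fatal error: unexpected signal during runtime execution" entry, is consistent with the symptom described here.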
VMware NSX-T Data Center 3.2.3.0.1
This happens because the containerd process crashes with a segmentation violation, which in turn causes several services to stop.
This issue is addressed in NSX-T 3.2.4, NSX-T 4.1.1 and above.
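When auditing several edge nodes, the fixed-release statement above can be turned into a simple version comparison. The sketch below is one possible reading of that statement: it treats 3.2.4 and later 3.x releases, and 4.1.1 and later, as containing the fix, and it assumes that 4.0.x and 4.1.0 do not, which the article does not state explicitly. The function names are hypothetical.

```python
def parse_version(text):
    """Convert a dotted version string such as '3.2.3.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version_text):
    """Return True if the version is assumed to contain the fix.

    Assumption: fixed in 3.2.4 and later 3.x releases, and in 4.1.1 and later.
    Releases 4.0.x and 4.1.0 are treated as not fixed, which is an
    interpretation of the resolution statement, not a confirmed fact.
    """
    version = parse_version(version_text)
    if version[0] == 3:
        return version >= (3, 2, 4)
    return version >= (4, 1, 1)
```

For the release listed in this article, `has_fix("3.2.3.0.1")` evaluates to `False`, so that node would be flagged for upgrade.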