NSX Edge failover and traffic interruption during manual datapathd core dump generation

Article ID: 428415

Products

VMware NSX

Issue/Introduction

  • When generating a manual datapathd core dump on an NSX Edge node, you may observe a temporary data plane outage or a brief "blip" in network connectivity, and Inter-SR BGP sessions may go down.
  • Brief loss of routing for the affected Edge node.
  • Traffic interruption lasting from several seconds to a few minutes.

Environment

VMware NSX

Cause

  • The gcore utility freezes execution of the datapathd process by sending a SIGSTOP signal while it writes the memory contents to disk.
  • Because datapathd handles all packet forwarding, the Edge cannot process traffic while the process is frozen. The freeze triggers a BFD timeout, which leads to a failover of the Edge node.
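
The relationship between the freeze and the BFD timeout can be sketched as simple arithmetic. In simplified terms, BFD declares a peer down when no control packet arrives within the detection time (transmit interval multiplied by the detect multiplier), so any freeze that outlasts that window triggers a failover. The Python sketch below is illustrative only: the function names are hypothetical, and the interval, multiplier, and freeze durations are assumed example values, not NSX defaults.

```python
def bfd_detection_time_ms(tx_interval_ms: int, multiplier: int) -> int:
    """Time without BFD control packets after which the peer is declared down."""
    return tx_interval_ms * multiplier


def freeze_triggers_failover(freeze_ms: int, tx_interval_ms: int, multiplier: int) -> bool:
    """True if a process freeze of freeze_ms outlasts the BFD detection time."""
    return freeze_ms > bfd_detection_time_ms(tx_interval_ms, multiplier)


# Assumed example values (not NSX defaults): 500 ms interval, multiplier 3.
print(bfd_detection_time_ms(500, 3))             # 1500
print(freeze_triggers_failover(20_000, 500, 3))  # True
print(freeze_triggers_failover(1_000, 500, 3))   # False
```

With these assumed values, a core dump that keeps datapathd frozen for 20 seconds far exceeds the 1.5-second detection window, which is consistent with the failover described above.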

Resolution

This is expected behavior when capturing a core dump of the datapathd process.
To manage this process safely:

  1. Plan for Failover: Recognize that running gcore on an active Edge node will trigger a failover to the standby node.

  2. Maintenance Window: Perform manual core dumps during scheduled maintenance windows to account for the brief session re-establishment period.

  3. Monitor Progress: The duration of the outage depends on the Edge size and disk speed; the process will remain frozen until the memory write is complete.
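
As a rough planning aid for the maintenance window, the freeze duration can be approximated as the amount of memory to write divided by the sustained disk write throughput. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the sample text imitates the VmRSS line of a Linux /proc/<pid>/status file rather than output from a real Edge node, and the disk throughput is an assumed figure. The actual core file size may differ from the resident memory, so treat the result as an estimate only.

```python
def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def estimate_freeze_seconds(rss_kib: int, disk_mib_per_s: float) -> float:
    """Rough estimate: resident memory divided by sustained disk throughput."""
    return (rss_kib / 1024) / disk_mib_per_s


# Hypothetical sample, not taken from a real Edge node.
sample = "Name:\tdatapathd\nVmRSS:\t 4194304 kB\n"
rss = parse_vmrss_kib(sample)
print(rss)                                  # 4194304
print(estimate_freeze_seconds(rss, 200.0))  # 20.48
```

In this assumed example, 4 GiB of resident memory written at 200 MiB/s keeps the process frozen for roughly 20 seconds.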

Additional Information

Related article: Generate datapathd core file on an NSX Edge