Generate datapathd core file on an NSX Edge

Article ID: 376675

Updated On:

Products

VMware NSX

Issue/Introduction

  • A core file from NSX-T may be required for effective troubleshooting.
  • This KB article outlines the procedure for generating core files on an Edge node.

Environment

VMware NSX
VMware NSX-T Data Center 

Cause

  • Environments scaled up with a large number of logical routers may experience issues with dataplane traffic.
  • Retrieving and analyzing a core file can assist in troubleshooting.

Resolution

Warning:

  • Although gcore does not terminate the process, it suspends it (by sending SIGSTOP) for as long as it takes to write the process memory to disk.
  • Because datapathd is responsible for all packet forwarding, traffic through this Edge node is interrupted for the duration of the dump.
  • The outage lasts as long as it takes to write the memory to the disk. Depending on the Edge size (Small, Medium, Large, Bare Metal) and disk speed, this can take anywhere from several seconds to a few minutes.
  • If the Edge node is part of an Active-Standby cluster, the freeze will likely cause a BFD timeout, triggering a failover to the standby node. This minimizes the actual downtime but still results in a brief "blip" as sessions re-establish.
  • Follow these steps only if directed to do so by the Broadcom Support team as part of troubleshooting.
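As a rough illustration of how long forwarding stays paused, the freeze can be estimated as core size divided by sustained disk write throughput. The function below is a sketch under that assumption; `estimate_freeze_seconds` is a hypothetical helper, and both inputs (expected core size and write throughput of /var/dump) are values you must supply yourself, not figures published for NSX Edge.

```shell
# Sketch: estimate the datapathd freeze time, assuming the dump time is
# dominated by sequential disk writes. estimate_freeze_seconds is a
# hypothetical helper; it ignores any other overhead.
estimate_freeze_seconds() {
    size_mib=$1        # expected core size in MiB (your estimate)
    write_mib_s=$2     # assumed sustained write throughput of /var/dump, MiB/s
    if [ "$write_mib_s" -le 0 ]; then
        echo "throughput must be greater than 0" >&2
        return 1
    fi
    # Integer division rounded up, so a partial second still counts.
    echo $(( (size_mib + write_mib_s - 1) / write_mib_s ))
}
```

For example, `estimate_freeze_seconds 18432 200` (an 18 GiB core written at an assumed 200 MiB/s) prints 93, roughly a minute and a half of interrupted forwarding on this Edge.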

Steps to generate core dump:

  • Check that /var/dump has enough free space by running the command below, and clean up if necessary.

    # df -h /var/dump

Caution: Some core dump files can be very large (potentially more than 18 GB).
If /var/dump does not have enough free space, remove old files before continuing.
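The free-space check can also be scripted. This is a sketch that assumes a POSIX shell with `df` and `awk` available on the Edge; `avail_kib` and `has_space_gib` are hypothetical helper names, and the 18 GiB default threshold simply reuses the size mentioned in the caution, not a documented requirement.

```shell
# Sketch: check free space in the dump directory before running gcore.
# avail_kib and has_space_gib are hypothetical helpers; the 18 GiB default
# is an assumption based on the size quoted in this article.

# Print the available space, in KiB, of the filesystem that holds $1.
avail_kib() {
    # -P: POSIX output (one line per filesystem); -k: 1024-byte blocks.
    # Line 2, column 4 is the "Available" value.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 has at least $2 GiB free (default 18).
has_space_gib() {
    dir=$1
    need_gib=${2:-18}
    need_kib=$((need_gib * 1024 * 1024))
    have_kib=$(avail_kib "$dir")
    if [ -z "$have_kib" ]; then
        echo "could not read free space for $dir" >&2
        return 1
    fi
    if [ "$have_kib" -ge "$need_kib" ]; then
        echo "OK: $((have_kib / 1048576)) GiB free in $dir"
        return 0
    fi
    echo "WARNING: only $((have_kib / 1048576)) GiB free in $dir (want $need_gib GiB)" >&2
    return 1
}
```

`has_space_gib /var/dump` prints the free space and returns 0 when at least 18 GiB is available; pass a second argument (for example `has_space_gib /var/dump 32`) to use a different threshold.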

  • Identify the PID of the process to dump (datapathd in this example) by running the command below:

    # ps -ef | grep -i datapathd

        Example:

    UID         PID     PPID  C  STIME  TTY          TIME        CMD
    root        4###    ####  15 <date>  ?        3-<time>  /opt/vmware/nsx-edge/sbin/datapathd --no-chdir --unixctl=/var/run/vmware/edge/dpd.ctl --pidfile=/var/run/vmware/edge/dpd.pid -vconsole:err -vsyslog:info --syslog-method=udp:127.0.0.1 --cfgfile=/config/vmware/edge/config.json
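Because `ps -ef | grep -i datapathd` also lists the `grep` command itself, a lookup that returns only the datapathd PID can be less error-prone. The sketch below reads the pidfile named in the datapathd command line shown in the example (`/var/run/vmware/edge/dpd.pid`) and falls back to `pgrep -x`. It assumes `pgrep` is installed on the Edge, and `datapathd_pid` / `DPD_PIDFILE` are hypothetical names.

```shell
# Sketch: print the datapathd PID without matching the grep process itself.
# datapathd_pid and DPD_PIDFILE are hypothetical names; the default path is
# the --pidfile value from the datapathd command line in this article.
DPD_PIDFILE=${DPD_PIDFILE:-/var/run/vmware/edge/dpd.pid}

datapathd_pid() {
    if [ -r "$DPD_PIDFILE" ]; then
        # Keep only digits (drops the trailing newline).
        pid=$(tr -dc '0-9' < "$DPD_PIDFILE")
        # Trust the pidfile only if that PID is still alive.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
            return 0
        fi
    fi
    # -x matches the exact process name, so "grep datapathd" is not returned.
    pgrep -x datapathd
}
```

Run it as root, since `kill -0` is used to confirm that the PID from the pidfile is still running, and compare the result with the `ps -ef` output before taking the dump.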

  • Run the following gcore command to generate the core dump file:

         /opt/vmware/nsx-edge/sbin/gcore -o /var/dump/[core dump file prefix] <PID>

    Example: 
    root@<edge-name>:~# /opt/vmware/nsx-edge/sbin/gcore -o /var/dump/datapathd 4###
    [New LWP 5001]
    [New LWP 5002]
    [New LWP 5321]
    [New LWP 5322]
    [New LWP 5341]
    [New LWP 5345]
    [New LWP 5346]
    [New LWP 5407]
    ....

    [New LWP 5720]
    [New LWP 5721]
    [New LWP 5722]
    [New LWP 5725]
    [New LWP 5726]
    warning: Cannot call inferior functions, Linux kernel PaX protection forbids return to non-executable pages!
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    0x00006b74c866f46e in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
    warning: target file /proc/4981/cmdline contained unexpected null characters
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c8758000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c8afa000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c8d64000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c9215000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c96f6000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c9a77000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74c9cb8000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ca35a000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cacd8000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cbae9000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cbdc9000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cbfe0000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ce344000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ce56a000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ce784000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ce997000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74ceb9e000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cee20000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cf039000.
    warning: Memory read failed for corefile section, 1048576 bytes at 0x6b74cf2df000.
    Saved corefile /var/dump/datapathd.4981
    [Inferior 1 (process 4981) detached]
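Because gcore pauses forwarding, a wrapper that prints the exact command before running it can help avoid dumping the wrong PID. This is a hypothetical sketch, not a VMware-provided tool: `dump_datapathd`, `GCORE`, and `RUN` are illustrative names, and by default the function only echoes the command. gcore names its output `<prefix>.<PID>`, which matches the `Saved corefile /var/dump/datapathd.4981` line in the example.

```shell
# Sketch: build, and optionally run, the gcore command from this article.
# dump_datapathd, GCORE and RUN are hypothetical names. Default is a dry run;
# set RUN=1 to execute. gcore pauses datapathd until the dump completes.
GCORE=${GCORE:-/opt/vmware/nsx-edge/sbin/gcore}

dump_datapathd() {
    pid=$1
    prefix=${2:-/var/dump/datapathd}
    # Reject an empty or non-numeric PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: dump_datapathd <PID> [prefix]" >&2; return 2 ;;
    esac
    # Positional parameters hold the full command; output is <prefix>.<PID>.
    set -- "$GCORE" -o "$prefix" "$pid"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}
```

`dump_datapathd 4981` prints `would run: /opt/vmware/nsx-edge/sbin/gcore -o /var/dump/datapathd 4981`; `RUN=1 dump_datapathd 4981` executes it. Keeping the dry run as the default is a deliberate choice, given the traffic impact described in the warning.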
  • The core file is written to the /var/dump directory (the path given with the -o option) and saved as <prefix>.<PID> (datapathd.<PID> in this example). A gzip-compressed copy (core.gdb.*.gz in the example below) is also created automatically.

    Example:
    root@<edge-name>:~# ls -l /var/dump
    total #######
    -rw-r--r-- 1 root root   ######## Jan 26 19:48 core.gdb.########.###29.0.9.gz
    -rw-r--r-- 1 root root ########## Jan 26 19:48 datapathd.4###
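To confirm that the dump completed before collecting anything, a check like the following can help. It is a sketch that assumes the `<prefix>.<PID>` naming shown in the listing; `check_core` is a hypothetical helper.

```shell
# Sketch: confirm gcore wrote a non-empty <prefix>.<PID> file.
# check_core is a hypothetical helper.
check_core() {
    core="$1.$2"   # $1 = prefix passed to gcore -o, $2 = PID
    if [ -s "$core" ]; then
        # -s: the file exists and is larger than zero bytes.
        # du reports the disk space the file actually uses.
        du -h "$core"
        return 0
    fi
    echo "missing or empty core file: $core" >&2
    return 1
}
```

`check_core /var/dump/datapathd 4981` prints the size of `/var/dump/datapathd.4981`, or reports that the file is missing or empty.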
  • When generating log bundles, ensure that the option “Include files that may contain sensitive information” is selected.

         

         Note: Consider deleting the uncompressed core dump (datapathd.<PID>) before collecting the Edge support bundle.
                   Otherwise, expect extremely long collection and upload times, and frequent upload failures to the Broadcom support portal.
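If you delete the uncompressed dump as suggested, it is prudent to first confirm that the compressed copy is intact. In this sketch, `remove_uncompressed_core` is a hypothetical helper, and the `core.gdb.*.gz` pattern is an assumption based on the example `ls -l /var/dump` listing in this article. It uses `gzip -t` to test each archive, but it does not prove that a given `.gz` belongs to this particular dump, so check the timestamps in `ls -l` first.

```shell
# Sketch: delete the uncompressed core only if a compressed copy in the
# same directory passes gzip's integrity test. remove_uncompressed_core is
# a hypothetical helper; the core.gdb.*.gz pattern is an assumption.
remove_uncompressed_core() {
    core=$1
    dir=$(dirname "$core")
    for gz in "$dir"/core.gdb.*.gz; do
        [ -e "$gz" ] || continue   # glob matched nothing
        if gzip -t "$gz" 2>/dev/null; then
            rm -f -- "$core"
            echo "removed $core (verified $gz)"
            return 0
        fi
    done
    echo "no valid compressed copy found in $dir; keeping $core" >&2
    return 1
}
```

Usage: `remove_uncompressed_core /var/dump/datapathd.4981`, run before generating the Edge support bundle.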

 

Additional Information

See the links below for creating a case with Broadcom and uploading the logs to that case: