There are challenges when collecting a tcpdump for a distributed application. This article explains how to narrow the scope of the capture and determine whether a tcpdump will be useful for troubleshooting a distributed application.
Some of Pivotal's distributed applications include Greenplum Database and Pivotal HD, both of which appear in the examples below.
Depending on the workload, tcpdump output can grow extremely fast. For example, while troubleshooting an issue, the capture can fill the OS root partition and cause a serious outage. To avoid this, run a rolling tcpdump with parameters that limit how much data is collected (an example command is shown later in this article).
Along with filling up the data partition, another common mistake is running a wide-open trace with no filters. A wide-open trace on a 10 Gb interface during a production workload will produce gigabytes of unnecessary data.
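As a rough illustration: a 10 Gb interface can carry up to 10/8 = 1.25 GB of data per second, so even at 25% utilization a wide-open trace collects on the order of 0.3 GB per second, or nearly 19 GB per minute.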
Apply a tcpdump filter and limit the capture length. For example, suppose a tcpdump is needed to debug an HDFS permission issue that occurs when a MapReduce job runs. Do not submit an unfiltered tcpdump on all node managers in the cluster; that would collect a huge amount of useless data. Instead, apply the filter "tcp port 8020 and host <ip of namenode>" to capture only the HDFS metadata traffic from the node, as in the sketch below.
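As a sketch, assuming the NameNode is at 192.0.2.10 and the node's data interface is eth0 (both placeholders), the filtered capture would look like this:

tcpdump -i eth0 -s 256 -w /tmp/hdfs_metadata_capture "tcp port 8020 and host 192.0.2.10"

This records only the HDFS metadata conversation between this node and the NameNode instead of all traffic on the interface.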
Terabytes of data can be transferred in some of these distributed applications. Running a tcpdump on a highly utilized node may cause the kernel to discard some frames before sending them up the stack where tcpdump is listening. In that case, a packet can reach the application even though tcpdump never records it.
In this situation, refer to "When to take a tcpdump": chances are the dump will not be helpful. If the node is utilized that heavily, the utilization itself is most likely the root of the problem, and a tcpdump is not required.
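To confirm whether the kernel discarded frames, check the summary tcpdump prints when it exits; the counts below are illustrative:

tcpdump -i bond0 -c 1000 "tcp port 8020" > /dev/null
1000 packets captured
1243 packets received by filter
57 packets dropped by kernel

A non-zero "packets dropped by kernel" count means the capture is incomplete, and a packet missing from the capture cannot be distinguished from a packet that never arrived.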
In most cases, the application should have enough logging to tell whether there was a communication issue with the remote node. For example, if the application is GPFDIST, the read timeout is 300 seconds, and the segment log says the segment timed out communicating with GPFDIST, then a tcpdump might not help. The tcpdump will simply confirm that GPFDIST did not respond, so the segment timed out.
Whenever possible, avoid running a tcpdump. The best reasons to run a capture are to rule out network latency, to rule out customer firewalls, and to inspect exactly what data is being sent to the remote node.
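For example, when ruling out network latency, reading a completed capture back with the -ttt option prints the time delta between successive packets (the file path below is a placeholder):

tcpdump -nn -ttt -r /data/support/tcpdump/capture_file "tcp port 8020"

Large deltas between a request and its response point at the remote side or the network in between.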
First, prepare the node for collection. The dump must be run as root, since the kernel will not allow non-superusers to put a network interface into promiscuous mode. However, when tcpdump runs, it will by default drop privileges and write out files as the tcpdump user. Make sure the tcpdump user has write permission on the collection directory.
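A minimal preparation sketch, assuming /data/support/tcpdump is the chosen collection directory and that the tcpdump user already exists on the node:

mkdir -p /data/support/tcpdump
chown tcpdump: /data/support/tcpdump

Then verify write access as the tcpdump user with a test file: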
sudo -u tcpdump touch /data/support/tcpdump/test123
Once the above prerequisites are complete, start the trace:
tcpdump -C 512 -W 8 -s 256 -w /data/support/tcpdump/gpdbgang_issue -i bond0 "tcp port 8020"
The above command collects 8 files of 512 MB each, for a total of 4 GB of data. tcpdump writes to the first file through the eighth; once the eighth file reaches 512 MB, it loops back and overwrites the first, producing a rolling capture.
The -s option sets the packet capture length (snap length). In a typical infrastructure, the maximum packet length is 1514 bytes. If the data payload of each packet is not of interest, it is a good idea to limit this to about 256 bytes, which captures the packet headers plus some data.
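For reference: an Ethernet header is 14 bytes, and the IP and TCP headers are typically 20 bytes each (more if options are present), so -s 256 captures the complete headers plus roughly 200 bytes of application payload per packet.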
Arg | Value | Description |
-C | 512 | Maximum size of each capture file, in MB |
-W | 8 | Total number of files to collect before overwriting |
-s | 256 | Capture length (snap length) for each packet, in bytes |
-w | /data/support/tcpdump/gpdbgang_issue | Path to save the collection; in this case gpdbgang_issue is used as the prefix for the 8 files collected |
-i | bond0 | Only listen on interface bond0 |
Filter | "tcp port 8020" | Capture only TCP port 8020 traffic |
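Once the issue has reproduced, stop the capture and read the files back with -r (-nn disables name and port resolution). The file name below assumes the numeric suffix tcpdump appends when rotating files with -C and -W:

tcpdump -nn -r /data/support/tcpdump/gpdbgang_issue0 | head -20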