When running Tanzu Greenplum, we may find ourselves troubleshooting network packet loss errors. In this article, we describe some of the standard tools available on most Linux systems for troubleshooting network-related errors and issues, and we explain which layer each tool reports on so that you can get a sense of where the fault might be and how to proceed from there. Knowing which network troubleshooting tools are available, and when to use each one, is a big help when dealing with misbehaving distributed applications.
At a high level, the life cycle of a network datagram runs from the NIC (Network Interface Card), up through the kernel, and finally to the application. This obviously skips a lot of detail and abstracts away many concepts, so do not expect a deep dive into how Linux handles the network. Instead, this article provides guidance on how to troubleshoot it.
This simplified flow shows that a network packet has to pass through many layers before it reaches the application. That is important to understand because packet loss seen on the client can sometimes be directly correlated to an application issue: distributed applications rely heavily on the network to share and process data, and if the application cannot keep up with the incoming data, the kernel may start dropping packets. Understanding this basic flow will help you work the problem.
From the typical distributed application user's perspective, it is not always obvious whether the problem is in the external network. It is important to rule out what you can before assuming it is some mystical networking issue, so before you call in your network expert, work through the checks below.
Inspecting at the NIC/Firmware/Driver level
ethtool can read information from the NIC driver given an associated interface name.
This command shows the NIC driver state. Immediately from this output, we can see whether the interface link is up or down ("Link detected: yes"). In addition, we can confirm that the Speed is 10Gbit and the Duplex is Full:
ethtool <interface name>

[root@node4 ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        MDI-X: Unknown
        Link detected: yes
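In a cluster, these settings should match on every host, not just the one showing symptoms. As a rough sketch, assuming the Greenplum gpssh utility is available and that you have a hostfile listing all segment hosts (adjust both to your environment), the key fields can be checked everywhere at once:

# Check link state, speed, and duplex on every host in the cluster.
# Assumes a hostfile with one hostname per line and that eth0 is the
# interconnect interface on each host.
gpssh -f hostfile 'ethtool eth0 | egrep "Speed|Duplex|Link detected"'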
Checking the NIC driver settings. In general, distributed applications send and receive large amounts of data, so having features like generic receive offload (GRO) or large receive offload (LRO) enabled may create a network bottleneck at the driver level. In the case of GPDB, we transmit UDP datagrams in 8192-byte chunks, which results in fragmentation that puts your network interface card to work. If you have these settings enabled and are experiencing network latency related symptoms, a good test is to disable GRO and LRO:
ethtool -k <interface name>

[root@mdw ~]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off
large-receive-offload: on
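To run that test, the offloads can be switched off with ethtool's -K (set) option. A minimal sketch, assuming eth0 is the interconnect interface; note that the change does not persist across reboots:

# Disable GRO and LRO on eth0. This reverts on reboot; persist it via
# the distro's network scripts if the test proves helpful.
ethtool -K eth0 gro off lro off

# Confirm the new state of both offloads
ethtool -k eth0 | egrep "generic-receive-offload|large-receive-offload"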
The capital -S switch dumps all of the firmware/driver counters available. Typically a great deal of information is dumped, and the output varies across firmware and driver revisions. If you find CRC errors, you can assume there is a hardware-related problem with the NIC cable or the NIC itself. The recommended action is to first try replacing the network cable that plugs into the eth port and ask the network team to rule out the hardware on the switch. If all of that fails, replace the NIC on the given server.
ethtool -S <interface name>

[root@mdw ~]# ethtool -S eth0 | egrep "crc|error"
rx_error_bytes: 0
tx_error_bytes: 0
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 123
rx_align_errors: 0
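A nonzero counter by itself only proves that errors happened at some point since boot; what matters is whether the count is still climbing under load. One way to check, assuming eth0 and a 10-second sampling window:

# Sample rx_crc_errors twice, 10 seconds apart; a growing count points
# to a live cabling/NIC problem rather than a historical one.
before=$(ethtool -S eth0 | awk '/rx_crc_errors/ {print $2}')
sleep 10
after=$(ethtool -S eth0 | awk '/rx_crc_errors/ {print $2}')
echo "rx_crc_errors grew by $((after - before)) in 10 seconds"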
ifconfig and "netstat -i" both report essentially the same information: the tx/rx error counters you see in ethtool -S. Nonzero RX-DRP or RX-OVR counters are typical indicators of packet errors at the driver level. Drops or overruns can mean that the kernel is not pulling packets off the NIC fast enough, or that the driver cannot keep up with the workload. In the driver case, it is a good time to reach out to the vendor for support and ask whether firmware/driver updates or setting changes can improve performance here.
[root@gpdb-sandbox ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:F7:B1:14
          inet addr:172.16.34.128  Bcast:172.16.34.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fef7:b114/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:84887 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28074 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:43123224 (41.1 MiB)  TX bytes:5168804 (4.9 MiB)
[root@mdw ~]# netstat -i
Kernel Interface table
Iface    MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0    1500   0 618457118      0      0      0 644714020      0      0      0 BMRU
eth1    1500   0 106690824      0      0      0  13942215      0      0      0 BMRU
eth4    1500   0 579489994      0      0      0   4526273      0      0      0 BMRU
eth5    1500   0      5145      0      0      0        11      0      0      0 BMRU
lo     16436   0   1040195      0      0      0   1040195      0      0      0 LRU
vmnet1  1500   0         0      0      0      0         6      0      0      0 BMRU
vmnet8  1500   0   1896834      0      0      0         6      0      0      0 BMRU
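As with the ethtool counters, these values are cumulative since boot, so a single reading can mislead. A simple way to see whether drops are accumulating right now, assuming the watch utility is available:

# Refresh the kernel interface table every two seconds while the
# problem workload runs and watch whether RX-DRP/RX-OVR climb.
watch -n 2 netstat -i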
Inspecting at the Kernel level

tcpdump operates at the kernel level and can be useful for inspecting TCP-based applications. For information on how and when to use this tool, refer to Running TCPDUMP to debug distributed applications.
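For reference, a typical capture invocation might look like the sketch below, assuming eth0 carries the traffic of interest and 5432 is the listener port in your environment (both are examples, not fixed values):

# Capture all traffic to/from port 5432 on eth0 into a pcap file;
# -s 0 grabs full packets, -w writes raw frames for later analysis
# with tcpdump -r or wireshark.
tcpdump -i eth0 -s 0 -w /tmp/port5432.pcap port 5432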
"netstat -s" will dump out all of the kernel counters and provides detailed information related to the UDP/TCP kernel stack counters.
netstat -st | egrep -i collapsed

[root@mdw ~]# netstat -st | egrep -i collapsed
    3190 packets collapsed in receive queue due to low socket buffer
netstat -st | egrep -i retrans

[root@mdw ~]# netstat -st | egrep -i retrans
    81145 segments retransmited
    51597 fast retransmits
    25709 forward retransmits
    1750 retransmits in slow start
    62 sack retransmits failed
netstat -su | egrep error

[root@mdw ~]# netstat -su | egrep error
    0 packet receive errors
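Because all of these counters are cumulative since boot, the most telling reading is the delta across the window in which the problem reproduces. A rough sketch:

# Snapshot the kernel TCP/UDP counters before and after reproducing
# the problem, then diff to see exactly which counters moved.
netstat -s > /tmp/netstat_before.txt
# ... reproduce the slow query / packet-loss symptom here ...
netstat -s > /tmp/netstat_after.txt
diff /tmp/netstat_before.txt /tmp/netstat_after.txt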
Checklist:

- Run ethtool <interface name> to confirm the link is detected and the Speed/Duplex values are as expected.
- Run ethtool -k <interface name> to review offload settings; if you see latency symptoms, test with GRO and LRO disabled.
- Run ethtool -S <interface name> to check the firmware/driver counters for CRC errors; replace the cable first, rule out the switch, then replace the NIC.
- Run ifconfig or netstat -i to look for drops and overruns at the driver level; engage the NIC vendor if the driver cannot keep up.
- Use tcpdump to inspect TCP-based application traffic at the kernel level.
- Run netstat -s to review the kernel UDP/TCP stack counters (collapsed packets, retransmits, receive errors).