TCP resets sent by IP conntrack for containerized applications on high-latency networks

Products

Operations Manager

Issue/Introduction

Some applications may observe timeouts or connection reset errors when sending and receiving data from within a container on high-latency networks. The following example explains the issue in more detail; other scenarios are likely to exist with similar but not identical symptoms. The example highlights the important symptoms to help you determine whether this KB matches the issue your team is experiencing.

Example Scenario
In this example, an application sends POST request data to an external resource from a Linux Diego cell running Ubuntu Xenial stemcell version 621.x. The application sends a POST request with a 1 MB payload to the external resource, and the request flow looks like this:

Application Container -> Diego Cell -> Firewall -> NAT -> External Resource

As the call flow shows, the POST request has to traverse two network devices before reaching its destination. Here are the symptoms we observe when running a tcpdump:

Column names for reference

Frame, Timestamp, SRC-IP, DST-IP, Length, Sequence, Acknowledgment

Frame 68, which is 2554 bytes long, sends data to the external resource starting at sequence number 99734. The TCP stack on the Diego cell expects to receive an acknowledgment packet with acknowledgment number 102222:

68    2020-12-15 22:41:40.266698  DiegoCell-IP    External-Resource-IP   2554    99734    4501

About 112 ms later, the acknowledgment is received in frame 114. During that delay, much more data has been sent to the external resource and acknowledged: other acknowledgments have already been received with an acknowledgment number higher than 102222:

114   2020-12-15 22:41:40.378925  External-Resource-IP   DiegoCell-IP    66      4501     102222

The Diego cell responds to this packet with a TCP RESET-REPLY and continues to send data to the external resource. That is, the Diego cell sends the reset without the acknowledgment flag set and does not abort the TCP connection. For more information regarding the difference between a RESET-REPLY and a RESET-ABORT, see this blog post. Keep in mind that this reset does not come from the application container network interface; it comes from the Diego cell network interface only:

115   2020-12-15 22:41:40.378950  DiegoCell-IP    External-Resource-IP   54      102222   0
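
The numbers in frames 68 and 114 above can be checked with a little shell arithmetic. This is only a sketch: the 66-byte header overhead is an assumption, taken from the length of the pure acknowledgment in frame 114, and the variable names are illustrative.

```shell
# Values read from frames 68 and 114 of the capture above.
seq_num=99734      # sequence number of frame 68
frame_len=2554     # on-the-wire length of frame 68
header_len=66      # ASSUMPTION: same header overhead as the 66-byte pure ACK in frame 114

# Payload carried by frame 68 and the acknowledgment number it should trigger.
payload_len=$((frame_len - header_len))
expected_ack=$((seq_num + payload_len))
echo "payload bytes: $payload_len"
echo "expected ack:  $expected_ack"

# Delay between frame 68 (22:41:40.266698) and frame 114 (22:41:40.378925), in microseconds.
delay_us=$((378925 - 266698))
echo "ack delay (us): $delay_us"
```

Under that assumption the expected acknowledgment number comes out as 102222, which matches frame 114, and the delay is roughly 112 ms.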

The Diego cell resets this acknowledgment because of the way IP conntrack handles inbound TCP packets. A security feature within IP conntrack inspects each packet to ensure that the arriving frame is valid and can be forwarded along the normal chain. In this case, because the acknowledgment arrived late and much more data had been transmitted in the meantime, IP conntrack determined that the packet was invalid and issued a reset.

The firewall receives the reset and passes it on through the NAT, which resets the TCP session between the NAT and the external resource. In this case, the firewall keeps its connection with the Diego cell open, which makes the application think the external resource is receiving all the data it has sent. But the external resource never replies, and eventually the application times out the request and fails.
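
To look for these resets on the Diego cell network interface, a capture filter that matches only TCP segments with the RST flag set can help. The following is a minimal sketch; the function names are hypothetical and the IP address is a placeholder for the external resource.

```shell
# Print a pcap filter matching TCP segments with the RST flag set, to or from one host.
rst_filter() {
  printf 'tcp[tcpflags] & tcp-rst != 0 and host %s' "$1"
}

# Capture matching packets on all interfaces (needs root).
# -nn disables host name and port name resolution.
capture_resets() {
  tcpdump -nn -i any "$(rst_filter "$1")"
}

# Example (203.0.113.10 is a placeholder for the external resource IP):
#   capture_resets 203.0.113.10
```

Run the capture on the Diego cell rather than inside the application container, since the reset described above is only visible on the Diego cell interface.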

Troubleshooting tip

One simple test for this issue is to construct a request with curl that matches the one the application performs. Execute the request once while ssh'ed into the application container, and once while ssh'ed into the Diego cell that the application container is running on. If the test from the Diego cell works but the test from the application container fails, then you are likely experiencing this issue.
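
The test request could be sketched as follows. The helper names, the URL, and the content type are placeholders, not values from the affected application; adjust them so that the request matches what the application really sends.

```shell
# Create a payload file of the given size in bytes and print its path.
make_payload() {
  payload_file="$(mktemp)"
  head -c "$1" /dev/zero > "$payload_file"
  echo "$payload_file"
}

# POST the payload file ($2) to the URL ($1), print the HTTP status and total time,
# and give up after 60 seconds.
post_payload() {
  curl -sS -o /dev/null --max-time 60 \
    -w 'http_code=%{http_code} time_total=%{time_total}\n' \
    -H 'Content-Type: application/octet-stream' \
    --data-binary "@$2" "$1"
}

# Example (the URL is a placeholder for the external resource):
#   f="$(make_payload 1048576)"
#   post_payload https://external-resource.example.com/upload "$f"
```

Run the same two commands in both places and compare the results. A timeout or reset only from inside the application container points to this issue.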

Environment

Product Version: 2.10
OS: Ubuntu Xenial

Resolution

The cause

There have been many reports of this issue in various containerized workloads. The following Kubernetes blog post describes a different scenario in which this problem is observed, but it offers the same solutions that we recommend to our customers:
https://kubernetes.io/blog/2019/03/29/kube-proxy-subtleties-debugging-an-intermittent-connection-reset/

The original bug reference for this issue is here:
https://github.com/moby/libnetwork/issues/1090

The workaround

Use the os-conf BOSH add-on to change the following sysctl variable from 0 to 1. This parameter needs to be set on Diego cells only to prevent these TCP resets from happening. The setting instructs conntrack to forward the TCP frame instead of marking it invalid:

/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
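
One way to apply this through os-conf is a BOSH runtime config that uses the release's sysctl job. The sketch below only writes such a file. The add-on name, the os-conf version, and the diego_cell instance group name are assumptions that need to be adapted to your deployment, and the function name is hypothetical.

```shell
# Write a BOSH runtime config that sets the sysctl through the os-conf release.
# ASSUMPTIONS: the add-on name, the os-conf version ($2), and the diego_cell
# instance group name are placeholders for illustration.
write_runtime_config() {
  cat > "$1" <<EOF
releases:
- name: os-conf
  version: "$2"
addons:
- name: conntrack-be-liberal
  include:
    instance_groups:
    - diego_cell
  jobs:
  - name: sysctl
    release: os-conf
    properties:
      sysctl:
      - net.netfilter.nf_conntrack_tcp_be_liberal=1
EOF
}

# Example (OS_CONF_VERSION stands for the os-conf version uploaded to your director):
#   write_runtime_config conntrack.yml "$OS_CONF_VERSION"
#   bosh update-runtime-config --name=conntrack-be-liberal conntrack.yml
```

The include rule is meant to limit the change to Diego cells, as recommended above. The runtime config takes effect on the next deploy.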

This can also be done during a live troubleshooting session with the following command. The change will not survive a reboot or an Apply Changes, so use the os-conf BOSH add-on to make it permanent:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
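
During a live session it can be useful to record the value before and after the change. The following sketch wraps the command above; the function name and the BE_LIBERAL_PATH override are hypothetical, and the override exists only so the helper can be tried against a scratch file.

```shell
# Path of the conntrack setting; overridable so the helper can be tried on a scratch file.
BE_LIBERAL_PATH="${BE_LIBERAL_PATH:-/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal}"

# Set the value to 1 and print it before and after (needs root on a real Diego cell).
set_be_liberal() {
  echo "before: $(cat "$BE_LIBERAL_PATH")"
  echo 1 > "$BE_LIBERAL_PATH"
  echo "after:  $(cat "$BE_LIBERAL_PATH")"
}

# Example, as root on the Diego cell:
#   set_be_liberal
```

After the change, repeat the curl test from inside the application container to confirm that the resets no longer occur.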