TKGm Envoy returns 413 Payload Too Large

Article ID: 377217


Products

Tanzu Kubernetes Runtime, Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid 1.x, VMware Tanzu Kubernetes Grid Management, VMware Tanzu Kubernetes Grid Plus, VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

Contour/Envoy can be installed as a Tanzu package in TKGm by following the documentation: Install Contour in Workload Clusters Deployed by a Standalone Management Cluster.

In certain scenarios, Envoy pods may return "413 Payload Too Large" responses.

This KB outlines the steps to increase per_connection_buffer_limit_bytes in the Contour package configuration to remediate the issue.

Environment

Contour 1.26 and above.

Older Contour versions do not expose the per_connection_buffer_limit_bytes variable.

Cause

As described in Why is Envoy sending 413s?, the error may be seen when Envoy buffer limits are being reached.
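To illustrate, a request body larger than the connection buffer limit can trigger the 413. A minimal local sketch that prepares such a payload (the ingress URL in the comment is a placeholder for your environment):

```shell
# Create a 64 KiB payload, larger than the 32 KiB buffer limit configured later in this article.
head -c $((64*1024)) /dev/zero > /tmp/payload.bin
wc -c < /tmp/payload.bin
# The payload could then be POSTed through the ingress, for example:
#   curl -X POST --data-binary @/tmp/payload.bin https://<ingress-host>/<path>
```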

Resolution

Pre-check

First, verify that the 413 errors are actually coming from Envoy and not from another upstream load balancer or service.

As described in How do I configure flow control?, Envoy increments the downstream_rq_too_large metric every time it returns a 413 error.
To check Envoy metrics, follow the Accessing the Envoy Administration Interface documentation and query the /stats/prometheus endpoint for this metric.

For example:

# kubectl -n tanzu-system-ingress port-forward <envoy-pod-name> 9001
# curl -kv http://127.0.0.1:9001/stats/prometheus | grep downstream_rq_too_large

If all counts equal 0, the 413 errors are most likely not coming from Envoy.

envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="admin"} 0
envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="envoy-admin"} 0
envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="stats"} 0
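When there are many connection managers, the counters can be summed in one pass. The sample file below reuses the output format shown above with illustrative values; in practice, pipe the curl output into the awk command instead:

```shell
# Sample /stats/prometheus output (illustrative values, not from a real cluster).
cat <<'EOF' > /tmp/envoy_stats.txt
envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="admin"} 0
envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="ingress_http"} 5
envoy_http_downstream_rq_too_large{envoy_http_conn_manager_prefix="stats"} 0
EOF
# Sum the 413 counter across all connection managers; a non-zero total means Envoy returned 413s.
awk '/downstream_rq_too_large/ {sum += $2} END {print sum+0}' /tmp/envoy_stats.txt
```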

In that case, to further troubleshoot the issue, it's recommended to send the HTTP requests directly to the backend service, bypassing Envoy, and see if the 413 errors are returned. If they are, that would be confirmation that they're not coming from Envoy.
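As a sketch, the backend can be reached directly with a port-forward; the service name, namespace, ports, path, and payload file below are all placeholders for your environment:

```
# kubectl -n <app-namespace> port-forward svc/<backend-service> 8080:<service-port>
# curl -sv -X POST --data-binary @<large-payload-file> http://127.0.0.1:8080/<path>
```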

 

Remediation

As per How do I configure flow control?, the suggested approach to remediate 413 responses is to increase the per_connection_buffer_limit_bytes value in the Contour/Envoy configuration.

All available Contour configuration variables and their default values, including per_connection_buffer_limit_bytes, are described in https://projectcontour.io/docs/1.29/configuration/. That page also includes a Configuration Example that can be used as a guide when setting the variable in the contour-data-values.yaml file used to install the Contour package.

Examples of how to configure configFileContents in contour-data-values.yaml can be found in Contour Config File Contents. These are just examples and do not cover every available configuration variable; in particular, per_connection_buffer_limit_bytes is not included, so refer to the https://projectcontour.io/docs/1.29/configuration/ Docs for how to add it to configFileContents.

An example of contour-data-values.yaml with per_connection_buffer_limit_bytes included would look as follows:

---
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
  configFileContents:
    network:
      num-trusted-hops: 2
    # Envoy cluster settings.
    cluster:
      per-connection-buffer-limit-bytes: 32768 # <buffer-limit-value>
    listener:
      per-connection-buffer-limit-bytes: 32768 # <buffer-limit-value>
  useProxyProtocol: false
  replicas: 2
  pspNames: "vmware-system-restricted"
  logLevel: info
envoy:
  service:
    type: LoadBalancer
    annotations: {}
    externalTrafficPolicy: Cluster
    disableWait: false
  hostPorts:
    enable: true
    http: 80
    https: 443
  hostNetwork: false
  terminationGracePeriodSeconds: 300
  logLevel: info
certificates:
  duration: 8760h
  renewBefore: 360h

Once contour-data-values.yaml has been updated, update the Contour package with:
# tanzu package installed update contour -n tkg-system --values-file contour-data-values.yaml

After this, check the Contour package is Reconciled:
# kubectl get pkgi,app -n tkg-system | grep contour

Then, restart the Contour and Envoy pods:
# kubectl rollout restart -n tanzu-system-ingress deployment.apps/contour
# kubectl rollout restart -n tanzu-system-ingress daemonset.apps/envoy
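Optionally, wait for the restarts to complete before verifying the configuration:

```
# kubectl rollout status -n tanzu-system-ingress deployment/contour
# kubectl rollout status -n tanzu-system-ingress daemonset/envoy
```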

The new configFileContents configuration should be visible now in the Contour ConfigMap, under .data.contour.yaml:

# kubectl get cm -n tanzu-system-ingress contour -oyaml
apiVersion: v1
data:
  contour.yaml: |
    network:
      num-trusted-hops: 2
    cluster:
      per-connection-buffer-limit-bytes: 32768
    listener:
      per-connection-buffer-limit-bytes: 32768
kind: ConfigMap
...
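To confirm the running Envoy instances have picked up the new value, the admin interface's /config_dump endpoint can also be checked, reusing the port-forward shown in the Pre-check section:

```
# kubectl -n tanzu-system-ingress port-forward <envoy-pod-name> 9001
# curl -s http://127.0.0.1:9001/config_dump | grep per_connection_buffer_limit
```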