When deploying Malware Prevention Service VMs whose images are hosted within NAPP, the deployment sometimes fails with an error in NSX.
In the vCenter UI, the OVF deployment task is canceled automatically with the message "The task was canceled by a user."
In NAPP, the projectcontour-envoy pods restart, with OOMKilled reported as the reason for termination.
This issue applies to the following environments:
NAPP 4.2.0 and 4.2.0.1 when used with vCenter 8 Update 3e and later
NSX 9.0 with vCenter 9.0
Starting with NAPP 4.2.0, Malware Prevention Service (MPS) VM Images are hosted in the NAPP repository itself. When deploying MPS Service VMs, the bits are downloaded from the NAPP (which acts as the server for the files) to the vCenter's EAM (which is the client in this scenario) and then deployed to the required ESXi hosts in the environment.
In this workflow, the communication between NAPP and vCenter passes through the proxy on each side, which is Envoy in both cases.
Because of configuration changes in the vCenter Envoy proxy starting with vCenter 8 Update 3e, HTTP/2 is now the default protocol negotiated between the two proxies.
HTTP/2 has larger default memory requirements because of its 256 MB default buffer allocations.
As a result, the proxy on the NAPP side buffers a large amount of data while serving the MPS SVM bits, and the envoy pods get OOMKilled.
The following indicators can be used to identify this issue:
2025-05-05T06:44:22.542Z | ERROR | VM-push-dispatcher-16 | UploadConnection.java | 202 | Upload failed.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 1,917,239,296; received: 289,759,250)
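To gauge how far the upload got before the connection dropped, the expected and received byte counts can be pulled out of such a log line. The following is a minimal Python sketch that works on the sample line above; the helper name `parse_transfer` and the regular expression are illustrative assumptions, not part of any NSX or vCenter tooling.

```python
import re

# Sample log line copied from the indicator above.
LOG_LINE = (
    "2025-05-05T06:44:22.542Z | ERROR | VM-push-dispatcher-16 | UploadConnection.java | 202 | "
    "Upload failed.org.apache.http.ConnectionClosedException: Premature end of "
    "Content-Length delimited message body (expected: 1,917,239,296; received: 289,759,250)"
)

# Matches the "(expected: N; received: M)" suffix, allowing thousands separators.
PATTERN = re.compile(r"expected:\s*([\d,]+);\s*received:\s*([\d,]+)")


def parse_transfer(line):
    """Return (expected_bytes, received_bytes), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    expected, received = (int(group.replace(",", "")) for group in match.groups())
    return expected, received


result = parse_transfer(LOG_LINE)
if result is not None:
    expected, received = result
    print(f"received {received} of {expected} bytes ({received / expected:.1%})")
```

A transfer that stops well short of the expected size, together with the envoy restarts described here, points to the proxy being terminated mid-download.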
napp-k get pods -n projectcontour
    NAME                         READY   STATUS    RESTARTS        AGE
    projectcontour-envoy-ck4t4   2/2     Running   1 (2d16h ago)   6d23h
napp-k describe pod projectcontour-envoy-ck4t4 -n projectcontour
    State:          Running
      Started:      Fri, 02 May 2025 16:30:13 +0000
    Last State:     Terminated
      Reason:       OOMKilled   <----------- OOMKilled seen as the reason for last restart
      Exit Code:    1
      Started:      Thu, 01 May 2025 01:42:50 +0000
      Finished:     Fri, 02 May 2025 16:30:12 +0000
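When several envoy pods need to be checked, the describe output can be scanned for the last termination reason instead of being read by eye. This is a minimal Python sketch that assumes the output has been saved as text; the function name `last_termination_reason` is hypothetical, and the sample text is the excerpt shown above.

```python
SAMPLE_DESCRIBE = """\
State:          Running
  Started:      Fri, 02 May 2025 16:30:13 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    1
  Started:      Thu, 01 May 2025 01:42:50 +0000
  Finished:     Fri, 02 May 2025 16:30:12 +0000
"""


def last_termination_reason(describe_output):
    """Return the Reason listed under 'Last State: Terminated', or None."""
    in_last_state = False
    for raw_line in describe_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Last State:"):
            # Only a terminated last state carries a termination reason.
            in_last_state = "Terminated" in line
        elif line.startswith("State:"):
            # A new container's current state begins; leave the previous block.
            in_last_state = False
        elif in_last_state and line.startswith("Reason:"):
            words = line.split(":", 1)[1].split()
            return words[0] if words else None
    return None


print(last_termination_reason(SAMPLE_DESCRIBE))
```

The function returns the first terminated container's reason only, which is enough to confirm whether a given pod's last restart was an OOMKilled event.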
To resolve this issue, provide a few additional configuration options for the envoy proxy in NAPP to optimize the serving of large files, and allocate more memory to the pods.
Also note that, to avoid overburdening the NAPP proxy, SVM deployment should be triggered on only one cluster at a time.
The following edits are required in the NAPP proxy (projectcontour) configuration:
1. Edit the projectcontour configmap
Add the cluster and listener configurations, marked with an arrow in the YAML below, to the data->contour.yaml section.
root@nsx-mgr-0:~# napp-k edit configmap -n projectcontour projectcontour -o yaml
    apiVersion: v1
    data:
      contour.yaml: |-
        accesslog-format: envoy
        cluster:          <------- Add cluster and listener configurations
          per-connection-buffer-limit-bytes: 65536
        listener:
          http2-max-concurrent-streams: 100
          per-connection-buffer-limit-bytes: 65536
        disablePermitInsecure: false
        tls:
          envoy-client-certificate:
          ...
...
2. Edit the projectcontour-contour deployment
Increase the memory request to 96Mi in the spec->template->spec->resources->requests section, marked with an arrow in the YAML below.
root@nsx-mgr-0:~# napp-k edit deployment -n projectcontour projectcontour-contour -o yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      ...
    spec:
      ...
      template:
        ...
        spec:
          ...
            resources:
              limits:
                memory: 256Mi
              requests:
                cpu: 40m
                memory: 96Mi     <----- Increase the memory to 96Mi
    ...
    ...
...
3. Update the memory allocations in the daemonset for the projectcontour-envoy pods
Make the following changes, marked with arrows in the YAML below:
Update the resources section under spec->template->spec for the container whose command is 'envoy'.
Update the resources section under spec->template->spec for the container whose command is 'contour'.
Add the '--overload-max-heap=335544320' argument to the args of the container whose command is 'contour'.
root@nsx-mgr-0:~# napp-k edit daemonset -n projectcontour projectcontour-envoy -o yaml
    apiVersion: apps/v1
    kind: DaemonSet
    ...
    spec:
      ...
      template:
        ...
        spec:
          affinity: {}
          automountServiceAccountToken: false
          containers:
          ...
          - args:
            - -c
            - /config/envoy.json
            - --service-cluster $(CONTOUR_NAMESPACE)
            - --service-node $(ENVOY_POD_NAME)
            - --log-level info
            command:
            - envoy
            ...
            resources:
              limits:
                memory: 500Mi     <---- Increase memory and CPU resources allocated
              requests:
                cpu: 200m
                memory: 300Mi
            ...
          initContainers:
          - args:
            - bootstrap
            - /config/envoy.json
            - --xds-address=some-address
            - --xds-port=some-port
            - --resources-dir=/config/resources
            - --envoy-cafile=/certificate/sample-ca.crt
            - --envoy-cert-file=/certificate/sample.crt
            - --envoy-key-file=/certificate/sample.key
            - --overload-max-heap=335544320     <----- provide this additional argument
            command:
            - contour
            ...
            resources:
              limits:
                memory: 500Mi     <---- Increase memory and CPU resources allocated
              requests:
                cpu: 200m
                memory: 300Mi
            ...
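As a quick sanity check on the numbers used in step 3, the sketch below converts the memory quantities to bytes and compares them with the --overload-max-heap value, on the assumption that the flag is expressed in bytes. The helper `quantity_to_bytes` is hypothetical and handles only the Ki, Mi and Gi suffixes, which is a simplification of the full Kubernetes quantity syntax.

```python
# Binary suffixes used by the memory values in this article.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def quantity_to_bytes(quantity):
    """Convert a binary memory quantity such as '500Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


overload_max_heap = 335544320            # value passed to --overload-max-heap
limit = quantity_to_bytes("500Mi")       # resources.limits.memory
request = quantity_to_bytes("300Mi")     # resources.requests.memory

print(f"overload-max-heap = {overload_max_heap // 1024**2} MiB")
print(f"between request and limit: {request <= overload_max_heap <= limit}")
```

Under that assumption, the heap ceiling sits above the 300Mi request and below the 500Mi limit. If different limits are chosen for an environment, the same check can be rerun with those values.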