This article describes a failure that can occur when configuring HCX Service Mesh fleet appliances, and how to recover from it.
Symptoms:
During an upgrade, redeploy, resync, or any other configuration operation on a Service Mesh (SM) that contains more than 13 fleet appliances (counting IX, NE, WO, SGW, and SDR), the following exception is written to the log and the appliance lifecycle workflow does not proceed any further:
2022-08-01 21:05:17.106 UTC [RemotingService_SvcThread-32926, Ent: HybridityAdmin, , TxId: ########-####-####-####-########b813] WARN c.v.v.h.m.k.KafkaProducerDelegate- Publish failed and will retry java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
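To confirm the symptom, the App Engine log can be searched for the exception name. The following is a minimal sketch, not an HCX tool: the function name `find_record_too_large` is hypothetical, and it assumes only that each log line starts with a timestamp in the format shown in the sample above.

```python
import re
from typing import Iterable, List

# Matches the timestamp at the start of a log line, e.g.
# "2022-08-01 21:05:17.106 UTC" (format assumed from the sample line above).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} UTC)")

# Exception name reported by the Kafka producer in the sample log line.
MARKER = "RecordTooLargeException"


def find_record_too_large(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of log lines that mention RecordTooLargeException."""
    hits = []
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        # Keep the hit even when the line has no leading timestamp.
        hits.append(match.group(1) if match else "(no timestamp)")
    return hits
```

To use it, open the App Engine log from the location given in this article and pass the file object to the function, for example `find_record_too_large(open(log_path))`, where `log_path` is a placeholder for that location.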
Location of App Engine log:
This behavior is expected, given the Kafka message size allocated per SM in the backend. The recommendation is to follow the HCX Service Appliance Provisioning Concurrency limit specified in the Config Max Guide.
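Before starting a lifecycle operation, the appliance count of a Service Mesh can be compared against the thresholds mentioned in this article. The sketch below is illustrative only: the function name and the input dictionary are hypothetical, the counts must be collected by hand from the Service Mesh view, and the values 13 and 10 are taken from this article rather than read from HCX.

```python
from typing import Dict

# Thresholds as stated in this article (assumptions, not queried from HCX).
MAX_FLEET_APPLIANCES = 13  # total IX/NE/WO/SGW/SDR appliances per Service Mesh
MAX_NE_APPLIANCES = 10     # NE appliance count referenced for brownfield cases


def check_service_mesh(counts: Dict[str, int]) -> Dict[str, object]:
    """Summarize appliance counts for one Service Mesh.

    `counts` maps an appliance type (e.g. "IX", "NE", "WO", "SGW", "SDR")
    to the number of appliances of that type in the Service Mesh.
    """
    total = sum(counts.values())
    ne_count = counts.get("NE", 0)
    return {
        "total": total,
        "exceeds_total": total > MAX_FLEET_APPLIANCES,
        "ne_count": ne_count,
        "exceeds_ne": ne_count > MAX_NE_APPLIANCES,
    }
```

For example, `check_service_mesh({"IX": 1, "WO": 1, "NE": 12})` reports a total of 14 appliances, which is above both thresholds, so the Service Mesh would be a candidate for the workaround.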
Workaround:
For greenfield or brownfield deployments, if the user wants to increase the number of Network Extension (NE) appliances per SM to meet their PG/extension requirements, the suggestion is as follows:
For brownfield deployments specifically, where a given SM already contains more than 13 appliances and the NE appliance count alone is greater than 10, follow the steps below:
IX - Interconnect Appliance
WO - WAN Optimization Appliance
NE - Network Extension Appliance
SGW - Sentinel Gateway Appliance at Source for OSAM
SDR - Sentinel Receiver Appliance at Target for OSAM

This impacts all SM appliance lifecycle jobs, including Upgrade, Redeploy, and Resync operations triggered by configuration changes such as MTU modifications under the Network Profile or changes to Compute Profile parameters.