HCX - Kafka Error "RecordTooLargeException" due to EULA in VM

Article ID: 323358

Updated On:

Products

VMware HCX

Issue/Introduction

This article is a reference for recovering the HCX system and migration services after a Kafka RecordTooLargeException error.

Symptoms:
A running HCX environment may show a sudden impact on all types of migration services, along with Site Pairing disconnections.
The following exception appears very frequently in the Web Engine log:
2021-08-25 07:34:04.294 UTC [RemotingService_SvcThread-63879, Ent: HybridityAdmin, , TxId: 069f6457-572f-4251-b5f6-02b193ff81f0] WARN c.v.v.h.m.k.KafkaProducerDelegate- Publish failed and will retry 
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 2143910 bytes when serialized which is larger than 2097152, which is the value of the max.request.size configuration.
 at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1307)
 at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:962)
 at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
 at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:750)
Location of Web Engine log:
  • HCX Manager : /common/log/admin/web.log
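
To confirm the symptom and see by how much a message exceeds the limit, the log can be scanned for the exception text. The following is a minimal sketch, not an HCX tool: the function names are hypothetical, and the pattern assumes the message wording is exactly as in the excerpt above.

```python
import re

# Matches the size figures in the RecordTooLargeException text shown above.
# Assumption: the wording in web.log is identical to the excerpt in this article.
PATTERN = re.compile(
    r"RecordTooLargeException: The message is (\d+) bytes when serialized "
    r"which is larger than (\d+)"
)


def find_oversized_records(lines):
    """Return (message_bytes, limit_bytes, excess_bytes) for each matching line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            size, limit = int(match.group(1)), int(match.group(2))
            hits.append((size, limit, size - limit))
    return hits


def scan_log(path="/common/log/admin/web.log"):
    """Scan the Web Engine log on the HCX Manager (path taken from this article)."""
    with open(path, errors="replace") as handle:
        return find_oversized_records(handle)
```

For the excerpt above, the message is 2143910 bytes against a limit of 2097152 bytes, so it is 46758 bytes over the limit.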


Cause

Some VMs have an End User License Agreement (EULA) section associated with their OVF.
During migration, HCX passes the OVF of the virtual machine along with the VMDK files, regardless of the migration type. HCX uses Kafka as a message bus for this communication in the control/data plane.
The default Kafka message size limit in HCX is 2 MB (max.request.size = 2097152 bytes). A message can exceed this limit if the OVF of a VM contains a very large EULA section.
Kafka cannot publish the oversized message and throws a RecordTooLargeException error.
As a result, all other messages remain in the queue and nothing progresses.


Note:- The EULA generally comes from the template used to deploy the VM. Templates come from different providers with different agreements, so EULA text can be in any language or format and can vary in size.
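
As a rough pre-migration check, the size of the EULA text in an OVF descriptor can be compared with the 2097152-byte limit. This is a sketch under assumptions: it takes an OVF descriptor exported separately as XML text, matches the `EulaSection` and `License` elements by local name only, and uses hypothetical function names. The actual Kafka message contains more than the EULA, so the result is only an indicator.

```python
import xml.etree.ElementTree as ET

# Limit reported as max.request.size in the exception shown in this article.
KAFKA_LIMIT_BYTES = 2097152


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def eula_bytes(ovf_xml):
    """Return the total UTF-8 size of all License texts inside EulaSection elements."""
    root = ET.fromstring(ovf_xml)
    total = 0
    for element in root.iter():
        if local_name(element.tag) != "EulaSection":
            continue
        for child in element.iter():
            if local_name(child.tag) == "License" and child.text:
                total += len(child.text.encode("utf-8"))
    return total


def eula_share_of_limit(ovf_xml):
    """Fraction of the Kafka limit consumed by the EULA text alone."""
    return eula_bytes(ovf_xml) / KAFKA_LIMIT_BYTES
```

A share close to or above 1.0 suggests the VM is a candidate for the workaround in this article. A small share does not rule the issue out, because other OVF content also contributes to the message size.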

Resolution

None at the moment.
Note:- Contact GSS/TSE if failures persist after attempting the workaround.

Workaround:
Check the OVF section of the VM before any migration, remove the EULA if present, and add it back after a successful migration.

The following steps remove the EULA from a given VM:
Note:- Take a backup of the VM before performing these steps.

1. Check OVF section of the VM.

a. Enable vApp Options:
Go to VM >> Configure >> vApp Options >> Edit vApp Options >> Enable vApp Options

2. Check EULA section.

a. Log in to the MoB (Managed Object Browser) interface of the source vCenter Server.
https://<vCenter_IP_URL>/mob/?moid=vm-<VM-ID>

b. On the VM object page, click through the following properties to view the EULA string:
config >> vAppConfig >> eula

3. Remove EULA section.

a. Scroll down to the list of methods that can be called on the VM object and click the following method. A new window opens.
void | ReconfigVM_Task

b. In the new window, select the entire contents of the “Parameters > Value” text field, replace it with the payload below, and click “Invoke Method”.
<spec>
 <vAppConfig>
  <eula>"TEXT"</eula>
 </vAppConfig>
</spec>



4. Initiate the migration through HCX.
5. Once the migration is complete, the EULA section can be pushed back to the VM by running ReconfigVM_Task from the MoB interface of the target vCenter Server.

Additional Information

See also https://ikb.vmware.com/s/article/88233, which describes how special characters in the VM Notes can cause the same Kafka RecordTooLargeException.


Impact/Risks:
  • All types of migration services are impacted.
  • Service Mesh redeployment may be impacted. An existing Service Mesh is not impacted.
  • Site Pairing may disconnect.
  • The HCX hybridity UI page may become unstable.
Note:- This is considered a rare use case.