This document serves as a reference for recovering the HCX system and migration services after a Kafka resource error.
Customers may report a sudden impact on all types of migration services, along with Site Pair disconnection, in a running HCX environment.
The following exceptions appear very frequently in the Web Engine log:
/common/logs/admin/web.log
2021-08-25 07:34:04.294 UTC [RemotingService_SvcThread-63879, Ent: HybridityAdmin, , TxId: 069f6457-572f-4251-b5f6-02b193ff81f0] WARN c.v.v.h.m.k.KafkaProducerDelegate- Publish failed and will retry
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 2143910 bytes when serialized which is larger than 2097152, which is the value of the max.request.size configuration.
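The log signature above can be searched for programmatically. The following is a minimal sketch, not an HCX tool: the function name `find_oversized_records` is hypothetical, and the pattern is derived only from the exception text quoted above.

```python
import re

# Pattern built from the RecordTooLargeException text quoted above.
RECORD_TOO_LARGE = re.compile(
    r"RecordTooLargeException: The message is (\d+) bytes when serialized "
    r"which is larger than (\d+)"
)


def find_oversized_records(lines):
    """Return a (message_bytes, limit_bytes) tuple for each matching log line."""
    hits = []
    for line in lines:
        match = RECORD_TOO_LARGE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Sample input: the exception line from this article.
sample = [
    "java.util.concurrent.ExecutionException: org.apache.kafka.common.errors."
    "RecordTooLargeException: The message is 2143910 bytes when serialized "
    "which is larger than 2097152, which is the value of the max.request.size "
    "configuration.",
]

for size, limit in find_oversized_records(sample):
    print(f"message={size} limit={limit} over_by={size - limit}")
```

To scan a real log, an open file handle for /common/logs/admin/web.log can be passed in place of `sample`, since the function accepts any iterable of lines.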
VMware HCX
Certain VMs contain an End User License Agreement (EULA) section associated with their OVF.
During the migration phase, HCX passes the OVF of the virtual machine along with the VMDK files, regardless of the migration type. HCX uses Kafka as a messaging bus to enable such communication in the control/data plane.
The default Kafka limit is 2 MB (2097152 bytes, the max.request.size value shown in the log above), and a message can exceed it if the OVF contains a very large EULA section for a given VM.
When a message exceeds this limit, Kafka cannot handle it and throws a RecordTooLargeException error.
As a result, all other messages remain stuck in the queue and nothing progresses.
Note:- The EULA generally comes from the VM template that the customer is trying to deploy. Different template providers use different agreements, so EULA files can be expected in any language or format and in varying sizes.
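To illustrate the size relationship described above, the following sketch compares the UTF-8 size of a VM's EULA text against the 2097152-byte limit from the log. The helper names are hypothetical, and the result is only a lower bound: the message HCX actually publishes is assumed to contain more than the EULA (the rest of the OVF, for example), so a VM can still exceed the limit when its EULA alone does not.

```python
# Limit in bytes, taken from the max.request.size value in the exception text.
KAFKA_MAX_REQUEST_SIZE = 2097152


def eula_bytes(eula_sections):
    """Total UTF-8 size in bytes of all EULA strings attached to a VM."""
    return sum(len(text.encode("utf-8")) for text in eula_sections)


def eula_risk(eula_sections, limit=KAFKA_MAX_REQUEST_SIZE):
    """Return (size_in_bytes, fraction_of_limit) for the given EULA strings."""
    size = eula_bytes(eula_sections)
    return size, size / limit


# Hypothetical example: one EULA string of 1.5 million ASCII characters.
size, fraction = eula_risk(["x" * 1_500_000])
print(f"EULA size: {size} bytes ({fraction:.0%} of the limit)")
```

Non-ASCII EULA text takes more than one byte per character in UTF-8, which is why the sketch measures encoded bytes rather than character count.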
None at the moment.
Note:- Please contact GSS/TSE in case of any persistent failures after attempting the workaround.
Workaround:
Customers are advised to check the OVF section of the VM before any migration, remove the EULA if present, and add it back after a successful migration.
The steps below can be used to remove the EULA from a given VM:
Note:- Please take a backup of the VM before performing these steps.
1. Check the OVF section of the VM.
a. Enable vApp Options:
Go to VM >> Configure >> vApp Options >> Edit vApp Options >> Enable vApp Options
2. Check the EULA section.
a. Log in to the MoB (Managed Object Browser) interface of the source vCenter Server:
https://<vCenter_IP_URL>/mob/?moid=vm-<VM-ID>
b. Navigate to Config >> vAppConfig >> EULA String
3. Remove the EULA section.
a. Scroll down to the methods that can be called on the VM object, find the following method, and click it. A new window should pop up.
void | ReconfigVM_Task
b. In the new pop-up window, under the “Parameters > Value” text field, select the entire payload, replace it with the payload below, and click “Invoke Method”.
<spec>
<vAppConfig>
<eula>"TEXT"</eula>
</vAppConfig>
</spec>
4. Initiate the migration through HCX.
5. Once the migration is complete, the customer can choose to push the EULA section back to the VM using ReconfigVM_Task on the MoB interface of the target vCenter Server.
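When the EULA is pushed back in step 5, the saved agreement text may contain characters such as & or < that need escaping inside an XML payload. The following is a minimal sketch that generates a payload with the same element layout as the one in step 3. The function name `build_eula_spec` is hypothetical, and it is an assumption that the MoB accepts the restored text in this same layout.

```python
import xml.etree.ElementTree as ET


def build_eula_spec(eula_text):
    """Build a <spec> payload shaped like the one in step 3.

    ElementTree escapes &, < and > in the text when serializing.
    """
    spec = ET.Element("spec")
    vapp_config = ET.SubElement(spec, "vAppConfig")
    eula = ET.SubElement(vapp_config, "eula")
    eula.text = eula_text
    return ET.tostring(spec, encoding="unicode")


# Hypothetical EULA text containing characters that need escaping.
payload = build_eula_spec("License terms for <Product> & services")
print(payload)
```

The output is a single line; the MoB text field is assumed to tolerate the absence of line breaks, since whitespace between XML elements is not significant here.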
Please also be aware of KB323360, where special characters in the VM Notes may cause the same Kafka RecordTooLargeException.
Impact/Risks:
Note:- This can be considered a rare use case.