Remote Jobs on TCA-CP aren't getting updated on TCA-M


Article ID: 379182


Products

VMware Telco Cloud Automation

Issue/Introduction

  • Operations like VIM status are stuck in Pending.

  • TCA-CP Status in Virtual Infrastructure shows Pending.
  • Cluster Status shows Processing.
  • On TCA-M (2.3), the following entries appear in /common/logs/admin/app.log:
2024-10-04 03:12:15.873 UTC [RemotingService_SvcThread-3, Ent: HybridityAdmin, Usr: HybridityAdmin, , TxId: ###########-#####-#####-#####-############] WARN  c.v.v.h.m.k.KafkaProducerDelegate- Publish failed and will retry
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 5363784 bytes when serialized which is larger than 2097152, which is the value of the max.request.size configuration.
        at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1316)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:985)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:773)
        at com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.sendMessageWithRetries(KafkaProducerDelegate.java:214)
        at com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publishMessageWithTransaction(KafkaProducerDelegate.java:191)
        at com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publish(KafkaProducerDelegate.java:155)
        at com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publish(KafkaProducerDelegate.java:149)
        at com.vmware.vchs.hybridity.messaging.adapter.JobManagerJobPublisher.publish(JobManagerJobPublisher.java:112)
        at com.vmware.vchs.hybridity.messaging.adapter.JobManager.queueJob(JobManager.java:1688)
        at com.vmware.vchs.hybridity.service.remoting.jobs.JobStatusPollAndNotify.handleJobsFromNewVersion(JobStatusPollAndNotify.java:695)
        at com.vmware.vchs.hybridity.service.remoting.jobs.JobStatusPollAndNotify.retrieveUpdatesFromRemoteSinceLastRequest(JobStatusPollAndNotify.java:571)
  • On TCA-M (3.2), the app pod log shows:
    stdout F Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is ####### bytes when serialized which is larger than #######, which is the value of the max.request.size configuration.

 

Environment

VMware Telco Cloud Automation 2.3

VMware Telco Cloud Automation 3.2

Cause

This occurs when the topmost record in the "RemotingOutbox" table on TCA-CP exceeds the Kafka message limit of 2 MB. In this case the record was larger than 5 MB, so TCA-M could not consume it, and all subsequent updates queued behind it remained stuck.
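The 2 MB cap in the error message corresponds to the Kafka producer setting max.request.size. For reference only (the exact configuration file and its path on the TCA appliance are not covered by this article):

```properties
# Kafka producer default referenced in the error above:
# messages that serialize to more than this many bytes are
# rejected with RecordTooLargeException.
max.request.size=2097152
```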

Resolution

Follow the steps below to delete the stuck job:

    1. Take a backup or snapshot of both the TCA-Manager and the TCA-CP.

    2. SSH to the corresponding TCA-CP.

    3. Connect to Postgres:
       connect-to-postgres

    4. Identify the topmost (oldest) record with the query below:
       >> SELECT val->'job'->>'jobType', "creationDate", "lastUpdated" FROM "RemotingOutbox" ORDER BY "lastUpdated";

    5. Clean up the record:
       >> DELETE FROM "RemotingOutbox" WHERE val->'job'->>'jobType'='<JobType returned in above query>';

       

    6. Perform a dummy edit on the cluster that is showing Processing (for example, edit and re-save its configuration without any changes) to trigger a fresh status update.
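Steps 4 and 5 can also be run as a single transaction inside the Postgres session, so the record can be inspected before the delete is committed. This is an illustrative sketch; replace the '<JobType>' placeholder with the jobType value returned by the SELECT:

```sql
BEGIN;

-- Inspect the oldest outbox record and note its jobType
SELECT val->'job'->>'jobType' AS job_type, "creationDate", "lastUpdated"
FROM "RemotingOutbox"
ORDER BY "lastUpdated"
LIMIT 1;

-- Delete records of that jobType ('<JobType>' is a placeholder)
DELETE FROM "RemotingOutbox"
WHERE val->'job'->>'jobType' = '<JobType>';

-- Review the DELETE row count before committing;
-- run ROLLBACK instead of COMMIT to abort.
COMMIT;
```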

Additional Information

To confirm the issue, SSH to the target TCA-CP, connect to Postgres, and run:

  >> SELECT val->'job'->>'jobType', "creationDate", "lastUpdated" FROM "RemotingOutbox" ORDER BY "lastUpdated";

Then check the number of pending entries:

  >> SELECT count(*) FROM "RemotingOutbox" ;


A count in the hundreds (for example, 700 or more) indicates that remote jobs are accumulating on TCA-CP and not being updated on TCA-M.
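To confirm which record is exceeding the 2 MB Kafka limit, the stored size of each row's payload can be inspected with PostgreSQL's pg_column_size function. This query is illustrative and not part of the original procedure; note that pg_column_size reports the on-disk (possibly compressed) size, which may differ somewhat from the serialized Kafka message size in the error log:

```sql
-- Show the five largest outbox payloads, largest first
SELECT val->'job'->>'jobType' AS job_type,
       pg_column_size(val)    AS payload_bytes,
       "lastUpdated"
FROM "RemotingOutbox"
ORDER BY pg_column_size(val) DESC
LIMIT 5;
```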