We use a Kafka connector on our gateways to produce log messages to an Azure Event Hubs instance. An online resize of the Event Hubs service was performed to give it more processing power, and its partition count also increased. This triggered the following errors in our SSG logs:
2025-06-05T02:06:06.641-0300 WARNING 83384 org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=l7kafka:producer:PRODUCER Azure Log Event Hub PRD:2127975773:5EDITED851fbb85592fc4EDITEDba26d7] Got error produce response with correlation id 2872 on topic-partition log_apim_02-43, retrying (0 attempts left). Error: REPLICA_NOT_AVAILABLE
2025-06-05T02:06:06.641-0300 WARNING 83384 org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=l7kafka:producer:PRODUCER Azure Log Digital Event Hub PRD:2127975773:5EDITED851fbb85592fc4EDITEDba26d7] Received invalid metadata error in produce request on partition log_apim_02-43 due to org.apache.kafka.common.errors.ReplicaNotAvailableException: The replica is not available for the requested topic-partition. Produce/Fetch requests and other requests intended only for the leader or follower return NOT_LEADER_OR_FOLLOWER if the broker is not a replica of the topic-partition.. Going to request metadata update now
2025-06-05T02:06:06.799-0300 WARNING 83384 org.apache.kafka.clients.producer.internals.Sender: [Producer clientId=l7kafka:producer:PRODUCER Azure Log Digital Event Hub PRD:22127975773:5EDITED851fbb85592fc4EDITEDba26d7] Received invalid metadata error in produce request on partition log_apim_02-43 due to org.apache.kafka.common.errors.ReplicaNotAvailableException: The replica is not available for the requested topic-partition. Produce/Fetch requests and other requests intended only for the leader or follower return NOT_LEADER_OR_FOLLOWER if the broker is not a replica of the topic-partition.. Going to request metadata update now
We tried disabling and re-enabling the connector, without success. Everything returned to normal only after we restarted our SSG instances.
Is this the expected behavior, or is there any gateway-side configuration that would make this connector more resilient?
Currently the Kafka producer options we set are:
max.block.ms=5000
metadata.max.age.ms=1000
producer.sasl.mechanism=PLAIN
producer.security.protocol=SASL_SSL
CA API Gateway 11.x
The Gateway needs to be restarted after changes such as MySQL file sizing, new certificates, or installing a patch.
An activity such as resizing the Event Hub partitions likewise requires a restart of the Gateway.
Microsoft has also advised that clients need to be restarted because Event Hubs is not a true Kafka server and does not implement many of the Kafka features in its listener.
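Because a restart is required, it can help to detect the condition quickly. The following is a minimal Python sketch, not part of the Gateway product, that scans log lines for the REPLICA_NOT_AVAILABLE warning shown above and reports the topic-partitions whose retries were exhausted. The function name find_stale_partitions is hypothetical, and the regular expression assumes the log wording matches the excerpt in this article.

```python
import re

# Matches the Sender warning logged when a produce request fails with
# REPLICA_NOT_AVAILABLE. The wording is taken from the log excerpt in this
# article; other client versions may phrase it differently (assumption).
PATTERN = re.compile(
    r"topic-partition (?P<tp>[^\s,]+), "
    r"retrying \((?P<left>\d+) attempts left\)\. "
    r"Error: REPLICA_NOT_AVAILABLE"
)


def find_stale_partitions(lines):
    """Return the sorted topic-partitions that ran out of retries.

    `lines` is any iterable of log lines (for example, an open file).
    Only warnings reporting "0 attempts left" are counted, since those
    are the ones where the producer gave up on the request.
    """
    stale = set()
    for line in lines:
        match = PATTERN.search(line)
        if match and int(match.group("left")) == 0:
            stale.add(match.group("tp"))
    return sorted(stale)
```

To use it, open the SSG log file (its location depends on the installation) and pass the file object to find_stale_partitions. A non-empty result after an Event Hubs resize suggests the Gateway instances should be restarted.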