After a cluster upgrade, the Confluent Schema Registry fails to come back up with the following errors:
WARN The replication factor of the schema topic _schemas is less than the desired one of 3. If this is a production environment, it's crucial to add more brokers and increase the replication factor of the topic. (io.confluent.kafka.schemaregistry.storage.KafkaStore:263)
ERROR The retention policy of the schema topic _schemas is incorrect. You must configure the topic to 'compact' cleanup policy to avoid Kafka deleting your schemas after a week. Refer to Kafka documentation for more details on cleanup policies (io.confluent.kafka.schemaregistry.storage.KafkaStore:279)
ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:81)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
	at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:411)
	at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:79)
	at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:105)
	at io.confluent.rest.Application.configureHandler(Application.java:324)
	at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:228)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
	at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:44)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: The retention policy of the schema topic _schemas is incorrect. Expected cleanup.policy to be 'compact' but it is delete
	at io.confluent.kafka.schemaregistry.storage.KafkaStore.verifySchemaTopic(KafkaStore.java:284)
	at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:185)
	at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:122)
	at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:409)
Affected versions: Watchtower 1.2, 1.3
New metrics from SYSVIEW were enabled, and the resulting flood of data records overwhelmed Watchtower. To clear the backlog, it was recommended to clean up the Kafka PVC. The steps that were followed (listed at the end of this article) caused the _schemas topic to be recreated with the wrong cleanup policy.
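For context, the backlog that motivated the PVC cleanup can be measured per consumer group. A minimal sketch, assuming the broker is reachable from inside the Kafka pod on localhost:9092 (the pod name kafka-0 and the port are assumptions; depending on the image the tool may be named kafka-consumer-groups.sh):

# Describe every consumer group; the LAG column shows how far each
# group is behind the latest offsets
kubectl exec -n NAMESPACE kafka-0 -- kafka-consumer-groups \
  --bootstrap-server localhost:9092 --describe --all-groups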
As long as the Confluent Schema Registry stayed up, everything kept working; the failure only surfaced when the registry restarted during a rescaling (the cluster upgrade event), because the registry verifies the cleanup policy of _schemas only at startup.
The state of the other deployments does not matter: if Kafka and the Confluent Schema Registry go through the sequence shown in the steps at the end of this article, the issue occurs regardless of the other Kafka-dependent deployments.
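Before fixing anything, the misconfiguration can be confirmed directly on the topic. Same assumptions as above (pod name, port, tool name):

# Describe the _schemas topic config; in the failing state this shows
# cleanup.policy=delete instead of the required cleanup.policy=compact
kubectl exec -n NAMESPACE kafka-0 -- kafka-configs \
  --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name _schemas --describe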
To resolve, first scale down the Confluent Schema Registry and all Kafka-dependent workloads, then apply the fix manifest:

kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "confluent-deployment\|data-insights-dbloader\|data-insights-ingestor\|datastream-hub-deployment\|datastream-maas-deployment\|ml-insights-profiler-alarm-manager\|ml-insights-profiler-notifier"|awk -F ' ' '{print $1}') --replicas=0
kubectl scale -n NAMESPACE sts $(kubectl get sts -n NAMESPACE|grep ml-insights-profiler-ade|awk -F ' ' '{print $1}') --replicas=0
kubectl apply -n NAMESPACE -f fix-confluent.yaml

The log4j errors in the output below are harmless; the run ends with a confirmation:
log4j:ERROR Could not read configuration file from URL [file:/tmp/data/config/log4j.properties].
java.io.FileNotFoundException: /tmp/data/config/log4j.properties (No such file or directory)
.
.
.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Completed updating config for topic _schemas.
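The contents of fix-confluent.yaml are not reproduced here, but judging from the output above it presumably runs Kafka's kafka-configs tool to restore log compaction on the topic. A manual equivalent, under the same assumptions as the check above:

# Switch _schemas back to the compacted cleanup policy that the
# Schema Registry verifies at startup
kubectl exec -n NAMESPACE kafka-0 -- kafka-configs \
  --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name _schemas \
  --alter --add-config cleanup.policy=compact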
With the cleanup policy corrected, scale everything back up:

kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "confluent-deployment\|data-insights-ingestor\|datastream-hub-deployment\|datastream-maas-deployment\|ml-insights-profiler-alarm-manager\|ml-insights-profiler-notifier"|awk -F ' ' '{print $1}') --replicas=1
kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "data-insights-dbloader"|awk -F ' ' '{print $1}') --replicas=3
kubectl scale -n NAMESPACE sts $(kubectl get sts -n NAMESPACE|grep ml-insights-profiler-ade|awk -F ' ' '{print $1}') --replicas=1
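Once the pods are running again, verify that the Schema Registry actually initialized. A quick check, assuming the registry serves its REST API on the default port 8081 and that the deployment is named confluent-deployment (both assumptions):

# Forward the registry's REST port locally
kubectl port-forward -n NAMESPACE deploy/confluent-deployment 8081:8081 &
# A healthy registry responds with a JSON array of registered subjects
curl -s http://localhost:8081/subjects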
For reference, these are the PVC cleanup steps that were originally followed. Because Kafka and its clients were brought back up with an empty data directory, _schemas was presumably auto-created with the broker default cleanup.policy of 'delete' before the Schema Registry could create it with 'compact':

kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "confluent-deployment\|data-insights-dbloader\|data-insights-ingestor\|datastream-hub-deployment\|datastream-maas-deployment\|ml-insights-profiler-alarm-manager\|ml-insights-profiler-notifier"|awk -F ' ' '{print $1}') --replicas=0
kubectl scale -n NAMESPACE sts $(kubectl get sts -n NAMESPACE|grep "kafka\|ml-insights-profiler-ade"|awk -F ' ' '{print $1}') --replicas=0
kubectl delete -n NAMESPACE pvc common-service-kafka-pvc-kafka-0
kubectl scale -n NAMESPACE sts $(kubectl get sts -n NAMESPACE|grep kafka|awk -F ' ' '{print $1}') --replicas=1
kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "confluent-deployment\|data-insights-ingestor\|datastream-hub-deployment\|datastream-maas-deployment\|ml-insights-profiler-alarm-manager\|ml-insights-profiler-notifier"|awk -F ' ' '{print $1}') --replicas=1
kubectl scale -n NAMESPACE deploy $(kubectl get deploy -n NAMESPACE|grep "data-insights-dbloader"|awk -F ' ' '{print $1}') --replicas=3
kubectl scale -n NAMESPACE sts $(kubectl get sts -n NAMESPACE|grep ml-insights-profiler-ade|awk -F ' ' '{print $1}') --replicas=1
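If a PVC cleanup is ever needed again, the misconfiguration can be avoided by creating _schemas explicitly, with compaction enabled, after Kafka is back but before anything else is scaled up. A sketch under the same assumptions as above (partition count and replication factor must match the environment):

# Pre-create _schemas with compaction so that topic auto-creation
# with the broker default cleanup.policy=delete never happens
kubectl exec -n NAMESPACE kafka-0 -- kafka-topics \
  --bootstrap-server localhost:9092 --create --topic _schemas \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact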