Lifecycle cluster operations in Tanzu Mission Control Self-Managed, such as deploying, deleting, or modifying clusters, may become stuck in a Pending state within the TMC UI.
Investigation of the TMC-SM management cluster pods reveals that the Kafka broker is in a CrashLoopBackOff state, displaying the following error in the logs:
org.apache.kafka.common.KafkaException: Failed to acquire lock on file .lock in /bitnami/kafka/data. A Kafka instance in another process or thread is using this directory.
This failure prevents the platform from processing task status updates or state changes across the environment.
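As a quick check of the symptom described above, the broker logs can be searched for the lock error. This is a minimal sketch: the function name has_kafka_lock_error is hypothetical, and <kafka-pod-name> and <tmc-namespace> are placeholders for values from your own environment.

```shell
# Succeeds (exit code 0) if the log text on stdin contains the Kafka
# lock-acquisition error quoted above; fails (non-zero) otherwise.
# The function name is illustrative and not part of any TMC tooling.
has_kafka_lock_error() {
  grep -qF 'Failed to acquire lock on file .lock'
}

# Example usage (placeholders must be replaced; --previous reads the logs
# of the last crashed container instance):
#   kubectl logs <kafka-pod-name> -n <tmc-namespace> --previous \
#     | has_kafka_lock_error && echo "stale Kafka lock suspected"
```

If the function succeeds while the pod is in CrashLoopBackOff, the symptoms match the issue described here.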
The underlying cause is a stale filesystem lock on the Kafka data directory within the TMC-SM management cluster.
This can occur after an ungraceful shutdown (e.g., node failure or abrupt pod restart), which leaves a stale .lock file behind on the persistent volume. When the Kafka process restarts, it fails to acquire the lock on this file and refuses to start in order to prevent potential data corruption. As a result, lifecycle cluster operations cannot move past the "Pending" phase because the backend cannot acknowledge the completion of tasks.
To resolve this issue, the stale lock file must be manually removed from the Kafka persistent volume to allow the broker to start and resume processing lifecycle tasks.
kubectl get pods -A | grep kafka
kubectl scale statefulset <kafka-statefulset-name> -n <tmc-namespace> --replicas=0
rm /bitnami/kafka/data/.lock
kubectl scale statefulset <kafka-statefulset-name> -n <tmc-namespace> --replicas=<previous count>
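The scale-down and scale-up steps rely on knowing the previous replica count, so it can help to record it before scaling to zero. The following is a sketch under stated assumptions, not a supported procedure: the helper names current_replicas and scale_to are hypothetical, and the StatefulSet name and namespace are placeholders. The lock file removal itself must be run somewhere the Kafka persistent volume is mounted, which depends on the storage setup and is not covered here.

```shell
# Print the current replica count of a StatefulSet.
# Arguments: <statefulset-name> <namespace>
current_replicas() {
  local sts="$1" ns="$2"
  kubectl get statefulset "$sts" -n "$ns" -o jsonpath='{.spec.replicas}'
}

# Scale a StatefulSet to the given replica count.
# Arguments: <statefulset-name> <namespace> <replica-count>
scale_to() {
  local sts="$1" ns="$2" count="$3"
  kubectl scale statefulset "$sts" -n "$ns" --replicas="$count"
}

# Example flow (left as comments so that sourcing this file changes nothing):
#   previous="$(current_replicas <kafka-statefulset-name> <tmc-namespace>)"
#   scale_to <kafka-statefulset-name> <tmc-namespace> 0
#   ... remove /bitnami/kafka/data/.lock where the volume is mounted ...
#   scale_to <kafka-statefulset-name> <tmc-namespace> "$previous"
```

Saving the count in a variable avoids guessing the value for --replicas when the broker is brought back up.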