An Upgrade Automation (UA) based Portal Topology upgrade fails when we run it with the following command.
ansible-playbook -i inventory -K topology-upgrade.yaml
To capture more detail, we rerun the command with verbose logging enabled (-vvv) and redirect the output to a log file named upgrade.txt (> upgrade.txt).
ansible-playbook -vvv -i inventory -K topology-upgrade.yaml > upgrade.txt
We see message entries like the following in the resulting upgrade.txt file.
TASK [Fail if Kafka is not accessible after retries] ***************************
task path: /opt/CA/installer/NetOps-Topology-25.4.4-Linux-RELEASE/provisioning/roles/netops_kafka/tasks/kafka_topics_update_tasks.yaml:46
[ERROR]: Task failed: Action failed: Kafka broker is not accessible at <Kafka_Host_IP>:9092 after 10 retries. Please ensure Kafka is running.
Origin: /opt/CA/installer/NetOps-Topology-25.4.4-Linux-RELEASE/provisioning/roles/netops_kafka/tasks/kafka_topics_update_tasks.yaml:46:3
When we check the Kafka server to confirm the correct processes are running, we find the two expected Kafka services, netops-kafka and netops-kafka-zookeeper, both active and running.
When we check what is bound to Kafka port 9092, we find it bound to the loopback IP instead of the server's real IP address.
[root@<KafkaHost> ~]# netstat -lntup | grep 9092
tcp6 0 0 127.0.0.1:9092 :::* LISTEN 406670/java
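The check above can be scripted so it gives a clear yes/no answer. The following is a minimal sketch, not part of the product tooling: the function name loopback_listeners is hypothetical, and it assumes `netstat -lnt` style output in which the fourth column is the local address and the sixth is the state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -lnt` style output on stdin and print the
# local address of every LISTEN socket on the given port that is bound to a
# loopback address (127.x.x.x or ::1).
loopback_listeners() {
  local port="$1"
  awk -v port="$port" '
    # $4 is the local address column, $6 is the socket state
    $6 == "LISTEN" && $4 ~ (":" port "$") && ($4 ~ /^127\./ || $4 ~ /^::1:/) {
      print $4
    }
  '
}
```

For example, `netstat -lnt | loopback_listeners 9092` prints nothing when the port is bound to a non-loopback address, and prints the loopback address when it shows the condition described here.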
All supported Network Observability DX NetOps Portal Topology installations
The /etc/hosts file on the Kafka server host is incorrectly configured. The localhost line was found configured as follows.
Reset the /etc/hosts file so that the file contains a loopback entry as follows:
Edit the file. Save the changes.
Once completed, stop and restart the Kafka services.
Confirm Kafka port 9092 is now bound to the host's real IP address.
Rerun the topology upgrade.
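As an optional sanity check on the /etc/hosts change, the sketch below looks for loopback lines that still carry the server's own name. It rests on an assumption not confirmed by this article: that the misconfiguration is the Kafka host's name appearing on a 127.x.x.x or ::1 line. The function name hosts_loopback_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every loopback line in a hosts file that also
# lists the given host name.
# Assumption: the fault is the server's own name sitting on a loopback line.
hosts_loopback_names() {
  local hosts_file="$1" host="$2"
  awk -v host="$host" '
    /^[[:space:]]*#/ { next }            # skip full-line comments
    $1 ~ /^127\./ || $1 == "::1" {       # loopback entries only
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # stop at an inline comment
        if ($i == host) { print; break }
      }
    }
  ' "$hosts_file"
}
```

A possible invocation is `hosts_loopback_names /etc/hosts "$(hostname -f)"`. No output means the host name is not listed on any loopback line.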