This is a known issue affecting VMware Integrated OpenStack 6.x.
Workaround:
- Log in to the LCM node and open a shell in a MariaDB pod:
kubectl -n openstack exec -it mariadb-server-0 bash
- Log in to the database:
mysql --defaults-file=/etc/mysql/admin_user.cnf --host=localhost --connect-timeout 2
- Switch to the nova_api database and show its contents:
use nova_api;
select * from cell_mappings;
Note: We would expect to see two rows in that table: one with "cell0" in the "name" field and one with "cell1". What we are looking for here is a third row, likely with NULL in its "name" column, whose transport_url and database_connection values contain the private VIP rather than pod hostnames (e.g. something like 'rabbitmq.openstack.svc.cluster.local' and 'mariadb.openstack.svc.cluster.local').
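To isolate the stray row directly rather than scanning the full output, a query along these lines can be used (a sketch, assuming the stray cell has a NULL name as described above; adjust if your output differs):

```sql
-- Show only the key columns; the stray cell typically has NULL in "name".
SELECT id, uuid, name, transport_url, database_connection
FROM cell_mappings
WHERE name IS NULL;
```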
- Remove the bad cell from the database: repoint host and instance mappings to cell1, then delete the stray row, using these statements:
update host_mappings set cell_id = <id of cell1 from cell_mappings table> where cell_id = <id of NULL cell from cell_mappings table>;
update instance_mappings set cell_id = <id of cell1 from cell_mappings table> where cell_id = <id of NULL cell from cell_mappings table>;
delete from cell_mappings where id = <id of NULL cell from cell_mappings table>;
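As an illustration, suppose the earlier select showed cell1 with id 2 and the NULL-named cell with id 3 (hypothetical values; substitute the ids from your own cell_mappings output). The statements would then read:

```sql
-- Hypothetical ids: 2 = cell1, 3 = the stray NULL-named cell.
UPDATE host_mappings     SET cell_id = 2 WHERE cell_id = 3;
UPDATE instance_mappings SET cell_id = 2 WHERE cell_id = 3;
DELETE FROM cell_mappings WHERE id = 3;
```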
- Kubernetes waits out a retry backoff before restarting the failed pod. Delete the current pod to trigger a restart immediately:
osctl get pods | grep nova-db-sync
osdel pods <result from above>
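If the osctl and osdel shortcuts are not available in your shell, the equivalent plain kubectl commands (assuming the pods live in the openstack namespace, as in the earlier steps) would be:

```shell
# Find the nova-db-sync pod, then delete it so Kubernetes recreates it.
kubectl -n openstack get pods | grep nova-db-sync
kubectl -n openstack delete pod <result from above>
```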
- Verify that the pods run to completion without further error:
pods
Note: pods is an alias for kubectl get pods --all-namespaces --watch
- Run the following to confirm that the deployment returns to RUNNING status:
watch -n2 'viocli get deployment'