This is a known issue affecting VMware Integrated OpenStack 6.x.
Workaround:
- Log in to the LCM node and open a shell in a MariaDB pod:
kubectl -n openstack exec -it mariadb-server-0 bash
- Log in to the database:
mysql --defaults-file=/etc/mysql/admin_user.cnf --host=localhost --connect-timeout 2
- Switch to the nova_api database and show its contents:
use nova_api;
select * from cell_mappings;
Note: We would expect to see two rows in that table: one with "cell0" in the "name" field and one with "cell1". What we are looking for here is a third row, likely with NULL in its "name" column, whose transport_url and database_connection values contain the private VIP rather than pod hostnames (e.g. something like 'rabbitmq.openstack.svc.cluster.local' and 'mariadb.openstack.svc.cluster.local').
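To isolate the stray row directly rather than scanning the full output, a query along these lines can be used (a sketch, assuming the stray cell has a NULL name as described above; adjust if your output differs):

```sql
-- Show only the key columns; the stray cell typically has NULL in "name".
SELECT id, uuid, name, transport_url, database_connection
FROM cell_mappings
WHERE name IS NULL;
```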
- Remove the bad cell from the database: repoint host and instance mappings to cell1, then delete the stray row, using these statements:
update host_mappings set cell_id = <id of cell1 from cell_mappings table> where cell_id = <id of NULL cell from cell_mappings table>;
update instance_mappings set cell_id = <id of cell1 from cell_mappings table> where cell_id = <id of NULL cell from cell_mappings table>;
delete from cell_mappings where id = <id of NULL cell from cell_mappings table>;
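As an illustration, suppose the earlier select showed cell1 with id 2 and the NULL-named cell with id 3 (hypothetical values; substitute the ids from your own cell_mappings output). The statements would then read:

```sql
-- Hypothetical ids: 2 = cell1, 3 = the stray NULL-named cell.
UPDATE host_mappings     SET cell_id = 2 WHERE cell_id = 3;
UPDATE instance_mappings SET cell_id = 2 WHERE cell_id = 3;
DELETE FROM cell_mappings WHERE id = 3;
```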
- Kubernetes waits out a retry backoff before restarting the failed pod. Delete the current pod to trigger a restart immediately:
osctl get pods | grep nova-db-sync
osdel pods <result from above>
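If the osctl and osdel shortcuts are not available in your shell, the equivalent plain kubectl commands (assuming the pods live in the openstack namespace, as in the earlier steps) would be:

```shell
# Find the nova-db-sync pod, then delete it so Kubernetes recreates it.
kubectl -n openstack get pods | grep nova-db-sync
kubectl -n openstack delete pod <result from above>
```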
- Verify that the pods run to completion without further error:
pods
Note: pods is an alias for kubectl get pods --all-namespaces --watch
- Run the following to confirm that the deployment returns to RUNNING status:
watch -n2 'viocli get deployment'