Mariadb cluster is not healthy

Article ID: 413633

Updated On:

Products

VMware Integrated OpenStack

Issue/Introduction

  • The deployment status reported by 'viocli get deployment' must be Running.
  • 'viocli check health' reports:

    +-------------------+-----------+--------------------------------+-------------------------+
    |       NAME        |  RESULT   |             ALARM              |         SKIPPED         |
    +-------------------+-----------+--------------------------------+-------------------------+
    | mariadb           | Alarms:1  | mariadb cluster                |                         |
    |                   | Passed:1  | wsrep_cluster_size != 3        |                         |
    +-------------------+-----------+--------------------------------+-------------------------+

  • Running the following command on vio-manager returns output similar to this example:
    #for server in `seq 0 2`;do osctl exec -ti mariadb-server-$server -- mysql --defaults-file=/etc/mysql/admin_user.cnf --host=localhost -e "show status;";done |grep -e wsrep_cluster_size -e wsrep_last_committed

    | wsrep_cluster_size                                           | 1                                                                 |
    | wsrep_last_committed                                         | 504496                                                            |
    | wsrep_cluster_size                                           | 2                                                                 |
    | wsrep_last_committed                                         | 511103                                                            |
    | wsrep_cluster_size                                           | 2                                                                 |
    | wsrep_last_committed                                         | 511104                                                            |
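
    The grep output above can be condensed to one line per node, which makes the values easier to compare. The following is a minimal sketch, not part of the product tooling: summarize_wsrep is a hypothetical helper name, and it assumes each node prints both wsrep_cluster_size and wsrep_last_committed before the next node's rows begin, as in the example.

```shell
# Hypothetical helper: read the "| name | value |" rows shown above on stdin
# and print one "cluster_size:last_committed" line per node.
# Assumption: rows arrive grouped per node, as in the sample output.
summarize_wsrep() {
    awk -F'|' '
        {
            # Field 2 is the variable name, field 3 is its value; strip padding.
            name = $2; value = $3
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", value)
        }
        name == "wsrep_cluster_size"   { size = value }
        name == "wsrep_last_committed" { committed = value }
        size != "" && committed != "" {
            # Both values seen for this node: emit the pair and reset.
            print size ":" committed
            size = ""; committed = ""
        }
    '
}
```

    To use it, pipe the output of the command above into summarize_wsrep.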

Environment

7.x

Cause

In the sample output, the DB cluster has split into two partitions: one containing mariadb-server-0, the other containing mariadb-server-1 and mariadb-server-2. The wsrep_last_committed values differ significantly between the two partitions.

The DB partition must satisfy all of the following conditions for this article to apply:

  1. One node has wsrep_cluster_size=1 and the other two nodes have wsrep_cluster_size=2.
  2. The wsrep_last_committed of the node with wsrep_cluster_size=1 is significantly different from those of the two nodes with wsrep_cluster_size=2.
  3. The two nodes with wsrep_cluster_size=2 have roughly the same wsrep_last_committed.
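
The three conditions above can be checked mechanically. The following is a minimal sketch under stated assumptions: matches_partition_pattern and abs_diff are hypothetical helper names, and the thresholds SIGNIFICANT_GAP and ROUGH_TOLERANCE are illustrative guesses, because the article does not quantify "significantly different" or "roughly the same". Each argument is one node's values written as cluster_size:last_committed.

```shell
# Assumed thresholds (NOT from the article); override via the environment.
SIGNIFICANT_GAP=${SIGNIFICANT_GAP:-1000}
ROUGH_TOLERANCE=${ROUGH_TOLERANCE:-10}

# Print the absolute difference of two non-negative integers.
abs_diff() {
    if [ "$1" -ge "$2" ]; then echo $(( $1 - $2 )); else echo $(( $2 - $1 )); fi
}

# Hypothetical helper: return 0 if the three "size:committed" arguments
# match the one-node / two-node partition pattern, 1 otherwise.
matches_partition_pattern() {
    single="" pair_a="" pair_b=""
    for node in "$@"; do
        size=${node%%:*}
        committed=${node#*:}
        case "$size" in
            1) [ -z "$single" ] || return 1
               single=$committed ;;
            2) if [ -z "$pair_a" ]; then pair_a=$committed
               elif [ -z "$pair_b" ]; then pair_b=$committed
               else return 1
               fi ;;
            *) return 1 ;;
        esac
    done
    # Condition 1: exactly one node with size 1 and two nodes with size 2.
    [ -n "$single" ] && [ -n "$pair_a" ] && [ -n "$pair_b" ] || return 1
    # Condition 3: the two size-2 nodes are roughly equal.
    [ "$(abs_diff "$pair_a" "$pair_b")" -le "$ROUGH_TOLERANCE" ] || return 1
    # Condition 2: the size-1 node differs significantly from both.
    [ "$(abs_diff "$single" "$pair_a")" -ge "$SIGNIFICANT_GAP" ] || return 1
    [ "$(abs_diff "$single" "$pair_b")" -ge "$SIGNIFICANT_GAP" ] || return 1
    return 0
}
```

With the sample values, matches_partition_pattern 1:504496 2:511103 2:511104 succeeds under the assumed thresholds. Treat the result as a hint only and confirm the values by eye before changing anything.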

Resolution

Note:  Before proceeding, make sure you have a current backup.  See backup (page 160).

1. Identify the reboot node

#osctl get cm mariadb1-mariadb-state -oyaml

apiVersion: v1
data:
  safe_to_bootstrap.mariadb-server-0: "1"
  safe_to_bootstrap.mariadb-server-1: "0"
  safe_to_bootstrap.mariadb-server-2: "0"
  sample_time.mariadb-server-0: "2022-04-06T12:48:33.518314Z"
  sample_time.mariadb-server-1: "2022-04-06T12:48:31.782621Z"
  sample_time.mariadb-server-2: "2022-04-06T12:48:26.710797Z"
  seqno.mariadb-server-0: "-1"
  seqno.mariadb-server-1: "-1"
  seqno.mariadb-server-2: "-1"
  uuid.mariadb-server-0: ########-####-####-####-############
  uuid.mariadb-server-1: ########-####-####-####-############
  uuid.mariadb-server-2: ########-####-####-####-############
  version.mariadb-server-0: "2.1"
  version.mariadb-server-1: "2.1"
  version.mariadb-server-2: "2.1"
kind: ConfigMap
metadata:
  annotations:
    openstackhelm.openstack.org/cluster.state: reboot
    openstackhelm.openstack.org/leader.expiry: "2022-04-06T12:49:54.353900Z"
    openstackhelm.openstack.org/leader.node: mariadb-server-0
    openstackhelm.openstack.org/reboot.node: mariadb-server-0
  creationTimestamp: "2022-04-02T04:11:57Z"
  name: mariadb1-mariadb-state
  namespace: openstack
  resourceVersion: "#######"
  selfLink: /api/v1/namespaces/openstack/configmaps/mariadb1-mariadb-state
  uid: ########-####-####-####-############

Note:  In the sample output, the reboot node is mariadb-server-0.  The ConfigMap must also have the annotation openstackhelm.openstack.org/cluster.state=reboot.  The reboot node must be the node in the one-node partition, while the other two nodes form the two-node partition.
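
The two annotations of interest can be pulled out of the YAML instead of being read by eye. This is a minimal sketch: annotation_value is a hypothetical helper name, and it assumes the one-line "key: value" layout shown in the sample output.

```shell
# Hypothetical helper: read ConfigMap YAML on stdin and print the value of
# the annotation named by $1 (first match only, surrounding quotes removed).
# Assumption: the annotation appears as a single "key: value" line.
annotation_value() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop leading indentation
            prefix = key ": "
            # Literal prefix match, so dots in the key are not wildcards.
            if (!found && index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                gsub(/^"|"$/, "", value)    # strip optional quotes
                print value
                found = 1
            }
        }'
}
```

For example, piping the output of the osctl command above into annotation_value openstackhelm.openstack.org/reboot.node should print the reboot node, and the same call with openstackhelm.openstack.org/cluster.state should print reboot.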

2. Fix DB Partition

  1. If the one-node partition has a larger wsrep_last_committed value than the two-node partition, the issue can be fixed by:

    #osdel pod mariadb-server-x & osdel pod mariadb-server-y &

    Replace x and y with the nodes in the two-node partition.

    Note: The two pods must be deleted at the same time. Do NOT delete them one by one.

  2. If the one-node partition has a lower wsrep_last_committed value than the two-node partition, the issue can be fixed by:

    #osctl annotate --overwrite cm mariadb1-mariadb-state openstackhelm.openstack.org/cluster.state='live'
    #osctl annotate --overwrite cm mariadb1-mariadb-state openstackhelm.openstack.org/leader.node='mariadb-server-x'

    Note:  Replace x with any node in the two-node partition.

  3. Run the following:

    #osdel pod mariadb-server-y

    Note: Replace y with the number of the reboot node.
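
The choice between case 1 and case 2 of "Fix DB Partition" depends only on comparing two numbers, which can be sketched as below. choose_fix is a hypothetical helper name. It only prints which case applies and runs no commands against the cluster. Equal values are not covered by this article, so the sketch reports them as such.

```shell
# Hypothetical helper: decide which case of "Fix DB Partition" applies.
# $1: wsrep_last_committed of the one-node partition
# $2: wsrep_last_committed of the two-node partition
#     (assumption: either node's value, since the two are roughly equal)
choose_fix() {
    if [ "$1" -gt "$2" ]; then
        echo "case 1: one-node partition is ahead"
    elif [ "$1" -lt "$2" ]; then
        echo "case 2: one-node partition is behind"
    else
        echo "not covered: values are equal"
    fi
}
```

With the sample values from this article, choose_fix 504496 511104 selects case 2.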