Upgrade Healthwatch might fail with error <Duplicate entry 'xxxx' for key 'alert_configuration_id'>
search cancel

Upgrade Healthwatch might fail with error <Duplicate entry 'xxxx' for key 'alert_configuration_id'>

book

Article ID: 292934

calendar_today

Updated On:

Products

Operations Manager

Issue/Introduction

Symptoms:
Upgrading Healthwatch from version 1.3.x to 1.4.x might fail with the following error for the errand “push-apps”.
 

<Duplicate entry '#####' for key 'alert_configuration_id'>

This error is shown in the example below:

[1m2019-02-12 07:23:10[m [1;31mERROR[m [2mcommand.DbMigrate[m [elastic-3] Migration of schema `platform_monitoring` to version 1401 - v1 4 migrate alert to alert status failed! Please restore backups and roll back database and code!  
           [1m2019-02-12 07:23:10[m [1;31mERROR[m [2mpushapps.DatabaseMigrator[m [elastic-3] Encountered error: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:   
           Migration V1401__v1_4_migrate_alert_to_alert_status.sql failed  
           --------------------------------------------------------------  
           SQL State  : 23000  
           Error Code : 1062  
           Message    : (conn=2####4) Duplicate entry '71-cf-healthwatch-forwarder-60db4532-###-####-####-c29e3a065e9b' for key 'alert_configuration_id'  
           Location   : /var/vcap/jobs/push-apps/packages/healthwatch-data/dbmigrations/V1401__v1_4_migrate_alert_to_alert_status.sql (/var/vcap/jobs/push-apps/packages/healthwatch-data/dbmigrations/V1401__v1_4_migrate_alert_to_alert_status.sql)  
           Line       : 3  
           Statement  : insert into alert_status(alert_configuration_id, deployment, job, envelope_index, status, transitioned_at_seconds)  
           select a1.alert_configuration_id,  
                  a1.deployment,  
                  a1.job,  
                  a1.envelope_index,  
                  a1.status,  
                  a1.transitioned_at_seconds  
             from alert a1  
             JOIN(  
             SELECT  
               deployment,  
               envelope_index,  
               job,  
               alert_configuration_id,  
               MAX(transitioned_at_seconds) AS transitioned_at_seconds  
             FROM alert  
             GROUP BY alert_configuration_id, deployment, job, envelope_index  
           ) as a2  
             on a1.alert_configuration_id = a2.alert_configuration_id  
               and a1.deployment = a2.deployment  
               and a1.job = a2.job  
               and a1.envelope_index = a2.envelope_index  
               and a1.transitioned_at_seconds = a2.transitioned_at_seconds  
            with message   
           Migration V1401__v1_4_migrate_alert_to_alert_status.sql failed

Environment

 

Cause

Since Healthwatch v1.4, there has been a table called "alert_status". Data for this new table will be populated from the existing table called "alert" with a unique key formed by the following columns: alert_configuration_id, deployment, job, envelope_index.

The query to retrieve the data is shown below:
select a1.alert_configuration_id,  
                  a1.deployment,  
                  a1.job,  
                  a1.envelope_index,  
                  a1.status,  
                  a1.transitioned_at_seconds  
             from alert a1  
             JOIN(  
             SELECT  
               deployment,  
               envelope_index,  
               job,  
               alert_configuration_id,  
               MAX(transitioned_at_seconds) AS transitioned_at_seconds  
             FROM alert  
             GROUP BY alert_configuration_id, deployment, job, envelope_index  
           ) as a2  
             on a1.alert_configuration_id = a2.alert_configuration_id  
               and a1.deployment = a2.deployment  
               and a1.job = a2.job  
               and a1.envelope_index = a2.envelope_index  
               and a1.transitioned_at_seconds = a2.transitioned_at_seconds

If this query returns multiple entries with the same alert_configuration_id, deployment, job, envelope_index, this error will occur.

Since "alert_status" is a new table in Healthwatch v1.4, it is empty the first time the Healthwatch tile is upgraded from v1.3 to v1.4. If errand “push-apps” is run twice, it will fail with the error for the second run.

Resolution

If losing the Healthwatch data history is not a concern, then the best solution is to remove the existing Healthwatch tile and re-install it again with the new 1.4 release.

However, if there is a need to keep the data history, then follow the steps below to check for duplicate entries in the Healthwatch database. Remove some to make them (alert_configuration_id, deployment, job, envelope_index) unique in the result of the above query.
Note: This will cause some data loss on the "alert" table, but most of them will be kept.   

1. Check out the database instance of Healthwatch deployment, ssh into the instance, and change to root user:
 

$ bosh -d p-healthwatch-a#####b vms
Using environment '10.###.###.11' as client 'ops_manager'

Task 17167. Done

Deployment 'p-healthwatch-a0619872######c0723ab'

Instance                                                    Process State  AZ   IPs            VM CID                                   VM Type  Active
healthwatch-forwarder/27bad#####e2a9  running        az1  10.###.###.79  vm-57f6a03c-###-###-####-e2fff7e0c42d  xlarge   true
healthwatch-forwarder/d4875ec0-6696-414e-8ba1-77481ff7c1ef  running        az1  10.###.###.78  vm-5629d079-####-###-####-a4a467a95bc4  xlarge   true
mysql/d152cb41-07f1-4b09-93aa-3053d821082b                  running        az1  10.###.###.81  vm-df2f34c1-####-####-####-d2d42c573c94  2xlarge  true
redis/d10b7b52-###-####-9f95-24e2d1f1ef0f                  running        az1  10.###.###.80  vm-92746af4-####-####-####-f2f063922918  xlarge   true

4 vms

$ bosh -d p-healthwatch-a06#####3ab ssh mysql/0
$ sudo -i
mysql/d152cb41-####-####-###-3053d821082b:~#

2. Connect to the MySQL database and switch to database "platform_monitoring"

# mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 40093
Server version: 5.7.22-22-29.26-log MySQL Community Server (GPL), wsrep_29.26

Copyright (c) 2009-2018 Percona LLC and/or its affiliates
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use platform_monitoring;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>
Note: If the command fails with something like "ERROR 2002 (HY000): Can't connect to local MySQL server through socket ......", try the command "mysql --defaults-file=/var/vcap/jobs/mysql/config/mylogin.cnf"


3. Make a backup of table "alert"

mysql> create table alert1 as select * from alert;


4. Run the following query to retrieve duplicate records

mysql> select max(id) as id, alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert  group by alert_configuration_id, deployment, job, envelope_index having count>1;

For example, duplicated entries are returned as shown below:

mysql> select max(id) as id, alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert  group by alert_configuration_id, deployment, job, envelope_index having count>1;
+-------+------------------------+-------------------------------------------------------+--------------------------------------+------------------------------+-------+
| id    | alert_configuration_id | deployment                                            | envelope_index                       | job                          | count |
+-------+------------------------+-------------------------------------------------------+--------------------------------------+------------------------------+-------+
| 14772 |                      1 | cf-aaf6a9e7a####acdd07c                               | f3f1b5d2-####-####-####-f2082abec1f8 | cloud_controller_worker      |     6 |
| 13885 |                      2 | cf-aaf6a9e7a####acdd07c                               | 84403e8c-####-####-####-48756d97b94b | backup-prepare               |     2 |
| 13838 |                      2 | cf-aaf6a9e7a####acdd07c                               | f3f1b5d2-####-####-####-f2082abec1f8 | cloud_controller_worker      |     4 |
......


5. Remove the duplicate entries with following SQL statement.

mysql> delete a from alert a
       inner join (
         select
           max(id) as maxid,
           deployment, 
           envelope_index, 
           job, 
           alert_configuration_id
         from alert
         group by alert_configuration_id, deployment, job, envelope_index
         having count(*)>1) d on d.alert_configuration_id=a.alert_configuration_id and d.deployment=a.deployment and d.job=a.job and d.envelope_index=a.envelope_index
         where a.id < d.maxid;


6. Run the SELECT query again to double confirm that there are no duplicate entries.

mysql> select max(id), alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert  group by alert_configuration_id, deployment, job, envelope_index having count>1;
Empty set (0.00 sec)


7. Quit the MySQL client session

mysql> exit
Bye


8. Kick off errand push-apps again and it should not fail 

9. After verifying that there are no more issues with the Healthwatch tile, connect to the MySQL database as illustrated in step 1. and drop the backup table:

mysql>  drop table alert1;
Query OK, 0 rows affected (0.01 sec)