Symptoms:
Upgrading Healthwatch from version 1.3.x to 1.4.x might fail with the following error for the errand “push-apps”.
<Duplicate entry '#####' for key 'alert_configuration_id'>
This error is shown in the example below:
[1m2019-02-12 07:23:10[m [1;31mERROR[m [2mcommand.DbMigrate[m [elastic-3] Migration of schema `platform_monitoring` to version 1401 - v1 4 migrate alert to alert status failed! Please restore backups and roll back database and code!
[1m2019-02-12 07:23:10[m [1;31mERROR[m [2mpushapps.DatabaseMigrator[m [elastic-3] Encountered error: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:
Migration V1401__v1_4_migrate_alert_to_alert_status.sql failed
--------------------------------------------------------------
SQL State : 23000
Error Code : 1062
Message : (conn=2####4) Duplicate entry '71-cf-healthwatch-forwarder-60db4532-###-####-####-c29e3a065e9b' for key 'alert_configuration_id'
Location : /var/vcap/jobs/push-apps/packages/healthwatch-data/dbmigrations/V1401__v1_4_migrate_alert_to_alert_status.sql (/var/vcap/jobs/push-apps/packages/healthwatch-data/dbmigrations/V1401__v1_4_migrate_alert_to_alert_status.sql)
Line : 3
Statement : insert into alert_status(alert_configuration_id, deployment, job, envelope_index, status, transitioned_at_seconds)
select a1.alert_configuration_id,
a1.deployment,
a1.job,
a1.envelope_index,
a1.status,
a1.transitioned_at_seconds
from alert a1
JOIN(
SELECT
deployment,
envelope_index,
job,
alert_configuration_id,
MAX(transitioned_at_seconds) AS transitioned_at_seconds
FROM alert
GROUP BY alert_configuration_id, deployment, job, envelope_index
) as a2
on a1.alert_configuration_id = a2.alert_configuration_id
and a1.deployment = a2.deployment
and a1.job = a2.job
and a1.envelope_index = a2.envelope_index
and a1.transitioned_at_seconds = a2.transitioned_at_seconds
with message
Migration V1401__v1_4_migrate_alert_to_alert_status.sql failed
alert_status". Data for this new table will be populated from the existing table called "alert" with a unique key formed by the following columns: alert_configuration_id, deployment, job, envelope_index.select a1.alert_configuration_id,
a1.deployment,
a1.job,
a1.envelope_index,
a1.status,
a1.transitioned_at_seconds
from alert a1
JOIN(
SELECT
deployment,
envelope_index,
job,
alert_configuration_id,
MAX(transitioned_at_seconds) AS transitioned_at_seconds
FROM alert
GROUP BY alert_configuration_id, deployment, job, envelope_index
) as a2
on a1.alert_configuration_id = a2.alert_configuration_id
and a1.deployment = a2.deployment
and a1.job = a2.job
and a1.envelope_index = a2.envelope_index
and a1.transitioned_at_seconds = a2.transitioned_at_seconds
alert_configuration_id, deployment, job, envelope_index, this error will occur.alert_status" is a new table in Healthwatch v1.4, it is empty the first time the Healthwatch tile is upgraded from v1.3 to v1.4. If errand “push-apps” is run twice, it will fail with the error for the second run.If losing the Healthwatch data history is not a concern, then the best solution is to remove the existing Healthwatch tile and re-install it again with the new 1.4 release.
However, if there is a need to keep the data history, then follow the steps below to check for duplicate entries in the Healthwatch database. Remove some to make them (alert_configuration_id, deployment, job, envelope_index) unique in the result of the above query.
Note: This will cause some data loss on the "alert" table, but most of them will be kept.
1. Check out the database instance of Healthwatch deployment, ssh into the instance, and change to root user:
$ bosh -d p-healthwatch-a#####b vms Using environment '10.###.###.11' as client 'ops_manager' Task 17167. Done Deployment 'p-healthwatch-a0619872######c0723ab' Instance Process State AZ IPs VM CID VM Type Active healthwatch-forwarder/27bad#####e2a9 running az1 10.###.###.79 vm-57f6a03c-###-###-####-e2fff7e0c42d xlarge true healthwatch-forwarder/d4875ec0-6696-414e-8ba1-77481ff7c1ef running az1 10.###.###.78 vm-5629d079-####-###-####-a4a467a95bc4 xlarge true mysql/d152cb41-07f1-4b09-93aa-3053d821082b running az1 10.###.###.81 vm-df2f34c1-####-####-####-d2d42c573c94 2xlarge true redis/d10b7b52-###-####-9f95-24e2d1f1ef0f running az1 10.###.###.80 vm-92746af4-####-####-####-f2f063922918 xlarge true 4 vms $ bosh -d p-healthwatch-a06#####3ab ssh mysql/0 $ sudo -i mysql/d152cb41-####-####-###-3053d821082b:~#
2. Connect to the MySQL database and switch to database "platform_monitoring"
# mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 40093 Server version: 5.7.22-22-29.26-log MySQL Community Server (GPL), wsrep_29.26 Copyright (c) 2009-2018 Percona LLC and/or its affiliates Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use platform_monitoring; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql>
ERROR 2002 (HY000): Can't connect to local MySQL server through socket ......", try the command "mysql --defaults-file=/var/vcap/jobs/mysql/config/mylogin.cnf"
3. Make a backup of table "alert"
mysql> create table alert1 as select * from alert;
4. Run the following query to retrieve duplicate records
mysql> select max(id) as id, alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert group by alert_configuration_id, deployment, job, envelope_index having count>1;
For example, duplicated entries are returned as shown below:
mysql> select max(id) as id, alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert group by alert_configuration_id, deployment, job, envelope_index having count>1; +-------+------------------------+-------------------------------------------------------+--------------------------------------+------------------------------+-------+ | id | alert_configuration_id | deployment | envelope_index | job | count | +-------+------------------------+-------------------------------------------------------+--------------------------------------+------------------------------+-------+ | 14772 | 1 | cf-aaf6a9e7a####acdd07c | f3f1b5d2-####-####-####-f2082abec1f8 | cloud_controller_worker | 6 | | 13885 | 2 | cf-aaf6a9e7a####acdd07c | 84403e8c-####-####-####-48756d97b94b | backup-prepare | 2 | | 13838 | 2 | cf-aaf6a9e7a####acdd07c | f3f1b5d2-####-####-####-f2082abec1f8 | cloud_controller_worker | 4 | ......
5. Remove the duplicate entries with following SQL statement.
mysql> delete a from alert a inner join ( select max(id) as maxid, deployment, envelope_index, job, alert_configuration_id from alert group by alert_configuration_id, deployment, job, envelope_index having count(*)>1) d on d.alert_configuration_id=a.alert_configuration_id and d.deployment=a.deployment and d.job=a.job and d.envelope_index=a.envelope_index where a.id < d.maxid;
6. Run the SELECT query again to double confirm that there are no duplicate entries.
mysql> select max(id), alert_configuration_id, deployment, envelope_index, job, count(*) as count from alert group by alert_configuration_id, deployment, job, envelope_index having count>1; Empty set (0.00 sec)
7. Quit the MySQL client session
mysql> exit Bye
8. Kick off errand push-apps again and it should not fail
9. After verifying that there are no more issues with the Healthwatch tile, connect to the MySQL database as illustrated in step 1. and drop the backup table:
mysql> drop table alert1; Query OK, 0 rows affected (0.01 sec)