ENV Details: MySQL 5.7.19, 5.7.21; Gateway: 9.3; GMU: 1.5; running on AWS
The customer is facing the following issue: they have set up an API Gateway in container form factor on AWS, running on a Kubernetes cluster that is connected to an AWS RDS MySQL database.
The API Gateway can connect to the database, and it is also possible to create services and policies using the Policy Manager. However, when trying to import (migrateIn) a GMU export bundle, the import fails with the error: "Unexpected end of file from server"
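(For reference, the failing import was triggered with GMU's migrateIn command. A minimal sketch follows; the argument file name is a placeholder for a file holding the target host, credentials, and bundle path, and exact options may vary by GMU version:)

    # Hypothetical invocation; migrateIn.properties would hold the target host,
    # credentials, and the bundle to import:
    ./GatewayMigrationUtility.sh migrateIn --argFile migrateIn.properties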
In the Gateway logs we can see MySQL errors like:
WARNING: SQL Error: 1213, SQLState: 40001
Jul 18, 2018 1:50:44 PM org.hibernate.util.JDBCExceptionReporter logExceptions
SEVERE: Deadlock found when trying to get lock; try restarting transaction

WARNING: SQL Error: 1205, SQLState: 41000
Jul 18, 2018 1:51:35 PM org.hibernate.util.JDBCExceptionReporter logExceptions
SEVERE: Lock wait timeout exceeded; try restarting transaction
In the MySQL DB we can see deadlocks, which were not present before the GMU import:
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 194, OS thread handle 47778553521920, query id 158004 11.220.73.50 root update
insert into folder (name, version, security_zone_goid, parent_folder_goid, goid) values ('mbbmock', 0, null, x'0000000000000000FFFFFFFFFFFFEC76', x'0BAD49997DE5BC52C76F99DFDE3E680A')
Trx read view will not see trx with id >= 63347, sees < 63199
------- TRX HAS BEEN WAITING 44 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 296 page no 7 n bits 304 index PRIMARY of table `ssgclusterb`.`folder` trx id 63347 lock mode S waiting
Record lock, heap no 27 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
 0: len 16; hex 0bad49997de5bc52c76f99dfde3e680a; asc  I } R o >h ;;
 1: len 6; hex 00000000f6df; asc ;;
 2: len 7; hex d0000001d40110; asc ;;
 3: len 4; hex 80000000; asc ;;
 4: len 7; hex 6d62626d6f636b; asc mbbmock;;
 5: len 16; hex 0000000000000000ffffffffffffec76; asc v;;
 6: SQL NULL;
-------------------------------------------------------------------
---TRANSACTION 63199, ACTIVE 165 sec
1091 lock struct(s), heap size 123088, 42631 row lock(s), undo log entries 79542
MySQL thread id 188, OS thread handle 47778587682560, query id 175152 11.220.73.50 root
Trx read view will not see trx with id >= 63199, sees < 63199
-------------------------------------------------------------------
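For reference, output like the above can be captured from the RDS instance while the import is running; a minimal sketch using the mysql client (the endpoint is a placeholder):

    # Full InnoDB monitor output, including the deadlock and transaction sections:
    mysql -h <rds-endpoint> -u root -p -e "SHOW ENGINE INNODB STATUS\G"

    # Currently running and blocked transactions:
    mysql -h <rds-endpoint> -u root -p -e \
        "SELECT trx_id, trx_state, trx_started, trx_query FROM information_schema.INNODB_TRX;"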
Have you seen this kind of error before? Do you know how to solve it?
MySQL database engine versions tested: 5.7.17, 5.7.19, 5.7.21
DB instance class: db.t2.medium
Custom MySQL parameters set:
log_bin_trust_function_creators = 1
max_allowed_packet = 16777216
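On RDS these parameters cannot be set in a my.cnf file; they have to be set in a custom DB parameter group attached to the instance. A minimal sketch, assuming a hypothetical parameter group named ssg-mysql57:

    # Both parameters are dynamic, so ApplyMethod=immediate works without a reboot:
    aws rds modify-db-parameter-group \
        --db-parameter-group-name ssg-mysql57 \
        --parameters \
        "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate" \
        "ParameterName=max_allowed_packet,ParameterValue=16777216,ApplyMethod=immediate"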
Smaller imports do work; for example, if you create a TEST folder and a service on the source system, you can then successfully import the folder and its contents on the target.
However, we also tried importing just the global policies, and that import fails as well, with:
WARNING: SQL Error: 1452, SQLState: 23000
Jul 19, 2018 12:56:03 PM org.hibernate.util.JDBCExceptionReporter logExceptions
SEVERE: Cannot add or update a child row: a foreign key constraint fails (`ssgclusterb`.`policy`, CONSTRAINT `policy_folder` FOREIGN KEY (`folder_goid`) REFERENCES `folder` (`goid`) ON DELETE NO ACTION ON UPDATE NO ACTION)
Jul 19, 2018 12:56:03 PM org.hibernate.util.JDBCExceptionReporter logExceptions
WARNING: SQL Error: 1452, SQLState: 23000
Jul 19, 2018 12:56:03 PM org.hibernate.util.JDBCExceptionReporter logExceptions
SEVERE: Cannot add or update a child row: a for
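The 1452 errors indicate that the bundle references a folder goid that does not exist on the target, which is consistent with the earlier folder import having aborted partway through. As a hedged diagnostic sketch, the target schema can be checked for policies pointing at missing folders (endpoint is a placeholder; column names are assumed from the constraint definition above):

    # List policy rows whose folder_goid has no matching row in the folder table:
    mysql -h <rds-endpoint> -u root -p ssgclusterb -e \
        "SELECT p.goid, p.name FROM policy p
         LEFT JOIN folder f ON p.folder_goid = f.goid
         WHERE p.folder_goid IS NOT NULL AND f.goid IS NULL;"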
Environment
MySQL 5.7.19, 5.7.21; Gateway 9.3; GMU 1.5; running on AWS
Resolution
After looking more closely at the components between the Internet and the Gateway container, I found that the issue was caused by the idle timeout setting of the Amazon ELB, which was set to its default of 60 seconds.
Since there is no communication between the Gateway and the client until the migration has finished, the ELB closed the connection.
I assume that GMU retried the request after the ELB closed the connection the first time. The Gateway was still working on the first request when it received the second request from GMU, and that second request then caused the MySQL deadlocks.
I increased the ELB idle timeout to 1800 seconds, and the import now works fine.
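A minimal sketch of the same change, assuming a Classic ELB named ssg-elb and a Kubernetes Service named ssg-gateway (both names are hypothetical; for an ALB, set the idle_timeout.timeout_seconds attribute via aws elbv2 instead):

    # Raise the idle timeout so the ELB keeps the connection open for long-running migrations:
    aws elb modify-load-balancer-attributes \
        --load-balancer-name ssg-elb \
        --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":1800}}"

    # If the ELB is created by a Kubernetes Service of type LoadBalancer, set the
    # timeout through the Service annotation so the cloud provider does not revert it:
    kubectl annotate service ssg-gateway --overwrite \
        service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout="1800"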