Customer installed Tanzu Platform, including Ops Manager 3.0 and Tanzu Application Service 6.0.10, on AWS, using Amazon RDS as their external database service. When the deploy-autoscaler errand ran, the first VM to be deployed failed to start.
Retrieving logs for app autoscale-new in org system / space autoscaling as admin...
...
2024-12-19T18:36:00.59+0000 [APP/PROC/WEB/0] OUT app instance exceeded log rate limit (16384 bytes/sec)
...
2024-12-19T18:36:06.70+0000 [API/1] OUT App instance exited with guid XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX payload: {"instance"=>"XXXXXXXX-XXXX-XXXX-XXXX-XXXX", "index"=>0, "cell_id"=>"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 1", "crash_count"=>1, "crash_timestamp"=>1734633366648827893, "version"=>"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"}
...
2024-12-19T18:36:22.36+0000 [PROXY/0] OUT Exit status 137
Logs for the autoscaler were truncated because the app exceeded the default log rate limit of 16384 bytes/sec. Because some log lines were discarded, details that could point to the root cause may have been lost. To see the full error output, we need to set the log rate limit to unlimited.
If you have cf CLI version 8.5+, you can set the rate to unlimited with this command:
cf scale autoscaler -l -1
If you have an older version of the cf CLI, you can set the log rate limit using a cf curl against the API:
cf curl /v3/processes/$(cf curl /v3/apps/$(cf app autoscale --guid)/processes | jq -r '.resources[] | select(.type=="web") | .guid')/actions/scale -X POST -H "Content-type: application/json" -d '{"log_rate_limit_in_bytes_per_second": -1}'
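The nested one-liner can be easier to follow when split into steps. A sketch, assuming the app is named autoscale as in the one-liner and that jq is installed; the payload construction runs anywhere, while the cf curl steps require a logged-in cf CLI:

```shell
# Build the scale-action payload; -1 means "no log rate limit"
payload='{"log_rate_limit_in_bytes_per_second": -1}'
echo "$payload"

# With a logged-in cf CLI, the remaining steps would be:
#   app_guid=$(cf app autoscale --guid)
#   web_guid=$(cf curl "/v3/apps/${app_guid}/processes" \
#     | jq -r '.resources[] | select(.type=="web") | .guid')
#   cf curl "/v3/processes/${web_guid}/actions/scale" -X POST \
#     -H "Content-type: application/json" -d "$payload"
```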
Running the errand again now produces this error:
2024-12-23T20:14:27.57+0000 [APP/PROC/WEB/0] OUT time="2024-12-23T20:14:27Z" level=fatal msg="Failed to initialize repository handlerError 1067 (42000): Invalid default value for 'last_executed_at' handling 48_add_last_executed_at_to_slc_table.sql"
A migration for the autoscaling database had failed in the customer's Amazon RDS database.
Fix:
ALTER TABLE `scheduled_limit_changes` ADD last_executed_at TIMESTAMP DEFAULT 0 NOT NULL;
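The failure can also be reproduced interactively: a zero default for a TIMESTAMP column counts as a "zero date", which MySQL rejects under the customer's mode. A sketch of a session-level reproduction, per the behavior described in this article (the actual fix was applied globally; table and column names are from the migration):

```sql
-- Customer's effective mode (TRADITIONAL includes STRICT_ALL_TABLES): fails
SET SESSION sql_mode = 'TRADITIONAL';
ALTER TABLE `scheduled_limit_changes`
  ADD last_executed_at TIMESTAMP DEFAULT 0 NOT NULL;
-- ERROR 1067 (42000): Invalid default value for 'last_executed_at'

-- The mode without STRICT_ALL_TABLES: the same DDL succeeds
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
ALTER TABLE `scheduled_limit_changes`
  ADD last_executed_at TIMESTAMP DEFAULT 0 NOT NULL;
```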
Upon further investigation, we concluded that the error was caused by the sql_mode setting on the RDS instance.
The error the customer observed occurs when STRICT_ALL_TABLES is set together with NO_ZERO_IN_DATE or NO_ZERO_DATE. We could remove both NO_ZERO_DATE and NO_ZERO_IN_DATE, or we could remove just STRICT_ALL_TABLES. Given that our default lab setup does not include STRICT_ALL_TABLES, we opted to remove that. STRICT_ALL_TABLES is included when customers set sql_mode to "TRADITIONAL" in the RDS console; in MySQL 8.0, TRADITIONAL is shorthand for STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, and NO_ENGINE_SUBSTITUTION.
To remove STRICT_ALL_TABLES, we changed the sql_mode value with this command:
set global sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';
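Rather than hard-coding the replacement value, it can be derived from whatever the instance currently reports. A shell sketch, assuming the string below is what SELECT @@GLOBAL.sql_mode returns for a TRADITIONAL-configured instance:

```shell
# sql_mode as reported by the instance (assumed example value)
current='STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'

# Drop only STRICT_ALL_TABLES from the comma-separated list
new=$(printf '%s\n' "$current" | tr ',' '\n' | grep -v '^STRICT_ALL_TABLES$' | paste -sd, -)
echo "$new"
# → STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
```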
This change (implemented in the Amazon RDS dashboard) allowed the autoscaler app to start up successfully.
We found a similar scenario described in this Stack Overflow question:
https://stackoverflow.com/questions/9192027/invalid-default-value-for-create-date-timestamp-field