The logs-queue app will report the "too many clients" error in Tanzu Application Service (TAS), formerly Pivotal Cloud Foundry (PCF). The logs-queue app shows the following messages:

func2 ERROR: Failed to bulk save logs into persistent data store : pq: sorry, too many clients already
The postgresql.log log file shows:

ERROR: could not extend file "base/16385/39278931.3": No space left on device
ERROR: new row for relation "app_log_day0_hour23" violates check constraint "app_log_day0_hour23_timestamp_millis_check"
App logs flow through the Loggregator pipeline to the logs-queue app, and finally to the PostgreSQL instance. In situations where there is extremely high app log ingress, the logs-queue app can become unstable if the internal database pruning algorithm is not able to keep the postgresql disk usage sufficiently low. The logs-queue app checks whether pruning is needed once every hour, and it only prunes the database when disk usage reaches 85%. During short periods of high log ingress, it is possible to fill the database faster than the logs-queue app can prune it. As a result, the postgresql database may still fill up even after tuning the appropriate pruning parameters. Please review the known issues in this document.
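The hourly check described above can be sketched as a small shell function. This is a simplified illustration only; needs_prune and the hard-coded 85% default are stand-ins for the app's internal logic, not actual logs-queue code:

```shell
# Hypothetical sketch of the hourly pruning decision: prune only when the
# postgres disk usage (a whole-number percentage) reaches the threshold.
# The function name and threshold handling are illustrative assumptions.
needs_prune() {
  usage_pct=$1
  threshold=${2:-85}
  [ "$usage_pct" -ge "$threshold" ]
}

needs_prune 90 && echo "prune" || echo "skip"   # prints "prune"
needs_prune 50 && echo "prune" || echo "skip"   # prints "skip"
```

Because the check only fires at 85% and only runs hourly, a burst of ingress in between checks can fill the remaining 15% of the disk before pruning ever triggers.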
Before tuning the logs-queue pruning parameters, you first need to understand why there is such a high log volume. This is especially true if you have already gone through the effort of scaling your Loggregator components to meet your ingress demands. Often, a small subset of apps is generating thousands of logs per second. We advise that you first identify why the log volume is so high and then determine which steps can be taken to limit the app log ingress activity. The cf top plugin can help identify which apps are producing the most logs.
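If you have a captured log sample (for example, output saved from the firehose), a quick way to rank the noisiest apps is to count lines per source. A minimal sketch, assuming the first field of each captured line identifies the app (the sample data below is made up for illustration):

```shell
# Rank apps by log line count in a captured sample, noisiest first.
# In practice, feed in real captured log output instead of this sample.
sample='billing-app msg-1
catalog-app msg-2
billing-app msg-3
billing-app msg-4'

echo "$sample" | awk '{print $1}' | sort | uniq -c | sort -rn
```

The top entries of the resulting count/name listing are the first candidates to investigate for excessive logging.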
Next, tune the logs-queue pruning environment parameters to make the pruning algorithm more aggressive so that it keeps up with the large ingress volume. Log in to the postgresql VM in the PCF Metrics deployment and connect to the database. Note that the app_log_dayX tables do not map directly to specific days of the week. The apps use an internal algorithm to determine which table to use based on the current epoch date. This means day1 is not necessarily the first table to be used; the first table could be day5 or day11.
$> sudo su -
$> /var/vcap/packages/postgres-*/bin/psql -p 5524 -U pgadmin metrics
select table_name, pg_size_pretty(pg_relation_size(quote_ident(table_name))) from information_schema.tables where table_schema = 'public' order by 2 DESC;

 table_name            | pg_size_pretty
-----------------------+----------------
 app_log_day1_hour1    | 96 MB
 flyway_schema_history | 8192 bytes
 app_log_day9_hour15   | 8040 kB
 app_log_day1_hour2    | 76 MB
 app_log_day7_hour16   | 655 MB
 app_log_day1_hour0    | 54 MB
 app_log_day1_hour4    | 308 MB
 app_log_day12_hour18  | 23 MB
To determine the total size for a given day, for example app_log_day1, sum the hourly tables and make sure that you have app_log_day1_hour% in the where filter, as in the following example:
select sum(h.size) || ' MB' as size from (select table_name, pg_relation_size(quote_ident(table_name)) /1024/1024 as size from information_schema.tables where table_schema = 'public' and table_name like 'app_log_day1_hour%') h;

  size
--------
 888 MB
(1 row)
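With a per-day total like the 888 MB above, you can relate it to the size of the postgres persistent disk to see how close you are to the 85% pruning trigger. A back-of-the-envelope helper; the 10240 MB disk size below is an example value, not taken from any particular deployment:

```shell
# Integer percentage of disk consumed: used MB * 100 / total MB.
usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Example: 888 MB of day1 logs on a hypothetical 10240 MB (10 GB) disk.
usage_pct 888 10240    # prints 8
usage_pct 9000 10240   # prints 87 -- past the 85% pruning threshold
```

Summing every app_log_dayX day this way shows whether total log retention is routinely approaching the threshold, which indicates the disk should be scaled up or pruning made more aggressive.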
Once you understand the log volume, tune the logs-queue pruning parameters. You can also review the formulas documented here to determine whether you need to scale the postgres log store to keep up with the amount of log data. For example:

MAX_RETENTION_PERCENTAGE=75
PG_GROOM_DISK_SIZE_INTERVAL=15m
If the push apps errand is enabled, the equivalent settings in the Ops Manager tile are:

Logs Disk Size Pruning Interval == PG_GROOM_DISK_SIZE_INTERVAL
Logs Max Retention Percentage == MAX_RETENTION_PERCENTAGE

These settings are applied by the push apps errand. It is necessary to modify these same settings in the Ops Manager tile to prevent a tile update via Ops Manager from reverting settings applied via the CF CLI.
cf target -o system -s metrics-v1-6
cf set-env logs-queue MAX_RETENTION_PERCENTAGE 75
cf set-env logs-queue PG_GROOM_DISK_SIZE_INTERVAL 15m
cf restage logs-queue
Further correction might be required if the "too many clients" problem continues after you have resolved the app ingress issues above and the following error is still observed in the logs-queue app logs:
pq: new row for relation "app_log_day1_hour16" violates check constraint "app_log_day1_hour16_timestamp_millis_check"
This error can continue to occur when the database disk was once full and the logs-queue app pruned out the current day's logs. Once the current day's logs have been pruned, every new row for that day violates the table's timestamp check constraint. To resolve this condition, you must delete all of the app log data and recreate the table metadata using the following procedure:
Stop the metrics-ingestor app:

cf stop metrics-ingestor
Stop the logs-queue app:

cf stop logs-queue
On the postgresql VM, connect to the metrics database:

/var/vcap/packages/postgres-*/bin/psql -p 5524 -U pgadmin metrics

Truncate the app_log table and call a procedure that will recreate the app log table metadata:

truncate app_log;
SELECT create_all_app_log_days(current_date);
Start the logs-queue app:

cf start logs-queue
Start the metrics-ingestor app:

cf start metrics-ingestor
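The recovery steps above can be collected into a single runbook-style script. This is a sketch, not a supported tool: it assumes the CF CLI is logged in and targeted at the metrics org/space, and that the psql portion is run on the postgresql VM (or somewhere that path and port are reachable). Every command in it appears in the steps above.

```shell
#!/bin/sh
# Sketch of the recovery procedure from this article; adapt to your
# environment and run the pieces manually if you prefer.
set -e

cf stop metrics-ingestor
cf stop logs-queue

# Truncate app_log and rebuild the app log table metadata
# (run on the postgresql VM; path and port are from the steps above).
/var/vcap/packages/postgres-*/bin/psql -p 5524 -U pgadmin metrics <<'SQL'
truncate app_log;
SELECT create_all_app_log_days(current_date);
SQL

cf start logs-queue
cf start metrics-ingestor
```

Stopping the apps before truncating ensures no writer holds connections to the tables being rebuilt; restarting them afterwards resumes ingestion against the fresh metadata.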