API Developer Portal 4.X doesn't start up.
- Many containers are failing to connect to the postgres DB container.
- postgres container received SIGKILL and returned status 137
- The "sed" command fails in the postgres container with "sed: write error"
- df command shows that the /var volume is completely full
- /var/lib/docker/volumes/portal_database-postgres-volume/_data/pg_hba.conf has grown considerably in size (it should be only ~4K)
- There are many large pg_hba.confxxx swap files (pg_hba.conf with a random alphanumeric suffix) in the /var/lib/docker/volumes/portal_database-postgres-volume/_data directory
NOTE: This can also happen on Portal 4.4 even when the disk has enough free space, because the script is hard-coded with a limit, so this article still applies in that case.
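The file-related symptoms above can be checked from the Docker host with standard tools. The following is a minimal sketch rather than an official utility: the function name report_pg_hba is hypothetical, and it assumes the caller has read access to the volume directory (typically root).

```shell
#!/bin/sh
# Hypothetical helper: summarize the state of pg_hba.conf in one volume directory.
# Argument: the _data directory of the postgres volume.
report_pg_hba() {
    dir="$1"
    conf="$dir/pg_hba.conf"

    # File size in bytes (a healthy file is only about 4K).
    echo "size_bytes=$(wc -c < "$conf" | tr -d ' ')"

    # Total number of lines in the file.
    echo "total_lines=$(wc -l < "$conf" | tr -d ' ')"

    # Number of "local all ..." rule lines.
    echo "local_rule_lines=$(grep -c '^local[[:space:]]*all' "$conf")"

    # Leftover files named pg_hba.conf plus a suffix (pg_hba.conf itself is not counted).
    echo "temp_files=$(find "$dir" -maxdepth 1 -type f -name 'pg_hba.conf?*' | wc -l | tr -d ' ')"
}
```

For example, in a root shell on the Docker host: report_pg_hba /var/lib/docker/volumes/portal_database-postgres-volume/_data. Running df -h /var alongside it shows whether the volume is full.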
Release : 4.X
Component : API PORTAL
The pg_hba.conf file (*1) contains 268435551 lines (2^28+95).
*1 The target file of the failed "sed" command.
Line 15 of entrypoint.sh in the PostgreSQL container executes a "sed" command (*2).
The command (*2) replaces each line that matches its pattern with a string containing a line break, that is, with two lines. Both of the new lines match the same pattern again.
As a result, the number of matching lines in pg_hba.conf doubles every time entrypoint.sh is executed.
This growth appears to be what prevents the API Developer Portal from starting up.
*2 sed -i 's/local\s*all.*/local\tall\t\tpostgres\t\t\t\tpeer\nlocal\tall\t\tall\t\t\t\t\tmd5/' ${PGDATA}/pg_hba.conf
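The doubling can be reproduced safely on a scratch file. The sketch below is illustrative only: it assumes a sed that accepts -i, \s, \t and \n in this form (GNU sed or busybox sed), it assumes the file starts with a single "local all" rule, and the function name run_entrypoint_sed is hypothetical. It never touches the real pg_hba.conf.

```shell
#!/bin/sh
# Scratch file standing in for pg_hba.conf (assumed starting point: one matching rule).
demo=$(mktemp)
printf '# TYPE DATABASE USER ADDRESS METHOD\nlocal\tall\t\tall\t\t\t\t\ttrust\n' > "$demo"

# The same substitution as (*2), applied to the file given as the first argument.
run_entrypoint_sed() {
    sed -i 's/local\s*all.*/local\tall\t\tpostgres\t\t\t\tpeer\nlocal\tall\t\tall\t\t\t\t\tmd5/' "$1"
}

# Simulate three container starts and count the matching lines after each one.
for i in 1 2 3; do
    run_entrypoint_sed "$demo"
    echo "after run $i: $(grep -c '^local' "$demo") local lines"
done
# after run 1: 2 local lines
# after run 2: 4 local lines
# after run 3: 8 local lines

# For comparison with the observed line count of 2^28+95:
echo "2^28 = $((1 << 28))"
# 2^28 = 268435456

rm -f "$demo"
```

Under the same assumption of a single starting rule, 28 executions would yield 2^28 matching lines, which is consistent with the observed count.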
The same settings in the following files are duplicated every time the API Portal is restarted.
/var/lib/docker/volumes/portal_database-postgres-volume/_data/pg_hba.conf
/var/lib/docker/volumes/portal_database-postgres-slave-volume/_data/pg_hba.conf
After a lot of restarts, these files cause a "disk full" condition and the API Portal fails to start up.
The duplicated lines are shown below:
local all postgres peer
local all all md5
As a workaround, regularly remove the excess pairs of these two lines from the pg_hba.conf files to prevent the disk from filling up.
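Because the workaround has to be repeated, a small check can indicate when cleanup is due. This is a sketch under stated assumptions: the function name has_excess_local_rules is hypothetical, and the threshold of two lines assumes a healthy file contains exactly one pair of the rules shown above.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 when the file holds more than one pair of
# "local all ..." rules, 1 when it does not, 2 when the file cannot be read.
has_excess_local_rules() {
    [ -r "$1" ] || return 2
    count=$(grep -c '^local[[:space:]]*all' "$1")
    [ "$count" -gt 2 ]
}
```

It could be run periodically against both volume paths listed above, for example: if has_excess_local_rules <path to>/pg_hba.conf; then echo "cleanup needed"; fi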
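Cleanup process:
1. Back up pg_hba.conf in case of typos during cleanup. Give the backup a name such as pg_hba.orig (mv pg_hba.conf pg_hba.orig); a backup whose name begins with pg_hba.conf would be deleted by the rm pg_hba.conf* in step 2.
2. Remove the pg_hba.confxxx files, that is, the ones with a random alphanumeric string appended to the filename. The pattern below also matches pg_hba.conf itself, so if the file was renamed in step 1, put a working copy back afterwards (for example, cp pg_hba.orig pg_hba.conf) before continuing with step 3.
rm -f pg_hba.conf*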
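3. Remove the duplicate entries that the faulty sed statement created in the pg_hba.conf file. This deletes every matching line, including the one pair that is required; step 4 adds that pair back.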
sudo sed -i '/local\s*all.*/ d' <path to>/pg_hba.conf
4. Append the two required lines to pg_hba.conf
echo -e "local\tall\t\tpostgres\t\t\t\tpeer\nlocal\tall\t\tall\t\t\t\t\tmd5" | sudo tee -a <path to>/pg_hba.conf
5. Run cat pg_hba.conf and verify that the file ends like this (as of 4.4):
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local replication admin trust
#host replication admin 127.0.0.1/32 trust
#host replication admin ::1/128 trust
host all all all md5
host replication all 0.0.0.0/0 trust
local all postgres peer
local all all md5
You may need to repeat the above steps for the postgres slave, whose files are in the /var/lib/docker/volumes/portal_database-postgres-slave-volume/_data directory.
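The cleanup steps can be combined into one function that is applied to each volume directory in turn. This is a minimal sketch, not a supported script: the function name cleanup_pg_hba is hypothetical, it assumes a sed that supports -i and \s (GNU sed or busybox sed), and it differs from the manual steps in two ways. It removes only files with a suffix after pg_hba.conf, so pg_hba.conf itself survives, and it therefore takes the backup with cp after the suffixed files have freed space.

```shell
#!/bin/sh
# Hypothetical helper: clean up pg_hba.conf in one volume directory.
# Argument: the _data directory of the postgres volume.
cleanup_pg_hba() {
    dir="$1"
    conf="$dir/pg_hba.conf"
    [ -f "$conf" ] || { echo "not found: $conf" >&2; return 1; }

    # Remove pg_hba.conf<suffix> files only; the "?" keeps pg_hba.conf itself.
    find "$dir" -maxdepth 1 -type f -name 'pg_hba.conf?*' -exec rm -f {} +

    # Keep a backup under a name the pg_hba.conf* pattern cannot match.
    # Note: the backup is as large as the bloated file; delete it once the
    # portal has been verified.
    cp -p "$conf" "$dir/pg_hba.orig" || return 1

    # Delete every "local all ..." line, as in step 3.
    sed -i '/local\s*all.*/d' "$conf" || return 1

    # Append the single required pair of rules, as in step 4.
    printf 'local\tall\t\tpostgres\t\t\t\tpeer\nlocal\tall\t\tall\t\t\t\t\tmd5\n' >> "$conf"
}
```

For example, in a root shell: cleanup_pg_hba /var/lib/docker/volumes/portal_database-postgres-volume/_data, followed by the same call with the slave volume directory. Checking the result with cat as in step 5 is still advisable.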
The PostgreSQL containers are provided for testing purposes. This problem does not occur with an external MySQL database used for production.
Fixed in API Portal 4.5