In pg_auto_failover, users can strengthen network security by enabling SSL (using the command # pg_autoctl enable ssl xxxx); refer to this document.
In some cases, after enabling SSL, the user might notice that all data nodes have been marked as unhealthy and are in the down state.
In this article, we discuss one possible cause of this issue: invalid permissions on the client certificate/key files on the monitor.
Product Version: 14.5
When a data node has been marked as unhealthy, check the Postgres logs of that data node. Below is an example:
From the logs, we can see:
To find out why the SSL connection failed, use the psql client to connect to the data node from the monitor node.
- Note that the client certificate and key are located under ~/.postgresql/ by default.
- Run the following command from the monitor node:
psql -h <DataNode> -U pgautofailover_monitor "dbname=postgres sslmode=verify-full sslcert=<CLIENT CERT FILE> sslkey=<CLIENT KEY FILE> sslrootcert=<ROOT CERT>"
In this example, we get the following error:
- This shows that the monitor cannot connect to the data node because of invalid permissions on the client certificate files under ~/.postgresql/. Once the permissions are fixed, the cluster returns to normal.
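The permission check and fix can be sketched as a small shell helper. This is a minimal sketch, not an official procedure, assuming a Linux host with GNU coreutils stat, the libpq default file names under ~/.postgresql/ (postgresql.crt, postgresql.key, root.crt), and that it is run as the OS user that runs the monitor. The function names show_pg_client_perms and fix_pg_client_perms are hypothetical. libpq refuses a client private key that is owned by the current user and readable by group or others, so the key is set to 0600. The 0644 mode for the certificates is an assumption: any mode that lets the owner read them works.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of pg_auto_failover or PostgreSQL).
# Assumes GNU coreutils `stat` and the libpq default client file names.

# Print "<octal mode> <owner> <path>" for each client SSL file in a directory.
show_pg_client_perms() {
  local dir="${1:-$HOME/.postgresql}"
  local f
  for f in postgresql.crt postgresql.key root.crt; do
    if [ -e "$dir/$f" ]; then
      stat -c '%a %U %n' "$dir/$f"
    else
      echo "missing: $dir/$f"
    fi
  done
}

# Tighten permissions: private key to 0600 (required by libpq when the key
# is owned by the current user), certificates to 0644 (assumed, readable).
fix_pg_client_perms() {
  local dir="${1:-$HOME/.postgresql}"
  local f
  if [ -e "$dir/postgresql.key" ]; then
    chmod 600 "$dir/postgresql.key" || return 1
  fi
  for f in postgresql.crt root.crt; do
    if [ -e "$dir/$f" ]; then
      chmod 644 "$dir/$f" || return 1
    fi
  done
  return 0
}
```

Typical usage on the monitor node is to run show_pg_client_perms to inspect the files, then fix_pg_client_perms, and then repeat the psql test connection. If the files are owned by a different OS user (for example root), chmod alone is not enough; their ownership must also be changed to the user that runs the monitor.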