Users are unable to log on to the database, unless they are using a local host. When using external clients, all users, superuser or non-superuser, are unable to log on.
When users try to log on with external clients, it hangs for some time and errors out with the following error messages in the master logs:
2016-08-06 12:51:39.878278 EDT,"pidsolardb1p","postgres",p40428,th-1642248416,"172.16.46.236","61277",2016-08-06 12:50:39 EDT,64569559,con455,,seg-1,,,x64569559,sx1,"LOG","00000","Issuing cancel signal (SIGINT) to my self (pid = 40428) for statement timeout.",,,,,,,0,,"proc.c",1551, 2016-08-06 12:51:39.878471 EDT,"pidsolardb1p","postgres",p40428,th-1642248416,"172.16.46.236","61277",2016-08-06 12:50:39 EDT,64569559,con455,,seg-1,,,x64569559,sx1,"LOG","00000","Process interrupt for 'query cancel pending'.",,,,,,,0,,"postgres.c",3523, 2016-08-06 12:51:39.878509 EDT,"pidsolardb1p","postgres",p40428,th-1642248416,"172.16.46.236","61277",2016-08-06 12:50:39 EDT,64569559,con455,,seg-1,,,x64569559,sx1,"FATAL","57014","canceling statement due to statement timeout",,,,,,,0,,"postgres.c",3544,
All logon's from outside clients are logged, regardless if it is a superuser or non-superuser logging on. There are no changes to the customer configuration.
Follow the following steps to troubleshoot the issue:
[gpadmin@mdw pg_log]$ gpconfig -s log_min_messages Values on all segments are consistent GUC: log_min_messages Master value: DEBUG1 Segment value: DEBUG12. The master log logged in more details as that master tried to acquire a session, and it hit the timeout:
2016-08-06 13:47:17.264294 EDT,"test_sup","postgres",p53213,th-1642248416,"172.16.138.36","59400",2016-08-06 13:47:17 EDT,0,,,seg-1,,,,,"DEBUG1","00000","InitProcess(): gp_session_id -1, gp_is_callback 0",,,,,,,0,,"proc.c",407, 2016-08-06 13:47:17.297347 EDT,"test_sup","postgres",p53213,th-1642248416,"172.16.138.36","59400",2016-08-06 13:47:17 EDT,0,con1470,,seg-1,,,,,"DEBUG1","00000","SessionState_Acquire: pinCount: 1, activeProcessCount: 0",,,,,,,0,,"session_state.c",122,"Stack trace: 2016-08-06 13:48:17.298859 EDT,"test_sup","postgres",p53213,th-1642248416,"172.16.138.36","59400",2016-08-06 13:47:17 EDT,64571278,con1470,,seg-1,,,x64571278,sx1,"LOG","00000","Issuing cancel signal (SIGINT) to my self (pid = 53213) for statement timeout.",,,,,,,0,,"proc.c",1551,2. pstack on the login pid, confirm that the connection is waiting on ldap:
[root@mdw ~]# pstack 53213 #00x000000364c80eaa0 in __connect_nocancel () from /lib64/libpthread.so.0 #10x00007f05a0de099e in ldap_pvt_connect () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #20x00007f05a0de0e89 in ldap_connect_to_host () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #30x00007f05a0dc8f41 in ldap_int_open_connection () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #40x00007f05a0dddd83 in ldap_new_connection () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #50x00007f05a0dc8929 in ldap_open_defconn () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #60x00007f05a0ddd565 in ldap_send_initial_request () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #70x00007f05a0dd0b2d in ldap_sasl_bind () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #80x00007f05a0dd0c34 in ldap_sasl_bind_s () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #90x00007f05a0dd15d5 in ldap_simple_bind_s () from /usr/local/greenplum-db/./lib/libldap-2.3.so.0 #10 0x000000000079b092 in CheckLDAPAuth () #11 0x000000000079bf90 in ClientAuthentication () #12 0x0000000000aef2f0 in InitPostgres () #13 0x0000000000969511 in PostgresMain () #14 0x00000000008c731e in ServerLoop () #15 0x00000000008ca0c0 in PostmasterMain () #16 0x00000000007c870f in main ()Use the strace command, confirm that that master is waiting on a response from an IP and hit the timeout. Observe that the IP pointed to customers ldap server:
[root@mdw ~]# strace -p 53213 Process 53213 attached - interrupt to quit connect(5, {sa_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("172.16.94.52")}, 16) = ? ERESTARTSYS (To be restarted) --- SIGALRM (Alarm clock) @ 0 (0) --- select(3, NULL, [2], NULL, {1, 0})= 1 (out [2], left {0, 999996}) write(2, "\0\0\0\0\304\0\0\0\335\317\0\0 G\35\236 G\35\236\0\0\0\0tcf\0\0\2\0\0"..., 244) = 244
3. As requested, check their LDAP server. Reload the LDAP server and confirm that connections completed successfully.