Recently upgraded the policy server from 12.8.06 to 12.8.08, and the initial validations passed.
A few weeks later, the policy server suddenly could not be brought up.
smps.log
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:4567][INFO][sm-Server-01850] Initialized smpolicysrv
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:7035][INFO][sm-Server-03480] Initializing TLI
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:8741][INFO][sm-Server-02410] Initializing UDP
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:7098][INFO][sm-Server-03490] Starting TLI
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:7112][INFO][sm-Server-03500] Admin UDP port is up
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.760][CServer.cpp:8961][INFO][sm-Server-02420] TCP is up on 3 interfaces listening for incoming connections
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.761][CServer.cpp:8691][ERROR][sm-Server-01640] Failed to initialize server management command channel
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.761][CServer.cpp:7926][ERROR][sm-Server-01530] Failed to initialize server management channel
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.761][CServer.cpp:4233][INFO][sm-Server-04230] The suspend timeout is 3600 seconds.
[3437/139813979408192][Tue mm dd yyyy hh:mm:ss.761][CServer.cpp:4665][INFO][sm-Server-01880] smpolicysrv shutting down
Policy server: 12.8.08
OS Platform: Linux
The error is not specific to version 12.8.08.
The "server management channel" requires a Linux pipe (used for interprocess communication) to be created first, and creating that pipe depends on file descriptors being available on the Linux system itself.
When the policy server is running, these files are created under /tmp:
-rw-r--r-- 1 root root 0 mm dd 15:38 GCL-SiteMinder.sem
prw------- 1 root root 0 mm dd 15:38 'snrrpni'$'\200''{{'$'\200''pip'
prw-r----- 1 root root 0 mm dd 15:38 GCL-SiteMinder-B.pipe
prw-r----- 1 root root 0 mm dd 15:38 GCL-SiteMinder-A.pipe
When the error happens, one of the files ('snrrpni'$'\200''{{'$'\200''pip') is missing under /tmp.
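The check described above can be scripted. The following is a minimal sketch, not a vendor tool: it looks for the three readable file names from the listing and counts every FIFO in the directory, because the fourth pipe's name contains non-printable bytes. The function name `inspect_ipc_files` is hypothetical, and the healthy count of three FIFOs is an assumption taken from the listing above; other software may also create FIFOs under /tmp, so treat the count as a hint only.

```python
import os
import stat

# Names and types copied from the directory listing above.
EXPECTED = {
    "GCL-SiteMinder.sem": "file",
    "GCL-SiteMinder-A.pipe": "fifo",
    "GCL-SiteMinder-B.pipe": "fifo",
}


def inspect_ipc_files(tmp_dir="/tmp"):
    """Return (per-name status, number of FIFOs found in tmp_dir)."""
    report = {}
    for name, kind in EXPECTED.items():
        path = os.path.join(tmp_dir, name)
        try:
            mode = os.lstat(path).st_mode
        except FileNotFoundError:
            report[name] = "missing"
            continue
        ok = stat.S_ISFIFO(mode) if kind == "fifo" else stat.S_ISREG(mode)
        report[name] = "ok" if ok else "wrong type"

    # The fourth pipe has an unprintable name, so count FIFOs instead of
    # matching it by name.
    fifo_count = 0
    for entry in os.scandir(tmp_dir):
        try:
            if stat.S_ISFIFO(entry.stat(follow_symlinks=False).st_mode):
                fifo_count += 1
        except OSError:
            continue  # entry vanished or is unreadable; skip it
    return report, fifo_count


if __name__ == "__main__":
    status, fifos = inspect_ipc_files()
    for name, state in status.items():
        print(f"{name}: {state}")
    print(f"FIFOs under /tmp: {fifos} (the listing above shows 3 when healthy)")
```

Run it as a user that can read /tmp. A FIFO count below three while the policy server is starting would match the failure described here.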
1. Stop the policy server, clean up /tmp, then try to start the policy server again.
2. Increase the "open files" ulimit from 1024 to 2048 for the siteminder user, then try to start the policy server again.
3. Run "export LAX_DEBUG=true", then use "strace" to debug the startup failure, e.g. "strace <installDir>/siteminder/start-all".
4. Reboot the Linux machine, then try to start the policy server again.
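For step 2, it can help to see the limit the process actually inherits and what happens when it is reached. The sketch below is a generic Linux illustration, not policy server code: `open_file_usage` reports the soft and hard "open files" limits and the descriptors in use for the process running the script, and `pipes_until_exhausted` temporarily lowers the soft limit and creates anonymous pipes until the kernel refuses. Both function names and the default limit of 64 are hypothetical. The policy server uses named pipes under /tmp, but opening those consumes file descriptors in the same way.

```python
import os
import resource


def open_file_usage():
    """Report the open-file limits and descriptors in use (Linux only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor of this process.
    in_use = len(os.listdir("/proc/self/fd"))
    return {"soft": soft, "hard": hard, "in_use": in_use}


def pipes_until_exhausted(soft_limit=64):
    """Lower the soft limit, create pipes until failure, restore the limit.

    Returns (number of pipes created, errno of the failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft_limit
    if old_soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, old_soft)  # never raise the limit here
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))

    opened = []
    failure = None
    try:
        while True:
            opened.append(os.pipe())  # each pipe needs two descriptors
    except OSError as exc:
        failure = exc.errno
    finally:
        for read_fd, write_fd in opened:
            os.close(read_fd)
            os.close(write_fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure


if __name__ == "__main__":
    print(open_file_usage())
    count, err = pipes_until_exhausted()
    print(f"created {count} pipes before failing: {os.strerror(err)}")
```

Limits are per process and inherited from the shell, so run the script as the same user and from the same kind of session that starts the policy server; otherwise the numbers may not reflect what smpolicysrv sees.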
In the end, rebooting the OS was confirmed to resolve the issue.
https://community.broadcom.com/communities/community-home/digestviewer/viewthread?MID=795837