In our Service Desk Advanced Availability environment, we often see the following message in our stdlogs:
slump_nxd #### ERROR server.c #### Purging broadcast message ###### after ### seconds although it is still unacknowledged by the 1 nodes <sdm server name>
What does this message mean and what is its impact?
Release: 14.1 and above
Component: Service Desk Manager (SDM)
Configuration: SDM Advanced Availability
The slump processes on Service Desk servers in Advanced Availability mode communicate continually to ensure things like cache synchronisation take place. This is performed by broadcast messages that are delivered to slump_nxd processes on all registered nodes within an Advanced Availability environment.
The following message is displayed in the stdlog when the slump process on one server sends a broadcast message to the other servers but does not receive an acknowledgement from one or more of them within a set period of time:
slump_nxd #### ERROR server.c #### Purging broadcast message ###### after ### seconds although it is still unacknowledged by the 1 nodes <sdm server name>
In the above example, broadcast message number ###### was sent to <sdm server name>. The broadcasting slump server had still not received an acknowledgement from <sdm server name> after ### seconds, so it purged the message.
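If you want to pick these messages out of a stdlog programmatically, a minimal Python sketch like the following may help. It assumes the leading fields shown as #### vary, so it matches only the fixed wording of the message. The sample line, message number and host name (sdmhost2) are hypothetical, as are the names PURGE_RE, PurgeEvent and parse_purge_line.

```python
import re
from typing import NamedTuple, Optional

# Match only the fixed wording of the message; the leading fields
# (shown as #### in the article) are assumed to vary and are ignored.
PURGE_RE = re.compile(
    r"Purging broadcast message (?P<msg_id>\d+) after (?P<seconds>\d+) seconds"
    r" although it is still unacknowledged by the (?P<count>\d+) nodes? (?P<nodes>.+)$"
)


class PurgeEvent(NamedTuple):
    msg_id: int      # broadcast message number
    seconds: int     # how long the message was held before being purged
    node_count: int  # number of nodes that did not acknowledge
    nodes: str       # the unacknowledging node name(s), as logged


def parse_purge_line(line: str) -> Optional[PurgeEvent]:
    """Return the fields of a 'Purging broadcast message' line, or None for any other line."""
    m = PURGE_RE.search(line.rstrip())
    if m is None:
        return None
    return PurgeEvent(int(m["msg_id"]), int(m["seconds"]), int(m["count"]), m["nodes"].strip())


# Hypothetical sample line with the placeholders filled in.
sample = ("slump_nxd 1234 ERROR server.c 5678 Purging broadcast message 424242 "
          "after 130 seconds although it is still unacknowledged by the 1 nodes sdmhost2")
print(parse_purge_line(sample))
# PurgeEvent(msg_id=424242, seconds=130, node_count=1, nodes='sdmhost2')
```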
The broadcast message most likely goes unacknowledged for one of three reasons: the target server is unavailable, there is a network interruption between the servers, or the slump port is blocked on the target server. The cause may need further investigation with your network team.
The amount of time the broadcasting slump waits for an acknowledgement before treating a message as unacknowledged is defined by the NX.env variable NX_SLUMP_NODE_BROADCAST_MAX_HOLD. The default value is 120 seconds.
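Conceptually, the broadcasting side can be thought of as holding each message until every target node acknowledges it, and purging it once the hold time expires. The following Python sketch is a simplified model under that assumption. It is not the actual slump_nxd implementation. The class and method names, the fixed max_hold check, and the way multiple unacknowledged nodes are listed are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

DEFAULT_MAX_HOLD = 120  # seconds; the documented default of NX_SLUMP_NODE_BROADCAST_MAX_HOLD


@dataclass
class Pending:
    sent_at: float     # time the broadcast was sent
    unacked: Set[str]  # nodes that have not acknowledged yet


@dataclass
class BroadcastTracker:
    """Simplified, hypothetical model of a broadcaster's hold-and-purge behaviour."""
    max_hold: float = DEFAULT_MAX_HOLD
    pending: Dict[int, Pending] = field(default_factory=dict)

    def send(self, msg_id: int, nodes: List[str], now: float) -> None:
        self.pending[msg_id] = Pending(now, set(nodes))

    def ack(self, msg_id: int, node: str) -> None:
        entry = self.pending.get(msg_id)
        if entry is None:
            return  # already purged or fully acknowledged
        entry.unacked.discard(node)
        if not entry.unacked:
            del self.pending[msg_id]  # every node answered; nothing left to purge

    def purge_expired(self, now: float) -> List[str]:
        """Drop messages held for at least max_hold seconds and return log-style lines."""
        lines = []
        for msg_id, entry in list(self.pending.items()):
            elapsed = now - entry.sent_at
            if elapsed >= self.max_hold:
                lines.append(
                    f"Purging broadcast message {msg_id} after {int(elapsed)} seconds "
                    f"although it is still unacknowledged by the {len(entry.unacked)} "
                    f"nodes {' '.join(sorted(entry.unacked))}"
                )
                del self.pending[msg_id]
        return lines


tracker = BroadcastTracker()
tracker.send(1001, ["sdm1", "sdm2"], now=0)
tracker.ack(1001, "sdm1")               # sdm2 never answers
print(tracker.purge_expired(now=60))    # [] : still within the hold time
print(tracker.purge_expired(now=125))   # one purge line naming sdm2
```

In this model, raising max_hold gives a slow node more time to acknowledge before its message is purged. That is the effect of increasing NX_SLUMP_NODE_BROADCAST_MAX_HOLD.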
Consider the first possible root cause mentioned, "the server being unavailable". A server can be effectively unavailable if it is extremely busy processing load at the time the slump messages are being exchanged. In that case, the solution is to increase the timeout from the default of 120 seconds to a number of seconds greater than the highest value reported in any of the "Purging broadcast message" messages.
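Finding the highest value can be scripted. The following Python sketch scans stdlog lines for "Purging broadcast message" entries and suggests a new hold time. The 60-second safety margin and the function names are assumptions for illustration. The 3600-second ceiling is the limit described at the end of this article.

```python
import re
from typing import Iterable, Optional

SECONDS_RE = re.compile(r"Purging broadcast message \d+ after (\d+) seconds")
MAX_HOLD = 3600  # upper limit for NX_SLUMP_NODE_BROADCAST_MAX_HOLD per this article


def max_purge_seconds(lines: Iterable[str]) -> Optional[int]:
    """Highest 'after N seconds' value in the given stdlog lines, or None if nothing was purged."""
    highest = None
    for line in lines:
        m = SECONDS_RE.search(line)
        if m:
            seconds = int(m.group(1))
            highest = seconds if highest is None else max(highest, seconds)
    return highest


def suggest_hold(lines: Iterable[str], margin: int = 60) -> Optional[int]:
    """Suggest a hold time above the highest purge value (the margin is an assumption)."""
    highest = max_purge_seconds(lines)
    if highest is None:
        return None  # no purge messages, so no change suggested
    candidate = highest + margin
    if candidate > MAX_HOLD:
        raise ValueError(
            f"{candidate}s exceeds {MAX_HOLD}s; investigate the outage instead of raising the timeout"
        )
    return candidate


# Hypothetical stdlog excerpt.
log = [
    "slump_nxd ... Purging broadcast message 11 after 130 seconds although it is still unacknowledged by the 1 nodes sdmhost2",
    "slump_nxd ... some unrelated message",
    "slump_nxd ... Purging broadcast message 12 after 150 seconds although it is still unacknowledged by the 1 nodes sdmhost2",
]
print(suggest_hold(log))  # 210
```

With a real log, you can pass an open stdlog file object in place of the sample list, because the functions only iterate over lines.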
You can increase this value by setting it in the NX.env file on every SDM server in the environment. To do that, run the following commands on each server:
pdm_options_mgr -c -s SLUMP_NODE_BROADCAST_MAX_HOLD -v 180 -a pdm_option.inst
pdm_options_mgr -c -s SLUMP_NODE_BROADCAST_MAX_HOLD -v 180 -a pdm_option.inst -t
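If you script this change across several servers, you can generate the two command lines from a single value. In this Python sketch, the function name options_mgr_commands and the lower bound of 1 second are assumptions. The option name, flags and 3600-second upper limit are taken from this article. The sketch only prints the commands. Running them (for example with subprocess.run) would still need to happen on each SDM server.

```python
import shlex
from typing import List

OPTION = "SLUMP_NODE_BROADCAST_MAX_HOLD"  # pdm_options_mgr name (no "NX_" prefix)
MAX_HOLD = 3600                          # upper limit described in this article


def options_mgr_commands(seconds: int) -> List[List[str]]:
    """Return both pdm_options_mgr invocations for the given hold time."""
    if not 1 <= seconds <= MAX_HOLD:  # the lower bound of 1 is an assumption
        raise ValueError(f"hold time must be between 1 and {MAX_HOLD} seconds, got {seconds}")
    base = ["pdm_options_mgr", "-c", "-s", OPTION, "-v", str(seconds), "-a", "pdm_option.inst"]
    return [base, base + ["-t"]]


for cmd in options_mgr_commands(180):
    print(shlex.join(cmd))
```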
The result would be the inclusion of the following statement in the NX.env file:
@NX_SLUMP_NODE_BROADCAST_MAX_HOLD=180
Note that the variable name in the NX.env file has the "NX_" prefix but the option name in the command does not. This is correct.
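To confirm the effective value on a server, you can check the NX.env contents for the variable. This Python sketch assumes entries take the @NAME=value form shown above. It treats the last matching entry as the one in effect and falls back to the 120-second default when the variable is absent. These rules and the function names are assumptions, not documented NX.env parsing behaviour. The sketch also shows the "NX_" prefix relationship between the NX.env name and the pdm_options_mgr option name.

```python
from typing import Optional

NX_VAR = "NX_SLUMP_NODE_BROADCAST_MAX_HOLD"
DEFAULT_HOLD = 120  # default when the variable is not set, per this article


def option_name(nx_var: str) -> str:
    """pdm_options_mgr takes the NX.env name without its "NX_" prefix."""
    return nx_var[len("NX_"):] if nx_var.startswith("NX_") else nx_var


def effective_hold(nx_env_text: str) -> int:
    """Value of NX_VAR in NX.env text; assumes '@NAME=value' lines and last-one-wins."""
    value: Optional[int] = None
    for raw in nx_env_text.splitlines():
        line = raw.strip()
        if line.startswith("@" + NX_VAR + "="):
            value = int(line.split("=", 1)[1])
    return DEFAULT_HOLD if value is None else value


print(option_name(NX_VAR))                                        # SLUMP_NODE_BROADCAST_MAX_HOLD
print(effective_hold("@NX_SLUMP_NODE_BROADCAST_MAX_HOLD=180\n"))  # 180
print(effective_hold("@NX_SOME_OTHER_VAR=1\n"))                   # 120 (hypothetical other variable)
```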
The value of the variable is in seconds and can be set as high as 1 hour (3600 seconds). However, needing a value that high may indicate significant periods of network outage, which should be investigated and prevented.