"Purging broadcast message" error from slump_nxd in stdlog

Article ID: 45050

Products

CA Service Desk Manager - Unified Self Service
CA Service Management - Service Desk Manager

Issue/Introduction

In our Service Desk Advanced Availability environment, we often see the following message in our stdlogs:

slump_nxd #### ERROR server.c #### Purging broadcast message ###### after ### seconds although it is still unacknowledged by the 1 nodes <sdm server name>

What does this message mean and what is its impact?

Environment

Release: 14.1 and above

Component: Service Desk Manager (SDM)

Configuration: SDM Advanced Availability

Cause

The slump processes on Service Desk servers running in Advanced Availability mode communicate continually to keep data such as caches synchronised. They do this by sending broadcast messages that are delivered to the slump_nxd process on every registered node in the Advanced Availability environment.

The following message appears in the stdlog when the slump process on one server sends a broadcast message to the other servers but does not receive an acknowledgement from one or more of them within the allowed time:

slump_nxd #### ERROR server.c #### Purging broadcast message ###### after ### seconds although it is still unacknowledged by the 1 nodes <sdm server name>

In this example, broadcast message number ###### was sent to <sdm server name>. The broadcasting slump server had still not received an acknowledgement from <sdm server name> after ### seconds, so it purged the message.
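When many stdlog files need reviewing, the fields in this message can be extracted programmatically. The following Python sketch is a minimal, unofficial parser: `parse_purge_line` and `PurgeEvent` are hypothetical names, the regular expression assumes the message wording shown above is stable, and it assumes that multiple unacknowledged node names are separated by whitespace.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches only the text from "Purging broadcast message" onward, so the
# leading process/line-number fields of the stdlog entry are ignored.
PURGE_RE = re.compile(
    r"Purging broadcast message (?P<msg_id>\S+) after (?P<seconds>\d+) seconds "
    r"although it is still unacknowledged by the (?P<count>\d+) nodes? (?P<nodes>.+)$"
)


@dataclass
class PurgeEvent:
    msg_id: str        # broadcast message number
    seconds: int       # how long the message was held before being purged
    node_count: int    # number of nodes that did not acknowledge it
    nodes: list[str]   # names of those nodes


def parse_purge_line(line: str) -> PurgeEvent | None:
    """Return the fields of a 'Purging broadcast message' line, or None."""
    m = PURGE_RE.search(line.strip())
    if m is None:
        return None
    return PurgeEvent(
        msg_id=m.group("msg_id"),
        seconds=int(m.group("seconds")),
        node_count=int(m.group("count")),
        # Assumption: several node names would be separated by whitespace.
        nodes=m.group("nodes").split(),
    )
```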

A broadcast message is most likely left unacknowledged because the target server is unavailable, the network between the servers was interrupted, or the slump port is blocked on the target. You may need to investigate the underlying cause with your network team.
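To help rule out a blocked slump port, you can test whether a TCP connection from the broadcasting server to the target server's slump port succeeds. The Python sketch below is a generic reachability check, not an SDM tool: it assumes slump traffic uses TCP, `port_is_reachable` is a hypothetical helper, and the host name and port number are values you must take from your own configuration. A successful connection only shows that the port is open from the machine running the check, so run it on the server that logged the error.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try a TCP connection to host:port; return True if it is accepted in time."""
    try:
        # create_connection resolves the host and connects; refusals,
        # timeouts and name-resolution failures all raise OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```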

The time the broadcasting slump waits for an acknowledgement before treating a message as unacknowledged is set by the NX.env variable NX_SLUMP_NODE_BROADCAST_MAX_HOLD. The default value is 120 seconds.
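The interaction between acknowledgements and the maximum hold time can be illustrated with a simplified model. This Python sketch is a conceptual illustration, not the slump_nxd implementation, and `BroadcastTracker` and its methods are hypothetical: a broadcast is held until every node acknowledges it, and once it has been held longer than the maximum hold time it is purged, producing a line similar to the stdlog error that lists only the nodes that never replied.

```python
from dataclasses import dataclass, field

DEFAULT_MAX_HOLD = 120  # seconds; documented default of NX_SLUMP_NODE_BROADCAST_MAX_HOLD


@dataclass
class HeldBroadcast:
    msg_id: int
    sent_at: float
    pending: set[str] = field(default_factory=set)  # nodes yet to acknowledge


class BroadcastTracker:
    """Simplified hold-until-acknowledged-or-expired model (hypothetical)."""

    def __init__(self, max_hold: float = DEFAULT_MAX_HOLD) -> None:
        self.max_hold = max_hold
        self.held: dict[int, HeldBroadcast] = {}

    def send(self, msg_id: int, nodes: set[str], now: float) -> None:
        # Hold the message until every target node has acknowledged it.
        self.held[msg_id] = HeldBroadcast(msg_id, now, set(nodes))

    def acknowledge(self, msg_id: int, node: str) -> None:
        msg = self.held.get(msg_id)
        if msg is None:
            return  # already fully acknowledged or purged
        msg.pending.discard(node)
        if not msg.pending:
            del self.held[msg_id]  # all nodes replied: nothing to purge

    def purge_expired(self, now: float) -> list[str]:
        """Drop messages held longer than max_hold and return log-style lines."""
        lines = []
        for msg_id, msg in list(self.held.items()):
            age = now - msg.sent_at
            if age > self.max_hold:
                nodes = " ".join(sorted(msg.pending))
                lines.append(
                    f"Purging broadcast message {msg_id} after {int(age)} seconds "
                    f"although it is still unacknowledged by the {len(msg.pending)} nodes {nodes}"
                )
                del self.held[msg_id]
        return lines
```

In this model, a message that every node acknowledges is discarded immediately and never logged, which is consistent with the error naming only the non-responding nodes.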

For the first of these causes, a server can be "unavailable" because it is extremely busy processing load at the time the slump messages are being exchanged. In that case, the solution is to increase the timeout from the default of 120 seconds to a value greater than the highest number of seconds reported in any of the "Purging broadcast message" messages.
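Finding the highest purge delay across your stdlogs can be automated. The following Python sketch scans log lines for the "Purging broadcast message" error, takes the largest "after ### seconds" value and adds a safety margin, never going below the current setting and capping the result at 3600 seconds, the maximum this article gives for the variable. The function name `recommend_max_hold` and the 30-second `margin` are illustrative assumptions, not vendor guidance.

```python
import re
from collections.abc import Iterable

DEFAULT_MAX_HOLD = 120  # documented default, in seconds
MAX_ALLOWED = 3600      # documented ceiling (1 hour)

# Only the "after N seconds" part of the purge message is needed here.
SECONDS_RE = re.compile(r"Purging broadcast message \S+ after (\d+) seconds")


def recommend_max_hold(log_lines: Iterable[str],
                       current: int = DEFAULT_MAX_HOLD,
                       margin: int = 30) -> int:
    """Suggest a hold time above the longest purge delay seen in the logs.

    margin is an illustrative safety buffer, not a documented setting.
    """
    observed = [int(m.group(1)) for line in log_lines
                if (m := SECONDS_RE.search(line))]
    if not observed:
        return current  # no purges found: keep the current setting
    # Never suggest less than the current value, and never exceed the ceiling.
    return min(max(max(observed) + margin, current), MAX_ALLOWED)
```

For example, `with open("stdlog.0", errors="replace") as f: print(recommend_max_hold(f))`; the file name here is an assumption, so point it at your own stdlog files.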

Resolution

To increase this timeout, set NX_SLUMP_NODE_BROADCAST_MAX_HOLD in the NX.env file on every SDM server in the environment.

To do that, run the following commands on each server:

pdm_options_mgr -c -s SLUMP_NODE_BROADCAST_MAX_HOLD -v 180 -a pdm_option.inst
pdm_options_mgr -c -s SLUMP_NODE_BROADCAST_MAX_HOLD -v 180 -a pdm_option.inst -t

As a result, the following line is added to the NX.env file:

@NX_SLUMP_NODE_BROADCAST_MAX_HOLD=180

Note that the variable in the NX.env file has an "NX_" prefix, but the name passed to pdm_options_mgr does not; this is expected.

The value of the variable is in seconds and can be set as high as 1 hour (3600 seconds). However, needing a very high value may indicate significant periods of network outage, which should be investigated and prevented.
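After running the commands, you may want to confirm on each server that NX.env contains the expected value. The Python sketch below reads the text of an NX.env file, looks for the `@NX_SLUMP_NODE_BROADCAST_MAX_HOLD=` line, falls back to the 120-second default when the line is absent, and rejects values outside 1 to 3600 seconds. The helper name `read_max_hold` and the rule that the last matching line wins are assumptions made for illustration.

```python
import re

DEFAULT_MAX_HOLD = 120  # documented default, in seconds
MAX_ALLOWED = 3600      # documented ceiling (1 hour)

# NX.env stores the variable with a leading "@" and the NX_ prefix,
# e.g. @NX_SLUMP_NODE_BROADCAST_MAX_HOLD=180
SETTING_RE = re.compile(r"^\s*@NX_SLUMP_NODE_BROADCAST_MAX_HOLD\s*=\s*(\d+)\s*$")


def read_max_hold(nx_env_text: str) -> int:
    """Return the configured hold time in seconds, or the default if unset."""
    value = DEFAULT_MAX_HOLD
    for line in nx_env_text.splitlines():  # also handles Windows line endings
        match = SETTING_RE.match(line)
        if match:
            value = int(match.group(1))  # assumption: the last matching line wins
    if not 1 <= value <= MAX_ALLOWED:
        raise ValueError(
            f"NX_SLUMP_NODE_BROADCAST_MAX_HOLD={value} is outside 1..{MAX_ALLOWED} seconds"
        )
    return value
```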