In this scenario, the discovery_server probe shows high CPU consumption while the probeDiscovery queue keeps growing and its messages are never cleared. The hub logs show the hub disconnecting the discovery_server from the probeDiscovery queue:
hub: Subscriber 'discovery_server' at 'IP/local port' attached to queue 'probeDiscovery' (subject:probe_discovery requested bulk:0, granted bulk:1, minimum bulk:0, wait:0, heartbeat: 2, reply timeout: 60), time used: 0 ms
hub: Reply not received on queue route for 'probeDiscovery' (timeout), disconnecting
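To confirm that the disconnect is recurring rather than a one-off, the hub log can be scanned for the timeout message shown above. The following is a minimal sketch, not a supported tool: the function name is hypothetical, and the pattern is built only from the log line quoted here, so a real hub log may carry extra prefixes (timestamps, log levels) that the search simply skips over.

```python
import re

# Pattern taken from the hub log line quoted above. Any prefix before
# "Reply not received" (timestamp, log level) is ignored by search().
TIMEOUT_RE = re.compile(
    r"Reply not received on queue route for '(?P<queue>[^']+)' "
    r"\(timeout\), disconnecting"
)


def count_timeout_disconnects(lines):
    """Return a dict mapping queue name to its number of timeout disconnects."""
    counts = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            queue = match.group("queue")
            counts[queue] = counts.get(queue, 0) + 1
    return counts
```

A count for 'probeDiscovery' that keeps rising between checks suggests the same message is being redelivered repeatedly.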
- UIM 8.5.1, but this could also occur in later versions
The discovery queue contains a discovery graph so large that computing its checksum takes a very long time. The discovery_server computes the checksum to determine whether the graph is the same as the one it processed before. This computation is CPU intensive and, in this case, takes longer than the hub’s postroute_reply_timeout.
If the discovery_server probe does not reply before the timeout expires, the hub drops the queue subscription and resends the same message when the discovery_server resubscribes. The probe then starts on the same graph again, so the issue loops and the queue never drains.
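The loop described above can be illustrated with a small model. This is a sketch under stated assumptions, not the hub's actual implementation: the function name is hypothetical, the timeout is assumed to be in seconds, each redelivery is assumed to restart the checksum from scratch, and the 200-second processing time in the usage note is an invented figure for illustration.

```python
def attempts_until_ack(processing_seconds, reply_timeout_seconds, max_attempts=5):
    """Model delivery of one queue message to a subscriber.

    Returns the attempt number on which the subscriber replies in time,
    or None if every attempt times out (the redelivery loop).
    Assumption: every redelivery restarts processing from the beginning,
    so the processing time is the same on each attempt.
    """
    for attempt in range(1, max_attempts + 1):
        if processing_seconds <= reply_timeout_seconds:
            # Reply arrives before the timeout: message is acknowledged.
            return attempt
        # Timeout: the hub drops the subscription and resends the same
        # message after the subscriber reattaches.
    return None
```

With the reply timeout of 60 shown in the hub log, `attempts_until_ack(200, 60)` returns `None`, because no attempt can ever finish in time. Raising the timeout above the processing time, as in `attempts_until_ack(200, 300)`, lets the first attempt succeed.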
From IM, open Raw Configure for the hub and, under the hub section, increase the postroute_reply_timeout value, for example to 300.
Also, under the hub->postroute section for probeDiscovery, add the reply_timeout key if it does not exist, and set it to 0.
This should allow the discovery_server to finish processing the graph and then continue with the rest of the queued messages.