The MOM or CDV is unable to connect to collectors. Reviewing the logs, I see an initial connection, which then fails with an error such as:
[WARN] [Collector <host A>@5019] [Manager.Cluster] Failed to connect to the Introscope Enterprise Manager at <host A>@5019 (6) because: com.wily.introscope.spec.server.beans.clocksync.ClockSyncException: Collector clock is skewed from MOM clock by 6,748 ms. The maximum allowed skew is 3,000 ms. Please change the system clock on the collector EM.
This problem is caused by the clocks on the MOM/CDV and Collectors being out of sync. Clocks are usually synchronized by a service such as NTP; if that service is not running or is misconfigured, the local system clock can drift.
To determine where the problem lies, it is important to review the IntroscopeEnterpriseManager.log file in the <EM_HOME>/logs folder of the MOM/CDV and look at the warnings. If they only mention a single collector, then the problem is most likely on that single collector. If they mention multiple collectors, then the most likely cause is a problem on the MOM.
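The single-collector versus multiple-collector check described above can be sketched in Python. This is only an illustration: the regular expression is modeled on the sample warning quoted earlier, the host name hostA is a made-up placeholder, and the function names are hypothetical. Real log lines may differ, so adjust the pattern to match your own IntroscopeEnterpriseManager.log.

```python
import re
from collections import defaultdict

# Pattern modeled on the sample warning in this article (an assumption);
# it captures the collector address and the reported skew in milliseconds.
CLOCK_SKEW = re.compile(
    r"Enterprise Manager at (?P<collector>.+?@\d+)"
    r".*?Collector clock is skewed from MOM clock by (?P<skew>[\d,]+) ms"
)


def skewed_collectors(log_lines):
    """Map each collector named in a clock skew warning to its reported skews (ms)."""
    skews = defaultdict(list)
    for line in log_lines:
        match = CLOCK_SKEW.search(line)
        if match:
            skew_ms = int(match.group("skew").replace(",", ""))
            skews[match.group("collector")].append(skew_ms)
    return dict(skews)


def likely_culprit(skews):
    """Rule of thumb from the text: one collector points at it, several point at the MOM."""
    if not skews:
        return "no clock skew warnings found"
    if len(skews) == 1:
        return "collector " + next(iter(skews))
    return "MOM/CDV"


# Hypothetical log line built from the sample warning, with a placeholder host.
sample = [
    "[WARN] [Collector hostA@5019] [Manager.Cluster] Failed to connect to the "
    "Introscope Enterprise Manager at hostA@5019 (6) because: "
    "com.wily.introscope.spec.server.beans.clocksync.ClockSyncException: "
    "Collector clock is skewed from MOM clock by 6,748 ms. "
    "The maximum allowed skew is 3,000 ms."
]
result = skewed_collectors(sample)
print(result, "->", likely_culprit(result))
```

To use it on a real file, pass the open log file (an iterable of lines) to skewed_collectors instead of the sample list.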
Check the network time service on the problematic server to verify that it is running and properly configured. A quick way to check on a Linux system is to run "ntpq -p". It should run successfully and return a table such as this:
ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+111.11.11.11    10.10.10.10      2 u  135 1024  377    0.000    0.000   0.000
If the returned delay, offset, and jitter values are all 0 (as above), there is a problem, because these should always be non-zero. You or the system administrator will need to verify that the NTP service is properly configured and working.
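The all-zero check can also be automated. The following is a minimal Python sketch that parses text already captured from "ntpq -p" and flags peers whose delay, offset, and jitter are all 0. The function names are hypothetical, and the parser assumes the ten-column layout shown in the table above; other ntpq versions or options may print different columns.

```python
SAMPLE = """\
ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+111.11.11.11    10.10.10.10      2 u  135 1024  377    0.000    0.000   0.000
"""


def parse_ntpq_peers(output):
    """Parse the peer rows of `ntpq -p` output (ten-column layout assumed)."""
    peers = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the command echo, the header row, and the ==== separator.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        try:
            delay, offset, jitter = (float(value) for value in fields[7:10])
        except ValueError:
            continue  # not a peer row we understand
        peers.append({
            "remote": fields[0],  # kept as printed, including any leading tally character
            "refid": fields[1],
            "delay": delay,
            "offset": offset,
            "jitter": jitter,
        })
    return peers


def all_zero_timing(peer):
    """True when delay, offset, and jitter are all 0, the symptom described above."""
    return peer["delay"] == 0.0 and peer["offset"] == 0.0 and peer["jitter"] == 0.0


for peer in parse_ntpq_peers(SAMPLE):
    status = "suspect (all zero)" if all_zero_timing(peer) else "ok"
    print(peer["remote"], status)
```

On a live system the text could be captured with subprocess.run(["ntpq", "-p"], capture_output=True, text=True).stdout and passed to parse_ntpq_peers in place of SAMPLE.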
To aggregate metrics, a CDV or MOM requires the clock of every other cluster member to be within 3000 ms (3 seconds) of the MOM/CDV server clock. A collector whose clock is not within 3 seconds of the MOM/CDV is disconnected. The disconnection can cause a StreamCorruptedException to be reported in the collector log file.
This 3000 ms requirement is not a configurable property and cannot be changed.
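As a worked example of the 3000 ms rule, the comparison can be sketched in Python as follows. The function names and timestamps are hypothetical. How the Enterprise Manager actually measures skew is not described here, and treating a skew of exactly 3000 ms as allowed is an assumption based on the wording "maximum allowed skew".

```python
MAX_SKEW_MS = 3000  # fixed limit stated in this article and in the warning message


def clock_skew_ms(mom_epoch_ms, collector_epoch_ms):
    """Absolute difference between two clock readings, in milliseconds."""
    return abs(collector_epoch_ms - mom_epoch_ms)


def within_allowed_skew(mom_epoch_ms, collector_epoch_ms):
    """True when the collector clock is within the limit (boundary inclusive by assumption)."""
    return clock_skew_ms(mom_epoch_ms, collector_epoch_ms) <= MAX_SKEW_MS


# Hypothetical readings: the collector is 6,748 ms ahead, as in the sample warning.
mom_now = 1_700_000_000_000
collector_now = mom_now + 6_748
print(clock_skew_ms(mom_now, collector_now), within_allowed_skew(mom_now, collector_now))
```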