Description:
Customers using a distributed eHealth cluster may open cases in which reports run from a reporting front end (RFE) time out or return errors stating that not all data could be retrieved, even though the same reports run without problems when run directly on the back-end poller (BE).
In almost every case, the only change required is to raise the distributed environment variables to higher settings to prevent the timeouts.
Solution:
To add or change these variables on a Windows server, customers should open the System control panel, go to the Advanced tab, and click the Environment Variables button. Use the New or Edit buttons in the lower section of the window (for System variables) to make changes. The server must be rebooted for changes to take effect.
To add or change variables on a Solaris or Linux server, customers should edit the $NH_HOME/nethealthrc.sh.usr file. If the variable is not already defined in the file, add it using the following format:
VARIABLE=value; export VARIABLE
Do not add spaces around the equals sign between the variable name and its value. Changes do not take effect until you stop and restart the eHealth services:
nhServer stop
nhServer start
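The edit described above can be sketched as a small shell function. This is a minimal sketch, not an eHealth tool: the function name set_nh_var is hypothetical, and it assumes each variable sits on its own line in the VARIABLE=value; export VARIABLE format and that values contain no characters special to sed (the timeout values here are plain integers).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with eHealth): set or update one variable
# in a nethealthrc.sh.usr-style file using "VARIABLE=value; export VARIABLE".
set_nh_var() {
    file=$1
    name=$2
    value=$3
    line="$name=$value; export $name"
    if [ -f "$file" ] && grep -q "^$name=" "$file"; then
        # Variable already defined: rewrite that line, keeping the file's permissions.
        tmp="$file.tmp.$$"
        sed "s|^$name=.*|$line|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Variable not defined yet: append it in the documented format.
        printf '%s\n' "$line" >> "$file"
    fi
}
```

A call such as set_nh_var "$NH_HOME/nethealthrc.sh.usr" NH_CLUSTER_CMD_TIMEOUT 300 would then be followed by nhServer stop and nhServer start, as shown above, for the change to take effect.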
Add or edit the following variables on the RFEs only:
Variable Name | Default Value | Recommended Change |
NH_DRPT_HEARTBEAT_INTVL | 60 | 120 |
NH_DRPT_MISSED_MSG_LIMIT | 2 | 3 |
NH_DRPT_RUN_CMD_TIMEOUT (1) | 45 | 90 |
NH_DRPT_STATUS_MSG_TIMEOUT (2) | 30 | 60 |
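On a Solaris or Linux RFE, the recommended values in the table above would appear in $NH_HOME/nethealthrc.sh.usr roughly as follows. This is only a sketch of the file entries in the VARIABLE=value; export VARIABLE format; it assumes none of these variables is already defined elsewhere in the file.

```shell
# RFE-only distributed report settings (recommended values from the table above)
NH_DRPT_HEARTBEAT_INTVL=120; export NH_DRPT_HEARTBEAT_INTVL
NH_DRPT_MISSED_MSG_LIMIT=3; export NH_DRPT_MISSED_MSG_LIMIT
NH_DRPT_RUN_CMD_TIMEOUT=90; export NH_DRPT_RUN_CMD_TIMEOUT
NH_DRPT_STATUS_MSG_TIMEOUT=60; export NH_DRPT_STATUS_MSG_TIMEOUT
```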
Add or edit the following variables on all cluster members (RFEs and BEs):
Variable Name | Default Value | Recommended Change |
NH_CLUSTER_CMD_TIMEOUT | 30 | 300 |
NH_RCS_CONNECT_TIMEOUT | 5 | 300 |
NH_RCS_MSG_TIMEOUT (3) | 5 | 20 |
NH_RCS_RETRY_QUEUE_TIME (3) | 5 | 5 |
(1) The value of NH_DRPT_RUN_CMD_TIMEOUT should be less than NH_DRPT_HEARTBEAT_INTVL and more than NH_DRPT_STATUS_MSG_TIMEOUT.
(2) The value of NH_DRPT_STATUS_MSG_TIMEOUT should be less than NH_DRPT_HEARTBEAT_INTVL.
(3) The sum of NH_RCS_MSG_TIMEOUT and NH_RCS_RETRY_QUEUE_TIME, multiplied by two, should be less than the value of every other variable listed except NH_DRPT_MISSED_MSG_LIMIT. With the recommended changes, for example, (20+5)*2 = 50, so every other listed variable should be set no lower than 51.
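The three notes above can be checked mechanically before restarting the services. The function below is a minimal sketch, not an eHealth utility: the name check_nh_timeouts is hypothetical, and any variable that is unset in the environment falls back to the recommended value from the tables purely for illustration.

```shell
#!/bin/sh
# Hypothetical checker: returns 0 if the values satisfy notes 1-3, 1 otherwise.
# Unset variables fall back to the recommended values (illustration only).
check_nh_timeouts() {
    hb=${NH_DRPT_HEARTBEAT_INTVL:-120}
    run=${NH_DRPT_RUN_CMD_TIMEOUT:-90}
    status=${NH_DRPT_STATUS_MSG_TIMEOUT:-60}
    cluster=${NH_CLUSTER_CMD_TIMEOUT:-300}
    connect=${NH_RCS_CONNECT_TIMEOUT:-300}
    msg=${NH_RCS_MSG_TIMEOUT:-20}
    retry=${NH_RCS_RETRY_QUEUE_TIME:-5}
    rc=0

    # Note 1: STATUS_MSG_TIMEOUT < RUN_CMD_TIMEOUT < HEARTBEAT_INTVL
    [ "$run" -lt "$hb" ] || { echo "RUN_CMD_TIMEOUT ($run) must be below HEARTBEAT_INTVL ($hb)"; rc=1; }
    [ "$run" -gt "$status" ] || { echo "RUN_CMD_TIMEOUT ($run) must be above STATUS_MSG_TIMEOUT ($status)"; rc=1; }

    # Note 2: STATUS_MSG_TIMEOUT < HEARTBEAT_INTVL
    [ "$status" -lt "$hb" ] || { echo "STATUS_MSG_TIMEOUT ($status) must be below HEARTBEAT_INTVL ($hb)"; rc=1; }

    # Note 3: (MSG_TIMEOUT + RETRY_QUEUE_TIME) * 2 must be below every other
    # listed variable except NH_DRPT_MISSED_MSG_LIMIT.
    floor=$(( (msg + retry) * 2 ))
    for pair in "NH_DRPT_HEARTBEAT_INTVL=$hb" "NH_DRPT_RUN_CMD_TIMEOUT=$run" \
                "NH_DRPT_STATUS_MSG_TIMEOUT=$status" "NH_CLUSTER_CMD_TIMEOUT=$cluster" \
                "NH_RCS_CONNECT_TIMEOUT=$connect"; do
        val=${pair#*=}
        [ "$val" -gt "$floor" ] || { echo "${pair%%=*} ($val) must be above $floor"; rc=1; }
    done
    return $rc
}
```

With the recommended values the threshold in note 3 is (20+5)*2 = 50, so the function reports no violations; lowering NH_RCS_CONNECT_TIMEOUT back to 5, for instance, would be flagged.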
The variables may still need to be adjusted upwards for large clusters where reports will include many elements.
For full details of each variable's purpose, default value, and maximum value, please review the eHealth Command and Environment Variables Reference Guide.
ADDENDUM: NH_DRPT_COMPRESS_RDI deprecated in 6.3.0, replaced with parameter reporting.distributed.enableRdiCompression
In eHealth distributed clusters, some customers have experienced report timeouts or failures when the reporting front end console (RFE) cannot collect data from one or more back-end pollers (BEs) fast enough. The solution to this problem is to adjust the environment variables related to distributed reports so that more time is allowed before the report fails with an error. Specifically, the key variable has been NH_DRPT_COMPRESS_RDI, which needs to be set to YES on every member of the cluster (both RFEs and BEs).
While the NH_DRPT_COMPRESS_RDI variable is still completely functional and will work for anyone using it now, there is a newer, preferred method for averting potential report timeouts.
The new preferred method to enable RDI compression is to use the following command:
nhParameter -set reporting.distributed.enableRdiCompression "yes"
This has a number of advantages over the environment variable.
The new method also avoids the issue where a new machine is brought into the cluster without the environment variable being set on it, which causes distributed reports to fail.