This article explains in detail how function high availability (HA) works.
Highly available functions (i.e. those whose Function.isHA() returns true) must be idempotent, because after a node failure they are re-executed on all the nodes, not just the failed ones. Buckets can be rebalanced, so the re-execution may need to run on a different set of servers than the original attempt. The system therefore cannot simply exclude the members on which the function already succeeded. Even if the set of servers is the same, the mapping from routing objects to servers can differ because primary buckets may have been rebalanced.
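To make the idempotence requirement concrete, here is a small standalone simulation, not GemFire code. `MemberStore` and `FullRetrySimulation` are hypothetical names. The first attempt completes on only some members before a failure, and the retry then runs on every member, as described above. An idempotent `put` ends in the same state everywhere. A non-idempotent `increment` is applied twice on the member that had already finished.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for one member's local data (not a GemFire API).
class MemberStore {
    final Map<String, Integer> data = new HashMap<>();

    // Idempotent: applying it twice leaves the same state as applying it once.
    void put(String key, int value) {
        data.put(key, value);
    }

    // Not idempotent: every application changes the state again.
    void increment(String key) {
        data.merge(key, 1, Integer::sum);
    }
}

class FullRetrySimulation {
    // The first attempt completes on only the first `completedBeforeFailure`
    // members. The re-execution then runs on every member, because the system
    // cannot exclude the members where the first attempt succeeded.
    static void execute(List<MemberStore> members, int completedBeforeFailure,
                        Consumer<MemberStore> fn) {
        for (int i = 0; i < completedBeforeFailure; i++) {
            fn.accept(members.get(i));
        }
        for (MemberStore member : members) {
            fn.accept(member);
        }
    }
}
```

For example, with two members and one completed before the failure, `m -> m.put("k", 5)` leaves both members at 5. `m -> m.increment("k")` leaves the first member at 2 and the second at 1.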
Moreover, the execution cannot distinguish the results contributed by one server from those contributed by another in a custom ResultCollector. For this reason, ResultCollector.clearResults must be implemented to clear all of the results collected so far, not only those from the failed server. In addition, for functions that have side effects (e.g. updates), the failure may always have occurred partway through the function on a server. This is a second reason why partial retry is not possible: the re-execution has no knowledge of how much of the work was completed during the previous attempt on any failed server.
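The following is a minimal sketch of a collector whose `clearResults` discards everything, as the paragraph above requires. It only mirrors the method names of GemFire's `ResultCollector` (`addResult`, `clearResults`, `endResults`, `getResult`) and does not implement the real interface. Member IDs are plain `String`s here in place of `DistributedMember`. Also, `getResult` throws instead of blocking until `endResults` is called. The class name `ListResultCollector` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone collector mirroring the shape of ResultCollector.
class ListResultCollector<T> {
    private final List<T> results = new ArrayList<>();
    private boolean ended = false;

    synchronized void addResult(String memberId, T result) {
        // memberId is ignored: results carry no reliable per-server boundary.
        results.add(result);
    }

    // Called before a re-execution. Every result is discarded, including those
    // from members that succeeded, because the retry runs on all members again.
    synchronized void clearResults() {
        results.clear();
        ended = false;
    }

    synchronized void endResults() {
        ended = true;
    }

    // Simplification: throws instead of waiting for endResults().
    synchronized List<T> getResult() {
        if (!ended) {
            throw new IllegalStateException("results are not complete yet");
        }
        return new ArrayList<>(results);
    }
}
```

If `clearResults` removed only the failed server's entries, the result from `server1` in the first attempt would appear twice after the retry.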
For a function that is not idempotent, some possible options are:
The FunctionContext#isPossibleDuplicate() method identifies whether the current execution is a re-execution. It is intended to be used together with Function#isHA() returning true; isHA() specifies whether the function is eligible for re-execution.
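The following sketch shows one hypothetical way a non-idempotent function might use the duplicate flag. It is a simulation, not GemFire code. `HaContext` stands in for the relevant part of `FunctionContext`. `operationId` is an assumption: a caller-supplied ID that stays the same across retries of one logical call. The function applies an increment and skips it on a possible duplicate whose operation ID it has already applied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for FunctionContext. operationId is an assumption,
// not part of the real API.
final class HaContext {
    private final boolean possibleDuplicate;
    private final String operationId;

    HaContext(boolean possibleDuplicate, String operationId) {
        this.possibleDuplicate = possibleDuplicate;
        this.operationId = operationId;
    }

    boolean isPossibleDuplicate() { return possibleDuplicate; }

    String operationId() { return operationId; }
}

// Hypothetical non-idempotent function (a counter increment) that uses the
// duplicate flag to avoid applying the same logical operation twice.
class IncrementFunction {
    final Map<String, Integer> counters = new HashMap<>();
    final Set<String> appliedOperations = new HashSet<>();

    // Mirrors Function#isHA(): true makes the function eligible for re-execution.
    boolean isHA() { return true; }

    void execute(HaContext context, String key) {
        if (context.isPossibleDuplicate()
                && appliedOperations.contains(context.operationId())) {
            return; // an earlier attempt already applied this operation here
        }
        counters.merge(key, 1, Integer::sum);
        appliedOperations.add(context.operationId());
    }
}
```

This is a simplification. The applied-operation record lives in memory on a single member. A real design would need to store it atomically with the data it guards so that it survives failover and bucket rebalancing.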
When a failure occurs (such as an execution error or a member crashing during execution), the system responds as follows:
GemFire 7 and 8