Context of the issue
In highly concurrent web application environments, avoiding any and all race conditions is impossible. Thus, the goal cannot be to prevent them but to minimize them. This implies that users still need to take "steps" to protect against such situations.
Why is preventing race conditions impossible?
1. Imagine a scenario where user / node A loads Session X and user / node B also loads Session X. Each has a snapshot of Session X's state at time T1. Now, at T2, user / node A modifies Session X attribute M. Meanwhile, user / node B completely removes Session X attribute M at T3. Both users / nodes go on to save (i.e. persist) Session X at some time Tn. Should Session X contain attribute M or not?
2. Imagine a slightly different scenario where user / node A and user / node B both load Session X but add different attributes, M and N, where Session X contained neither Attribute M nor Attribute N at the time both loaded it (say, T1). When Session X is saved / persisted by user / node A and user / node B, does it contain Attribute M, Attribute N, both, or neither? Of course, we would expect it to contain both Attributes M and N, but that is not necessarily the case, since the snapshot of Session X contained neither attribute when user / node A and user / node B loaded it (T1).
3. Imagine one last scenario where user / node A and user / node B both load Session X at roughly the same time (T1), user / node A adds (a non-existing) Attribute M, and user / node B removes the (existing) Attribute N. That is, Attribute M did not exist when both user / node A and user / node B loaded Session X, but Attribute N did. When Session X is saved by user / node A and user / node B at roughly the same time (Tn), does Session X contain Attribute M, and was Attribute N successfully removed? Of course, the desired state is that Session X contains Attribute M and does not contain Attribute N when viewed from either user / node A or user / node B, or even another user / node C. However, this will not necessarily be the case.
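The lost-update effect in scenario 2 can be sketched in a few lines of plain Java. The class and method names below are purely illustrative (not part of SSDG or GemFire); plain HashMaps stand in for the persisted Session and each node's snapshot, and "saving" copies the whole snapshot back, as whole-object persistence does:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of scenario 2: two nodes each take a snapshot of the same Session,
// add different attributes, then save the entire Session back.
// The last writer wins and the other node's update is silently lost.
public class LostUpdateDemo {

    public static Map<String, String> simulateLostUpdate() {
        Map<String, String> store = new HashMap<>();      // the persisted Session X
        Map<String, String> nodeA = new HashMap<>(store); // T1: node A's snapshot
        Map<String, String> nodeB = new HashMap<>(store); // T1: node B's snapshot

        nodeA.put("M", "valueM"); // T2: node A adds Attribute M
        nodeB.put("N", "valueN"); // T3: node B adds Attribute N

        store.clear(); store.putAll(nodeA); // node A saves the whole Session
        store.clear(); store.putAll(nodeB); // node B saves last and wins

        return store; // contains only N; node A's Attribute M is gone
    }

    public static void main(String[] args) {
        Map<String, String> sessionX = simulateLostUpdate();
        System.out.println(sessionX.containsKey("M")); // false: lost update
        System.out.println(sessionX.containsKey("N")); // true
    }
}
```

Neither node did anything wrong individually; the loss is purely a consequence of when and how each whole-Session snapshot was saved.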
It all boils down to 1) "when" the Session (X) state is saved / persisted and 2) "how" it is persisted, regardless of what is desired. The "when" factor is what makes this a race condition, and the "how" factor (combined with the Web application's logic for interacting with the Session in the first place) is what makes this impossible for any backend system (GemFire / Geode, framework or otherwise) to prevent. Essentially, since the outcome depends on the order of operations along with timing, the best you can do in any situation is last-one-wins.
It really does not matter if you are using web container managed sessions, GemFire / Geode HTTP Session Management, Spring Session, or any other session management framework or library - no solution escapes these problems.
The Approach To Handle This Issue
How does Spring Session for Apache Geode / VMware Tanzu GemFire (SSDG), combined with VMware GemFire, VMware Tanzu GemFire [VMs], or VMware Tanzu GemFire [k8s], handle these situations?
That depends in part on "how" the framework and GemFire / Geode persist and manage Session state. It is also the "responsibility" of users and their applications to use the Session wisely and appropriately. In most cases, users / customers tend to grossly misuse the Session. Additionally, most users / customers want to have their cake and eat it too.
Note: This is no different from any other Spring Session module and backend data store, e.g. Spring Session Data Redis (SSDR) and Redis. However, SSDG does extend beyond the other Spring Session modules to give users / customers additional capabilities and flexibility not afforded by other Spring Session modules, such as the choice of Session "serialization" format which, as we will see below, turns out to be quite important.
The first important bit to understand is: SSDG uses the DAO pattern and PDX (by default) to persist Session state! "Subscriptions" are enabled by default when combined with Spring Boot for Apache Geode (SBDG), and SSDG applications "register interest" only in the Sessions they create or access in order to manage Session state (i.e. not in all Sessions by default, as other Spring Session modules do).
1. Why PDX?
This actually turns out to be a mistake in hindsight. The serialization mechanism was changed to PDX in SSDG 2.0+ to appease VMware Tanzu GemFire [VMs]. That was because users did not have the ability to 1) change the CLASSPATH of the servers in the VMware Tanzu GemFire [VMs] cluster, nor 2) did they have the complete ability to upload their application domain classes (classes stored in the Session). Additionally, PDX was enabled on all servers in the VMware Tanzu GemFire [VMs] cluster by default and therefore became the logical choice. It simultaneously solved one problem (users got their cake) and created another (they could not eat it too).
It turns out that GemFire / Geode Data Serialization and Deltas are absolutely necessary to "minimize" (again, you cannot prevent) race conditions or stale / incorrect Session state, as described in just a few scenarios above.
PDX does not cleanly support Deltas, yet Deltas are essential and necessary to "minimize" race conditions and/or collisions.
2. Why doesn't PDX support Deltas?
In short, because Deltas always involve a method invocation on the target object (value) stored in GemFire / Geode, and the only way to invoke a method on an object is to first deserialize it, which defeats the purpose of PDX in the first place.
Note: Application domain objects must literally implement the Delta interface, where the fromDelta(:DataInput) method is used to apply the delta and the toDelta(:DataOutput) method is used to determine the delta, or changes, that will be sent over the wire.
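As a sketch, a domain class stored in the Session might implement Delta along these lines. The PageViews class and its fields are hypothetical, and the local Delta interface below merely mirrors the org.apache.geode.Delta methods so the sketch is self-contained; a real domain class would implement org.apache.geode.Delta directly (whose fromDelta may also throw InvalidDeltaException):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in mirroring org.apache.geode.Delta so this sketch compiles without
// Geode on the CLASSPATH; in a real application, implement org.apache.geode.Delta.
interface Delta {
    boolean hasDelta();
    void toDelta(DataOutput out) throws IOException;
    void fromDelta(DataInput in) throws IOException;
}

// Hypothetical Session attribute value that sends only its changes over the wire.
public class PageViews implements Delta {

    private int views;
    private transient boolean dirty;

    public void recordView() {
        this.views++;
        this.dirty = true;
    }

    public int getViews() {
        return this.views;
    }

    @Override
    public boolean hasDelta() {
        return this.dirty; // only send a delta when state actually changed
    }

    @Override
    public void toDelta(DataOutput out) throws IOException {
        out.writeInt(this.views); // write only the changed state
        this.dirty = false;
    }

    @Override
    public void fromDelta(DataInput in) throws IOException {
        this.views = in.readInt(); // apply the change to the existing object
    }
}
```

Note how only the changed field crosses the wire, rather than the whole object graph.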
Additionally, deserialization requires the application domain classes (along with the SSDG framework classes) to exist on the CLASSPATH of all servers in the cluster that access the Session and its contents.
So, as a result, every time the Session is saved / persisted, the entire Session object and all of its contents are sent over the wire when using PDX. This, in effect, increases the likelihood of race conditions along with stale or missing states.
3. So why even use PDX at all to save / persist the Session and its contents?
Well, to avoid having to place the framework / library and application domain model classes on the CLASSPATH of the servers in the cluster. Additionally, it affords users the ability to query the Session and its contents without deserializing the Session. Finally, it was a decision to support VMware Tanzu GemFire [VMs] at the time.
Of course, using PDX is not all bad, since 1) you do not need the framework / library nor application domain model classes on the CLASSPATH of servers in the cluster and 2) you can query the Session and its contents, in serialized form, when / if necessary.
But, users / customers should honor the rules of engagement when accessing / managing Session state.
1. Intent & Purpose: Keep in mind that a Session is meant to maintain a "conversational" state between the user and the application while the user is interacting and progressing through the application. The Session is not meant to store application state!
2. Safety First: Sessions will nearly always be accessed in a multi-threaded context given the nature of Web servers in general (i.e. the Thread-per-Request model), so the Session and its contents need to be Thread-safe (SSDG actually helps in this regard).
3. Less is More: Users should try to minimize, as much as possible, what is actually stored and maintained in the Session in the first place. Users tend to store too much in the Session.
4. Ideally, only a single user (Thread) and application node would access a "logical" Session at any given time: This means requests and Session access should be coordinated by the workflows (functionality) of the application, and possibly the infrastructure if necessary, to minimize required coordination and possible collisions. This will, in effect, determine the design of the application and system architecture to an extent; it is unavoidable. Again, apply #1 and #3 as much as possible. This is challenging to do in a highly concurrent, multi-node environment (e.g. 200+ Kubernetes containers each running the same Web application, i.e. a GemFire / Geode ClientCache application). Even though Sticky Sessions are helpful, they are considered an anti-pattern in any cloud-based application, and it is preferred that application logic coordinate Session activity, as it is highly application dependent anyway.
5. Safety Net: You may still need to rely on old techniques (e.g. making use of Deltas, Optimistic Locking, or other forms of Conflict Resolution, such as Merge). This is unavoidable, but the impact is also dependent on the application's workflow and design.
6. Rule of Thumb: Last update wins, always!
Again, not even Data Serialization and Deltas can completely prevent race conditions (e.g. collisions); they only help to minimize them, which is why #5 is needed and #6 is absolute, no exceptions!
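To illustrate the Optimistic Locking technique from #5, the sketch below uses a compare-and-swap style update. GemFire / Geode's Region interface extends java.util.concurrent.ConcurrentMap, so the same replace(key, expectedValue, newValue) pattern applies to a Region; a plain ConcurrentHashMap stands in here so the sketch runs without a cluster, and all names and values are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Optimistic, compare-and-swap style update of a Session attribute (rule #5).
// Region extends ConcurrentMap, so this pattern applies to a Region as well;
// a ConcurrentHashMap stands in so the sketch runs without a cluster.
public class OptimisticUpdate {

    public static boolean updateAttribute(ConcurrentMap<String, String> session,
            String key, String expectedValue, String newValue) {
        // Only wins if no one else changed the value since we read it.
        return session.replace(key, expectedValue, newValue);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> session = new ConcurrentHashMap<>();
        session.put("cartStatus", "OPEN");

        boolean first = updateAttribute(session, "cartStatus", "OPEN", "CHECKED_OUT");
        boolean second = updateAttribute(session, "cartStatus", "OPEN", "ABANDONED");

        System.out.println(first);  // true: this writer saw the expected value
        System.out.println(second); // false: lost the race; must re-read and retry
    }
}
```

A caller that loses the race re-reads the current value and decides whether to retry, merge, or give up, which is exactly the application-level conflict resolution rule #5 anticipates.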
Solution Conclusion
1. Use Data Serialization and Deltas!
If you intend to access the Session from your applications, which may run in highly concurrent Web application environments, you need to take steps to guard against race conditions and other potential problems, no exceptions! The Spring Session (SSDG) and application domain model classes will need to be on the CLASSPATH of the servers in the GemFire cluster, period.
2. Use the Session as it was intended, minimizing what is stored in the Session.
3. Keep in mind last update wins and you may need to use other coordination techniques (e.g. Optimistic Locking).
Implementation example of the above solution
1. In the Spring Session client application, enable Data Serialization and Deltas:
For example:
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.session.data.gemfire.config.annotation.web.http.EnableGemFireHttpSession;
import org.springframework.session.data.gemfire.config.annotation.web.http.GemFireHttpSessionConfiguration;

@ClientCacheApplication(copyOnRead = true, subscriptionEnabled = true)
@EnableGemFireHttpSession(
    clientRegionShortcut = ClientRegionShortcut.PROXY,
    sessionSerializerBeanName = GemFireHttpSessionConfiguration.SESSION_DATA_SERIALIZER_BEAN_NAME)
public class SessionConfiguration { }
Spring Session for Apache Geode / VMware Tanzu GemFire (SSDG) released the versions below, which include a dynamic configuration improvement allowing SSDG to be configured via a JVM System property (spring.session.data.gemfire.session.serializer.bean-name=SessionDataSerializer) on the GemFire server side, which helps users / customers enable Data Serialization and Deltas with less effort.
- Spring Session for Apache Geode [& VMware Tanzu GemFire] (SSDG) 2.3.6.RELEASE, 2.4.4, and 2.5.1 are now available.
- SSDG 2.3.6.RELEASE is based on Apache Geode 1.12.2 and VMware Tanzu GemFire 9.10.7 [changelog]
- SSDG 2.4.4 is based on Apache Geode 1.13.2 [changelog]
- SSDG 2.5.1 is based on Apache Geode 1.13.2 [changelog]
You can download the bits or configure your Maven POM / Gradle build accordingly.
For example:
Maven:
<dependencies>
<dependency>
<groupId>org.springframework.session</groupId>
<artifactId>spring-session-data-geode</artifactId>
<version>2.3.6.RELEASE</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>spring-snapshots</id>
<url>https://repo.spring.io/snapshot</url>
</repository>
</repositories>
Gradle:
Gradle build configuration would be similar.
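As a sketch, a Gradle (Kotlin DSL) equivalent of the Maven configuration above might look like the following; the coordinates simply mirror the Maven example:

```kotlin
repositories {
    mavenCentral()
}

dependencies {
    implementation("org.springframework.session:spring-session-data-geode:2.3.6.RELEASE")
}
```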
Note: If you are using the SBDG spring-geode-starter-session dependency, you need to apply Maven dependency management to control the SSDG dependency version pulled in by Spring Boot.
2. On the VMware GemFire server side, add the system property below to each cache server (even if PDX is enabled), which tells Spring Session (i.e. SSDG) to use GemFire / Geode Data Serialization on the server side.
--J=-Dspring.session.data.gemfire.session.serializer.bean-name=SessionDataSerializer
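For example, when starting a cache server with gfsh, the property can be passed through the --J option (the server name below is illustrative):

```shell
gfsh>start server --name=server1 --J=-Dspring.session.data.gemfire.session.serializer.bean-name=SessionDataSerializer
```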
If you are using VMware Tanzu GemFire [VMs] and it does not allow you to add system properties to the cache server JVMs, one option is to create a custom function (1*) that sets the above system property, so that the cache servers tell the SSDG client that GemFire / Geode Data Serialization is in use on the server side. Additionally, you need to deploy the JARs below to the VMware Tanzu GemFire [VMs] cluster in order to handle requests from SSDG client applications with Data Serialization and Deltas enabled.
For example:
spring-session-data-geode-2.3.6.RELEASE.jar
spring-session-core-2.3.3.RELEASE.jar
spring-data-gemfire-2.3.9.RELEASE.jar
spring-data-geode-2.3.9.RELEASE.jar
spring-expression-5.2.15.RELEASE.jar
If you are using VMware Tanzu GemFire [k8s], you can add the above system property by overriding the jvmOptions field of the servers.
# 1* A custom function (SetSystemPropertyFunction) to enable this system property:
package io.pivotal.functions;

import org.apache.geode.cache.Declarable;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;

public class SetSystemPropertyFunction implements Function, Declarable {

    // Sets the System property that tells SSDG to use Data Serialization on this server.
    public void execute(FunctionContext context) {
        System.setProperty("spring.session.data.gemfire.session.serializer.bean-name",
            "SessionDataSerializer");
        context.getResultSender().lastResult(true);
    }

    public String getId() {
        return getClass().getSimpleName();
    }
}
How to deploy this function:
gfsh>deploy --jar=/Users/xxx/EnableSysProFunction/functions/target/functions-0.0.1-SNAPSHOT.jar
How to execute this function:
gfsh>execute function --id=SetSystemPropertyFunction