Article ID: 294444
Issue/Introduction
There is no product issue here, but there is a gap in understanding that we are seeing across customers using BOSH availability zones (AZs) with GemFire on VMs (PCC). This gap can negatively impact your service instance if you do not understand how AZs map to GemFire redundancy zones (RZs). The term GemFire will be used throughout this article rather than the full product name, GemFire for VMs.
Specifically, at the BOSH layer, customers configure plans that determine the AZs, the default number of servers, and so on. Then, at the GemFire region configuration layer, developers determine how many copies of the data are stored in the system for redundancy; this is the region's redundant-copies setting. These configurations are generally made by different teams on the customer side, and people often do not realize how they can interact to cause sub-optimal behavior.
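For reference, the redundant-copies and total-num-buckets settings discussed in this article look something like the following when a region is defined through the GemFire (Apache Geode) Java API. This is only a minimal sketch: the region name R and the class name are just this article's example, and on GemFire for VMs developers typically create regions through gfsh rather than in application code, but the same two settings apply.

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.PartitionAttributesFactory;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.RegionShortcut;

    public class RegionConfigSketch {
      public static void main(String[] args) {
        // Standalone member just for this sketch; a real server would join the cluster.
        Cache cache = new CacheFactory().set("mcast-port", "0").create();

        Region<String, String> region = cache
            .<String, String>createRegionFactory(RegionShortcut.PARTITION_REDUNDANT)
            .setPartitionAttributes(new PartitionAttributesFactory<String, String>()
                .setRedundantCopies(2)     // 2 redundant copies = 3 total copies of the data
                .setTotalNumBuckets(113)   // the default number of primary buckets
                .create())
            .create("R");

        System.out.println("Created region " + region.getFullPath());
        cache.close();
      }
    }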
For example, consider the following configuration combination for a given system:
A) 3 AZs (AZ1, AZ2, AZ3)
B) A default of 4 servers (S1, S2, S3, S4)
C) 3 total copies of the region data (redundant-copies=2), where each copy must be placed in a unique RZ (RZa, RZb, RZc)
This configuration will result in as much balance as possible in the mapping of AZs to servers, so the outcome is likely something like this:
S1 is assigned to AZ1
S2 is assigned to AZ2
S3 is assigned to AZ3
S4 is assigned to AZ1
Thus, we have both S1 and S4 on AZ1, but only S2 on AZ2 and S3 on AZ3. Fine. No major problems here. This is as much balance as the system can achieve with 4 servers and 3 AZs. However, let's now take this further into the realm of GemFire.
The GemFire redundancy-zones are mapped to the AZs for each server:
S1 is assigned to redundancy-zone RZa
S2 is assigned to redundancy-zone RZb
S3 is assigned to redundancy-zone RZc
S4 is assigned to redundancy-zone RZa
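Under the covers, this per-server assignment corresponds to GemFire's standard redundancy-zone property. On GemFire for VMs this mapping is handled for you based on each server's AZ, as described above, so there is nothing to configure yourself; the minimal sketch below simply illustrates the mechanism, with the zone name RZa being this article's label.

    import java.util.Properties;
    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;

    public class RedundancyZoneSketch {
      public static void main(String[] args) {
        // Each server declares the redundancy zone it belongs to. GemFire will then
        // refuse to place more than one copy of the same bucket into that zone.
        Properties props = new Properties();
        props.setProperty("redundancy-zone", "RZa"); // derived from the server's AZ
        props.setProperty("mcast-port", "0");        // standalone member for this sketch

        Cache cache = new CacheFactory(props).create();
        System.out.println("Member started in redundancy zone RZa");
        cache.close();
      }
    }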
The above may seem relatively harmless, but there are potential issues that must be understood in order to avoid the pitfalls. Let's continue to the specific details of how the 3 copies of the data are placed once the RZ restrictions are considered.
When RZs are in use, GemFire will place only 1 copy of any given piece of data into each RZ. Let's drill into a specific region R with a mostly standard, default configuration. To be as concrete as possible, consider the default total-num-buckets=113, which means region R has 113 primary buckets. With redundant-copies=2, there are 3 total copies of the data, and each of the redundant copies also has 113 buckets: 113 for the 2nd copy and 113 more for the 3rd copy.
This gives a total of 339 buckets, with the important restriction that only 1 copy of any given bucket can be placed in each RZ. For a given bucket B1, let's name the 3 copies B1p (primary), B1s1 (first redundant copy), and B1s2 (second redundant copy).
The important restriction to understand is that B1p must be in a different RZ than B1s1 or B1s2, and B1s1 must be in a different RZ than B1s2. We have 3 RZs in total, so that is fine, UNTIL you consider that we are distributing this data across 4 servers.
So, let's place the 3 copies of bucket B into the unique RZs. One possibility is as follows:
B1p goes to RZa (S1 OR S4 !!!)
B1s1 goes to RZb (S2)
B1s2 goes to RZc (S3)
We have to follow this mapping for each of the 113 primary buckets. Let's consider another such bucket B2:
B2p goes to RZb (S2)
B2s1 goes to RZc (S3)
B2s2 goes to RZa (S1 OR S4 !!!)
We need to go through this process for each of the 113 buckets B1-B113. One can see, in the end, that every bucket Bn has a copy in RZb on S2 and a copy in RZc on S3, while RZa splits its load across the 2 servers S1 and S4. The result is as follows:
S2 ends up with 113 total buckets.
S3 ends up with 113 total buckets.
S1 ends up with 56 or 57 buckets.
S4 ends up with 57 or 56 buckets.
The result is a completely imbalanced data footprint that can NOT be resolved by rebalance. The RZ restrictions prevent it. S2 and S3, in the example, will always have exactly 113 buckets, carrying double the heap load when compared to S1 or S4.
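To make the arithmetic above concrete, here is a small standalone sketch in plain Java (no GemFire required) that applies the one-copy-per-RZ rule to 113 buckets with 3 copies each on the 4-server, 3-AZ layout from this example. It is only an illustration of the constraint, not GemFire's actual bucket placement algorithm, but it arrives at the same per-server counts.

    import java.util.Comparator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class BucketPlacementSketch {
      public static void main(String[] args) {
        // Redundancy zones and the servers they contain (the 4-server / 3-AZ example).
        Map<String, List<String>> zones = new LinkedHashMap<>();
        zones.put("RZa", List.of("S1", "S4"));
        zones.put("RZb", List.of("S2"));
        zones.put("RZc", List.of("S3"));

        int totalNumBuckets = 113; // primary buckets for region R
        // With redundant-copies=2 there are 3 copies of every bucket, and with only
        // 3 zones available every zone must hold exactly one copy of every bucket.

        Map<String, Integer> bucketsPerServer = new LinkedHashMap<>();
        zones.values().forEach(servers -> servers.forEach(s -> bucketsPerServer.put(s, 0)));

        for (int bucket = 1; bucket <= totalNumBuckets; bucket++) {
          for (List<String> serversInZone : zones.values()) {
            // Within a zone, give the copy to the server currently holding the fewest
            // buckets (a stand-in for the best a rebalance can do inside one zone).
            String target = serversInZone.stream()
                .min(Comparator.comparingInt(bucketsPerServer::get))
                .get();
            bucketsPerServer.merge(target, 1, Integer::sum);
          }
        }

        // Prints S1=57, S4=56, S2=113, S3=113 (S1/S4 may swap in a real system).
        bucketsPerServer.forEach((server, count) ->
            System.out.println(server + " holds " + count + " bucket copies"));
      }
    }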
Now, it is possible that none of this causes problems, if the per-server heaps are sized generously enough that S2 and S3 never experience issues related to very high heap consumption. Even then, S1 and S4 are underloaded from a heap footprint perspective, so capacity is wasted on those servers; and if the heaps are not generous enough, S2 and S3 will be battling high-heap issues while S1 and S4 sit half empty.
The question is how to resolve this when members are hitting such issues. Perhaps the better question is how to achieve complete data balance in the first place.
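If you want to confirm what is happening on a running system, the per-member bucket counts can be inspected, and a rebalance attempted, through the GemFire (Apache Geode) Java API; the gfsh show metrics and rebalance commands surface similar information. The sketch below is a hedged example that assumes a member already configured (locators, etc.) to join the cluster hosting region R. In the scenario described here the rebalance will complete, but the RZ restriction means S2 and S3 will still hold 113 buckets each.

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.control.RebalanceResults;
    import org.apache.geode.cache.partition.PartitionMemberInfo;
    import org.apache.geode.cache.partition.PartitionRegionHelper;
    import org.apache.geode.cache.partition.PartitionRegionInfo;

    public class InspectBalanceSketch {
      public static void main(String[] args) throws Exception {
        // Assumes this member is configured to join the cluster that hosts region R.
        Cache cache = new CacheFactory().create();
        Region<?, ?> region = cache.getRegion("R");

        // Per-member bucket counts for the partitioned region.
        PartitionRegionInfo info = PartitionRegionHelper.getPartitionRegionInfo(region);
        for (PartitionMemberInfo member : info.getPartitionMemberInfo()) {
          System.out.println(member.getDistributedMember() + ": "
              + member.getBucketCount() + " buckets ("
              + member.getPrimaryCount() + " primaries)");
        }

        // Kick off a rebalance; with the 4-server / 3-RZ layout above, the zone
        // restriction means S2 and S3 still end up with 113 buckets each.
        RebalanceResults results = cache.getResourceManager()
            .createRebalanceFactory()
            .start()
            .getResults(); // blocks until the rebalance completes
        System.out.println("Bucket transfers completed: "
            + results.getTotalBucketTransfersCompleted());

        cache.close();
      }
    }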
Environment
Product Version: 1.14
OS: All
Resolution
We have various options for resolution here.
1. If the data imbalance is not a concern, you can simply over-provision heap, such that even when S2 and S3 are carrying double the load of S1 and S4, they are not negatively impacted and are able to stay healthy without excessive CPU or heap pressure.
2. The better approach is to always choose a configuration that drives balance, where the servers are divided equally across the AZs. This will achieve the healthiest distribution of client load and data load across the system. You have multiple options here, and a sketch that checks the resulting bucket counts for both follows this list.
- Increase the number of AZs to match the number of servers. In this case, we would end up with AZ1, AZ2, AZ3, AZ4, which would map S1 to AZ1 to RZa, S2 to AZ2 to RZb, S3 to AZ3 to RZc, and S4 to AZ4 to RZd. This would essentially balance all data evenly across the 4 zones (interchangeable with members in this case). Getting back to the 339 total buckets, we would have roughly 339/4 buckets per member: each member would hold 1/4 of the primary buckets + 1/4 of the second copy + 1/4 of the third copy, or about 85 buckets per member. This decreases the bucket counts for S2 and S3 from 113 to roughly 85 and increases the bucket counts for S1 and S4 from 56 or 57 to roughly 85, balancing out the data and hopefully giving S2 and S3 better health than in the original configuration.
- Another option is to make the number of servers a multiple of the number of AZs. In our example, with 3 AZs, the system would have complete balance with 3 servers, 6 servers, 9 servers, and so on. We know that 3 servers would likely not provide enough heap, because we would then be storing a full 113 buckets on each of S1, S2, and S3, which could prove problematic if the heaps are not sufficiently sized in the plan. However, it is very possible that going to 6 servers could resolve the issues. Let's use 6 servers with 3 AZs and run the numbers. The system will be balanced such that we end up with 2 servers per AZ, and thus per GemFire RZ. The mapping would be something like (S1, S4 in AZ1 in RZa), (S2, S5 in AZ2 in RZb), and (S3, S6 in AZ3 in RZc). Getting back to the bucket specifics, we still need to put the 3 copies of each bucket Bn (Bnp, Bns1, Bns2) into unique RZs. For the 3 copies of B1, we could have (B1p on S1, B1s1 on S2, B1s2 on S3), and for B2, we could have (B2p on S4, B2s1 on S5, B2s2 on S6). Extrapolating, each member ends up with 56 or 57 buckets, since 339/6 is roughly 57. This greatly reduces the heap footprint per server and balances both heap load and client load. An additional detail is that the PRIMARY buckets will also be evenly balanced across those 6 servers, which is important for optimal health, so that client load on the system is most evenly distributed.
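The placement sketch from earlier can be re-run against both of these balanced topologies to confirm the expected counts. Again, this is only an illustration of the zone constraint, not GemFire's actual placement logic, and the server and zone names are simply this article's labels.

    import java.util.Comparator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class BalancedTopologySketch {
      public static void main(String[] args) {
        // Option 1: 4 servers spread across 4 AZs / RZs.
        Map<String, List<String>> fourZones = new LinkedHashMap<>();
        fourZones.put("RZa", List.of("S1"));
        fourZones.put("RZb", List.of("S2"));
        fourZones.put("RZc", List.of("S3"));
        fourZones.put("RZd", List.of("S4"));
        report("4 servers / 4 AZs", fourZones);

        // Option 2: 6 servers spread across 3 AZs / RZs (2 servers per zone).
        Map<String, List<String>> threeZones = new LinkedHashMap<>();
        threeZones.put("RZa", List.of("S1", "S4"));
        threeZones.put("RZb", List.of("S2", "S5"));
        threeZones.put("RZc", List.of("S3", "S6"));
        report("6 servers / 3 AZs", threeZones);
      }

      static void report(String label, Map<String, List<String>> zones) {
        int totalNumBuckets = 113; // primaries; redundant-copies=2 -> 3 copies of each
        int copies = 3;
        Map<String, Integer> perServer = new LinkedHashMap<>();
        zones.values().forEach(servers -> servers.forEach(s -> perServer.put(s, 0)));

        List<String> zoneNames = List.copyOf(zones.keySet());
        for (int bucket = 0; bucket < totalNumBuckets; bucket++) {
          for (int copy = 0; copy < copies; copy++) {
            // Rotate the starting zone per bucket so the 3 copies land in distinct
            // zones and the zones share primaries evenly.
            String zone = zoneNames.get((bucket + copy) % zoneNames.size());
            String target = zones.get(zone).stream()
                .min(Comparator.comparingInt(perServer::get)).get();
            perServer.merge(target, 1, Integer::sum);
          }
        }
        // Prints roughly 339/4 = ~85 per server, or 339/6 = ~56-57 per server.
        System.out.println(label + " -> " + perServer);
      }
    }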
In the end, take what has been discussed here into consideration when sizing these plans: assigning AZs in BOSH, determining the default number of servers, and, ideally, working with the teams that decide how large a heap footprint is needed and how many copies of the data will be stored.
The simplest option is to always make sure that the number of servers (GemFire cache servers) in your environment is a multiple of the number of AZs configured. Any other combination will drive some imbalance, and while that imbalance may be minimal and not impactful in any way, you should understand how these combinations affect the balance of load in your GemFire systems.