Why is my Bandwidth Utilization so high, even though I've chosen to limit it?

Products

IT Management Suite

Issue/Introduction

I've enabled Bandwidth Throttling at about 50 percent, but it doesn't seem to be working: far too much bandwidth is being used. Why isn't my bandwidth being throttled?

Environment

ITMS 8.x

Resolution

Percentage-based Bandwidth Throttling is often misunderstood. The natural expectation is that a 50 percent throttle will use only 50 percent of the available network bandwidth. For instance, if you have a 100 M link on which other traffic normally uses 20 percent (20 M), the expectation is that the Altiris packages never use more than 40 M of that pipe (50 percent of the remaining 80 M), leaving 40 M for normal spikes in other traffic.
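The expectation above is simple arithmetic. The sketch below computes it under the naive assumption that the throttle is applied once, to whatever the background traffic leaves free. The function name is hypothetical and the units are left as the article's generic "M".

```python
from __future__ import annotations


def expected_altiris_use(link: float, background_pct: float, throttle_pct: float) -> tuple[float, float]:
    """The naive expectation: one throttle applied to the free bandwidth.

    Returns (bandwidth Altiris is expected to use, bandwidth left for spikes).
    """
    free = link * (1 - background_pct / 100)  # 100 M with 20% in use -> 80 M free
    altiris = free * throttle_pct / 100        # 50% of 80 M -> 40 M
    headroom = free - altiris                  # 80 M - 40 M -> 40 M for spikes
    return altiris, headroom


print(expected_altiris_use(100, 20, 50))  # (40.0, 40.0)
```

This is what a single downloading client delivers; the rest of the article explains why several clients together exceed it.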

However, meeting this expectation would require some central device (such as a router) that can identify each type of traffic and throttle it accordingly.

Altiris is not in this position and can throttle only at either the server or the client; we've chosen the client. The client has no idea how big the pipe should be, only what it can currently see. It runs an ICMP check to determine how much bandwidth is "available" and then takes the configured percentage of that. However, every client does the same thing independently, so their shares add up, as the examples below show. In fact, throttling works as expected only when a single client is downloading.

Note: The examples and explanations below are generic and not fully representative of a real network. In practice, expect significant variance, because both Altiris traffic and other network traffic fluctuate with spikes and valleys. Also note that most packages are so small that they finish downloading before the 3-minute interval between speed checks gives utilization a chance to "stabilize." This is therefore mainly an issue to be aware of for large package deployments.
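To see why package size matters, the sketch below estimates how many 3-minute speed re-checks happen before a download completes. The function name, the MB and Mbps units, and the constant download rate are illustrative assumptions, not ITMS behavior.

```python
CHECK_INTERVAL_S = 180  # the 3-minute interval between speed checks


def rechecks_before_done(package_mb: float, rate_mbps: float) -> int:
    """Number of speed re-checks that occur before a download finishes.

    Assumes a constant download rate, which real throttled downloads
    do not have.
    """
    seconds = package_mb * 8 / rate_mbps  # megabytes -> megabits, then divide by rate
    return int(seconds // CHECK_INTERVAL_S)


print(rechecks_before_done(50, 25))    # 0: done in 16 s, before any re-check
print(rechecks_before_done(4000, 25))  # 7: runs about 21 minutes
```

In Example 2 the totals take about five re-checks (roughly 15 minutes) to settle, so only long-running downloads reach the stable, over-target utilization.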

Example 1: A Simple Starter

As a simple demonstration, consider a 100 M network with no "normal" traffic, two clients downloading packages, and throttling set to 50 percent. Client 1 checks the network, sees 100 M available, and takes 50 percent of that, or 50 M. Client 2 looks at the network a moment later, sees 50 M available, and takes 50 percent of that, or 25 M. Total utilization immediately reaches 75 percent.
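The first round of checks can be sketched as a small model in which clients check one after another and each takes the configured percentage of whatever the earlier clients have left. This is a simplification for illustration (the real agent measures with ICMP), and the function name is hypothetical.

```python
def first_speed_check(num_clients: int, throttle_pct: float, link: float = 100.0) -> list:
    """Simplified model of the first round of speed checks.

    Clients check one after another; each sees only the bandwidth the
    earlier clients have left and takes throttle_pct of it.
    """
    shares = []
    available = link
    for _ in range(num_clients):
        share = available * throttle_pct / 100  # this client's percentage of what it sees
        shares.append(share)
        available -= share                      # later clients see less
    return shares


shares = first_speed_check(2, 50)
print(shares, sum(shares))  # [50.0, 25.0] 75.0
```

For n clients at fraction p, the total works out to link × (1 − (1 − p)^n), so it climbs toward the full link as clients are added. For the scenarios in Example 2, first_speed_check(5, 50) totals 96.875 and first_speed_check(10, 10) about 65.1, close to the First Speed Check rows there.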

If neither client gets a chance to check its speed again, these packages will use 75 percent until done, not the expected 50 percent. Over a longer time, as the clients re-check and the bandwidth is distributed more evenly, this would decrease to about 72 percent, but it will never get close to the expected 50 percent.

Example 2: Real/Complex

Now let's look at a more representative situation. In reality, when a package is sent out to execute (at night, for example), a number of computers may get the package at approximately the same time. For a larger package, expect significant overlap between downloads. For ease of demonstration, we'll assume a completely dedicated network. Read the tables left to right: client 1 checks in first, then client 2 perhaps a second later, and so on. After 3 minutes, client 1 checks in again, then client 2, and so on. In reality it's not this neat, but it works for this discussion.

The scenario consists of a 100 M network carrying nothing but these Altiris packages. Unlike the simple example, we assume the downloads run long enough that every 3 minutes each client "checks in" to determine its optimal download speed. In our internal testing, this is very close to what we see in actual client logs during a throttled package download.

First, we'll look at 5 systems with throttling set to 50 percent (values are the percentage of the 100 M link each client uses):

Client System:	1	2	3	4	5	Total
Pre-Download %	0	0	0	0	0	0
First Speed Check	50	25	12	6	3	96
Speed Check 2	23	22	23	15	9	92
Speed Check 3	15	19	21	18	14	87
Speed Check 4	14	16	19	19	16	84
Speed Check 5	15	15	17	18	17	82
Speed Check 6	16	16	16	17	17	82

Notice that total utilization stabilizes at around 82 percent, even though we selected 50 percent.
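The settling behavior can be reproduced with a simplified model in which each client, on every check, takes the configured percentage of whatever the other clients are not currently using. This is an illustrative assumption rather than the agent's actual ICMP-based measurement, and the function name is hypothetical.

```python
def simulate_checks(num_clients: int, throttle_pct: float, checks: int, link: float = 100.0) -> list:
    """Simplified model of repeated speed checks (not the agent's real algorithm).

    In each round, clients re-check in order 1..n. A client sees the link
    minus what the other clients are currently using and takes
    throttle_pct of that. Returns the total utilization after each round.
    """
    shares = [0.0] * num_clients
    totals = []
    for _ in range(checks):
        for i in range(num_clients):
            others = sum(shares) - shares[i]
            shares[i] = (link - others) * throttle_pct / 100
        totals.append(sum(shares))
    return totals


for check, total in enumerate(simulate_checks(5, 50, 6), start=1):
    print(f"Speed check {check}: {total:.1f}% total")
```

For 5 clients at 50 percent, the model's totals fall from about 97 percent to about 83 percent over six checks, close to the Total column above even though the per-client values differ. Given more checks, it settles at 83.3 percent.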

Now we'll look at 10 systems with throttling set to 10 percent:

Client System:	1	2	3	4	5	6	7	8	9	10	Total
Pre-Download %	0	0	0	0	0	0	0	0	0	0	0
First Speed Check	10	9	8	7	6	6	5	5	4	4	64
Speed Check 2	4	5	5	5	5	5	5	5	5	5	49
Speed Check 3	5	5	5	5	5	5	5	5	5	5	50

This time, overall utilization stabilizes at around 50 percent. Relative to the "goal," that is a larger overshoot than in the previous test (five times the 10 percent setting, versus about 1.6 times the 50 percent setting) because there are more systems. We also reach a stable point much faster because of the larger number of systems.
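Under the same simplified model (each of n clients takes the fraction p of what the other n − 1 clients leave free on a link L), the stable point can be computed directly: each client's share x satisfies x = p(L − (n − 1)x), so x = pL / (1 + (n − 1)p), and the total is n times that. The helper below is a hypothetical illustration of that formula, not part of ITMS.

```python
def steady_state_total(num_clients: int, throttle_pct: float, link: float = 100.0) -> float:
    """Total utilization at which the simplified model stops changing.

    Each client's share x solves x = p * (link - (n - 1) * x),
    i.e. x = p * link / (1 + (n - 1) * p).
    """
    p = throttle_pct / 100
    share = p * link / (1 + (num_clients - 1) * p)
    return num_clients * share


for n in (1, 2, 5, 10, 20, 50):
    total = steady_state_total(n, 10)
    print(f"{n:>2} clients at 10%: {total:5.1f}% of the link ({total / 10:.1f}x the setting)")
```

With one client the result equals the setting, matching the single-client case described earlier. For 5 clients at 50 percent it gives about 83 percent, and for 10 clients at 10 percent about 53 percent, in line with the tables above. As the number of clients grows, the total approaches the whole link no matter how small the setting is.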

Recommendations:

As the number of systems goes up, so does utilization, even when very small throttling values are chosen. Therefore, it is wise to test your scenarios before rolling out policies.

In a diverse network, another good option is to clone the agent policies and apply different throttling amounts to different segments. For example, you might use 50 percent for systems at HQ and 10 percent for outlying systems on T1 links.
You may also want to consider fixed utilization amounts instead of percentages. These allow a little more control over the amount, but they can bring the network down quickly if you do not correctly calculate how many systems will be downloading at the same time.
Finally, smart use of Package Servers is the best way to control problems in this kind of scenario.
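With a fixed per-client amount, the calculation mentioned above is simple: total use is the cap multiplied by the number of clients downloading at once. The sketch below uses hypothetical helper names and assumes kbps as the unit; check the actual units your agent policy uses.

```python
def fixed_cap_utilization(link_kbps: float, cap_kbps: float, concurrent_clients: int) -> float:
    """Percent of the link used if every concurrent client hits a fixed cap.

    Values above 100 mean the clients together would saturate the link.
    """
    return 100 * cap_kbps * concurrent_clients / link_kbps


def max_clients_for_budget(link_kbps: float, cap_kbps: float, budget_pct: float) -> int:
    """Largest number of simultaneous downloads that stays within budget_pct."""
    return int((link_kbps * budget_pct / 100) // cap_kbps)


# Example: a T1 (1544 kbps) with a hypothetical 256 kbps per-client cap.
print(max_clients_for_budget(1544, 256, 50))             # 3
print(round(fixed_cap_utilization(1544, 256, 10), 1))    # 165.8: the link would be saturated
```

Unlike percentages, a fixed cap does not shrink as more clients join, which is why the number of simultaneous downloads has to be known in advance.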