This tech note describes changes to VSE that greatly improve scalability when there are deliberate delays in the execution of the virtual service.
All supported DevTest releases.
N/A
Let's briefly examine the runtime execution model of a virtual service before the changes. Say you have deployed a model with an HTTP responder step with a think time spec of 1 second in the model (VSM). When you deploy the virtual service, you specify a concurrent capacity of 2 and a think time % of 50%.
A new request comes in and is handed to the thread pool (size 2). The thread "follows" your VSM specification until it hits the responder step where the think time is set to 1 second. Because the overall think time % is 50%, that worker thread now goes to sleep for 500ms (the 1 second think time on the responder step scaled by 50%). Meanwhile another request has come in, just milliseconds after the first. It is handed to the thread pool and the same process occurs, so now we have 2 out of 2 worker threads sleeping for 500ms. Meanwhile 100 more client requests come in... and the first two worker threads wake up, respond and pick up requests 3 & 4. These 2 requests have already been waiting 500ms and they are subject to a further delay of 500ms, because that's what the execution model has been told to do. By the time the client sees a response, 1 second has gone by (not including network latency and any internal processing time in the VSE). Client requests 5 & 6 get a 1.5 second response time, and so on.

The solution, until now, has been to increase the concurrent capacity to be equal to or above the maximum number of concurrent client requests that you expect. Actually it's a little more complicated than that, but that's been the rule of thumb. The problem there is that each worker thread consumes, well, a thread and all the associated overhead of a VSE thread, which can be considerable.
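To make the arithmetic concrete, here is a minimal sketch (illustrative Java only, not the actual VSE code) of that old sleep-based model, using the numbers from the example above: a pool of 2 workers, a 1 second think time and a 50% think time scale.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SleepBasedModel {
        // "Concurrent capacity" of 2: every request occupies a worker for the whole scaled think time
        static final ExecutorService WORKERS = Executors.newFixedThreadPool(2);
        static final long THINK_TIME_MS = 1000;  // think time on the responder step in the VSM
        static final double THINK_SCALE = 0.50;  // think time % chosen at deployment

        static void handleRequest(int id, long start) {
            WORKERS.submit(() -> {
                try {
                    Thread.sleep((long) (THINK_TIME_MS * THINK_SCALE)); // worker blocked for 500ms
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                System.out.printf("request %d answered after %d ms%n", id, System.currentTimeMillis() - start);
            });
        }

        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            // six near-simultaneous clients: responses arrive at roughly 500, 500, 1000, 1000, 1500, 1500 ms
            for (int i = 1; i <= 6; i++) handleRequest(i, start);
            WORKERS.shutdown();
        }
    }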
Now there is a new runtime execution model that dramatically reduces the number of worker threads ("concurrent capacity") needed.
In the new model, everything is the same except for two key differences. By default, the think time on the actual HTTP response step in the VSM is ignored. At runtime, the response is assembled and the think time specification is taken from the response in the service image. The overall think time scale percentage from the deployed virtual service is applied, and that is used to calculate the period of time the response will spend on a delayed queue. The response is placed on the delayed queue and the worker thread does NOT sleep, but goes back to being available to service other incoming requests. When the timeout expires on the queued response, it is sent back to the client.
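A minimal sketch of the same idea (again illustrative only; a ScheduledExecutorService stands in for VSE's delayed response queue): the worker computes the scaled delay, hands the assembled response to the scheduler, and is immediately free to pick up the next request.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class DelayedQueueModel {
        static final ExecutorService WORKERS = Executors.newFixedThreadPool(2); // tiny worker pool is enough
        static final ScheduledExecutorService DELAY_QUEUE = Executors.newSingleThreadScheduledExecutor();

        static void handleRequest(int id, long imageThinkTimeMs, double thinkScale, long start) {
            WORKERS.submit(() -> {
                // take the think time from the service image response and apply the deployed think time %...
                long delayMs = (long) (imageThinkTimeMs * thinkScale);
                // ...then queue the send instead of sleeping; the worker is free as soon as schedule() returns
                DELAY_QUEUE.schedule(() -> System.out.printf("request %d answered after %d ms%n",
                        id, System.currentTimeMillis() - start), delayMs, TimeUnit.MILLISECONDS);
            });
        }

        public static void main(String[] args) throws InterruptedException {
            long start = System.currentTimeMillis();
            // 100 near-simultaneous clients, 10s image think time, 50% scale: each sees ~5s with only 2 workers
            for (int i = 1; i <= 100; i++) handleRequest(i, 10_000, 0.50, start);
            Thread.sleep(6_000);
            WORKERS.shutdown();
            DELAY_QUEUE.shutdown();
        }
    }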
This means that the worker threads are rarely idle (unless there are no incoming client requests to be serviced), and unless you have very CPU-intensive models (typically models with lots of XPath queries) you will typically be able to run thousands of concurrent clients with very few VSE worker threads. Even better, each client really does see the "correct" response delay rather than the delay that was specified in the model plus the delay in waiting in the request queue.
If you put a delay spec of anything other than 0 on any of the steps in the VSM, and do not deploy with a zero % think scale, you are asking for trouble: the request queue will most likely grow unbounded until your clients time out.
If you really do want to put a delay spec on the HTTP responder step in the model and actually apply a positive % think scale, then you can set the LISA property lisa.vse.processResponseStepThinkTime=true, but be aware that clients timing out is the (unintended) consequence of stalling your worker threads.
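For example (assuming the usual DevTest convention of setting LISA properties in local.properties, or as a -D JVM argument on the VSE process; check the properties file layout for your release):

    # local.properties (or pass as -Dlisa.vse.processResponseStepThinkTime=true to the VSE JVM)
    # Re-enables the old behaviour: the responder step's own think time stalls the worker thread.
    lisa.vse.processResponseStepThinkTime=true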
This change was designed specifically to reduce the system resources required by VSE when you have many concurrent clients and a need to deliberately slow down response times. The more typical use case is to simply set the think scale to 0% and assign as many worker threads as is sensible to the concurrent request capacity pool. The rule of thumb here, for CPU-bound VS models, is 2 x the number of CPU cores.
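As a sketch of that sizing rule (illustrative only):

    // Rule-of-thumb pool size for CPU-bound VS models: twice the number of cores on the VSE host
    int concurrentCapacity = 2 * Runtime.getRuntime().availableProcessors();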
To get an idea of what a difference this makes, in testing we deployed a simple listUsers web service with a think time spec in the image of 10 seconds, a think scale of 50% and ONE worker thread (concurrent request capacity = 1). We then spun up a 100 virtual user LISA test case on the client side (with 0% think time) and, with an almost idle CPU, each client saw the correct response time (5 seconds). We then set the think time on the model step to 10 seconds, set lisa.vse.processResponseStepThinkTime=true, redeployed with a concurrent request capacity (worker thread count) of ten, and re-ran the same LISA test; this time much more CPU and memory were consumed and the clients fairly quickly started to time out.
From now on, you should rarely need more than a 'concurrent capacity' of, say, 2 for any given VSE model, even if you have hundreds of concurrent clients. So we should look at changing the terminology for 'concurrent request capacity' and make it something like 'exec thread pool size'.
There are plans to generalize this mechanism so that all 'think time' in all LISA products is done like this - rather than call Thread.sleep() we instead put the thing that is executing into a delayed queue. That way we can decrease the number of threads we need to do the same amount of work or, conversely, with the same number of threads vastly increase the number of 'virtual users' we can support in a single simulator JVM.
This diagram may or may not help ;-). Remember too, there can be two services listening on the same port but with different endpoints (e.g., 8001:/LisaTravel and 8001:/LisaBank). In that case there would be two blue blocks with two internal request queues, but still only a single Port Server thread servicing port 8001. But these guys are very, very quick. Outrageously quick (we have measured 13,000 request/response pairs per second on a Mac Pro for one Java port server thread), and if we ever get there we can add more threads per port, but the VSE will bottleneck before then.
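Very roughly, the routing described above looks like this (an assumed structure for illustration, not the actual VSE classes): one Port Server thread per listen port, dispatching into a per-service request queue keyed by base path.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class PortServerSketch {
        // one request queue per virtual service deployed on this port (the "blue blocks")
        private final Map<String, BlockingQueue<String>> queuesByBasePath = new HashMap<>();

        PortServerSketch() {
            queuesByBasePath.put("/LisaTravel", new LinkedBlockingQueue<>());
            queuesByBasePath.put("/LisaBank", new LinkedBlockingQueue<>());
        }

        // the single port server thread only parses the path and enqueues; VSE worker threads drain the queues
        void dispatch(String requestPath, String rawRequest) throws InterruptedException {
            for (Map.Entry<String, BlockingQueue<String>> e : queuesByBasePath.entrySet()) {
                if (requestPath.startsWith(e.getKey())) {
                    e.getValue().put(rawRequest);
                    return;
                }
            }
            // no service matches this path; a real VSE would send back an error response here
        }
    }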
Hand-rolled models
If you have a hand-rolled model with a SocketEmulator request followed by a Compare Strings step and then a Socket Emulator response step, the delay spec in the lookup step is saved in the testexec property 'lisa.delayed.response'. The respond step will specifically look for this property and parse it in the usual fashion. If there is a delay spec, the responder will enqueue the response as described above.
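For example, a scripted step placed before the Socket Emulator respond step could set the property directly (a sketch; the delay value shown is hypothetical and should follow whatever delay-spec syntax your responder normally parses):

    // In a DevTest scripted step (Java/BeanShell syntax), before the respond step.
    // "2000" is a hypothetical 2-second delay spec.
    testExec.setStateValue("lisa.delayed.response", "2000");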
MQ / JMS models
There is no JMS standard for delayed message delivery. Many JMS vendors have a native way to do delays; see http://java.dzone.com/articles/sending-delayed-jms-messages. However, all of those vendors' methods are different, and some can't be done outside of an API call.
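For instance, ActiveMQ exposes scheduled delivery through a message property (vendor-specific; the broker must have scheduler support enabled, and other vendors use entirely different mechanisms):

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    class ActiveMqDelayExample {
        // ActiveMQ-only: delay delivery of a message by delayMs milliseconds
        static void sendDelayed(Session session, MessageProducer producer, String body, long delayMs)
                throws JMSException {
            TextMessage msg = session.createTextMessage(body);
            msg.setLongProperty("AMQ_SCHEDULED_DELAY", delayMs); // requires schedulerSupport="true" on the broker
            producer.send(msg);
        }
    }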
IBM MQ, as well as some other JMS vendors, has no way to do delayed message delivery. So we have implemented a limited version of the delayed response queue for MQ and JMS. It operates the same way as the SocketEmulator: if the testexec property lisa.delayed.response.spec is defined when a JMS or MQ publishing step is executed, then the actual message delivery will be placed on a wait queue as described above.
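As with the hand-rolled socket models, a scripted step ahead of the JMS or MQ publish step can set the property (again a sketch with a hypothetical value):

    // In a scripted step before the JMS/MQ publish step.
    // "5000" is a hypothetical 5-second delay spec; the message delivery will be held for that long.
    testExec.setStateValue("lisa.delayed.response.spec", "5000");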
This is a limited version of the delayed response queue with the following restrictions:
The delayed message delivery feature is intended for use in sending messages in response to requests. Because it depends on the response queue always being the same, there are a few things the client side must keep in mind:
Summary