VSE Performance Tuning Guidelines

Products

CA Application Test CA Continuous Application Insight (PathFinder) Service Virtualization

Issue/Introduction

This article summarizes common performance tuning steps and techniques to help debug VSE performance issues.

Environment

All supported DevTest releases.

Resolution

This document summarizes common performance tuning steps and techniques to help debug VSE performance issues.

Preparing your machine for production/benchmarking

Logging

The number one performance constraint is the VSE and client side logging setup. Under high load, logging can significantly slow performance. The logging setup is controlled in LISA_HOME/logging.properties and you should ensure the log level is set to INFO; this will alert you to heap sizing problems on the servers.

Recommended logging.properties configuration

The two most important lines are
rootLogger.level=WARN
and
logger.VSE.level=WARN

VSE production settings

VSE is configured to clear its database cache every 10 seconds. This is a reasonable trade-off in a development environment where service images are being constantly updated but when you are benchmarking or going into production, you should tune this value to reflect the static nature of the service image database.

Preparing your VSE instance

LISA_HOME/lisa.properties describes the setting and its best to override it in LISA_HOME/local.properties as follows:

lisa.eclipselink.cache.timeout.ms=3600000 (5.x only, 6.x does not use a database)

By default we cache Eclipselink object for no longer than 10 seconds. This means that changes made to the service image database via the service image editor will propagate to a running virtual service model in no longer than 10 seconds and on average 5 seconds, assuming the object is already cached by the VSE. If the object is not already cached the propagation will be immediate. 10 seconds is a reasonable trade-off in a development environment where you are making changes to service images. In a production environment with a heavily loaded VSE server you might consider revising this number upwards to at least a minute (60000 ms) or even an hour (3600000 ms) to reduce the number of SQLs issued and thus increase performance.

lisa.eventPool.maxQueueSize=65535

It is sometimes a good idea to crank up the maximum size of the internal event queue in local.properties. This event queue defaults to 8192 and under extreme load this becomes a bottleneck. You would normally leave this alone. The only way right now to see if this is the bottleneck is to run jstack against the VSE and see how many threads are blocked inside the EventHandler class waiting for the queue to be drained...

lisa.pathfinder.on=false

Disable Pathfinder. It adds a reasonable overhead, especially when you are going for maximum throughput.

lisa.CycleExecHistory.buffer.size=2 (6.x does this for you so this is a 5.x thing only)

Reduce the amount of history that each VSE worker thread keeps. This reduces heap requirements and saves CPU.

Model settings

Virtual Service Models should, in general, be deployed to production with 0% think time and a Capacity of more than 1 (the default) but probably no more than 10 to 20, depending on how many other service models the VSE has deployed.

The Execution Mode should be set to "Most Efficient;" see also the discussion of recording the original service model (more flexible versus more efficient). A "more efficient" model is almost twice as fast as a "more flexible" model.

Capacity here really means how many threads there are to service requests for this service model. A good rule of thumb for x86 machines is capacity= two times the number of cores. A good rule of thumb for x86 machines is capacity= two times the number of cores. It is highly unlikely, unless the virtual service is I/O bound, that adding capacity beyond this will result in increased throughput: the service will normally (should normally) be CPU-bound.

Under the hood

VSE works much the same way as most modern, high performance applications servers do. Each service has an endpoint, typically a network socket. VSE has one Selector Thread per service socket. The Selector Thread uses a Java "SelectableChannel" to very efficiently process inbound and outbound network IO. Inbound IO is handed to a thread pool bound to the service. The size of the thread pool is fixed by the service Capacity.

In practice, it is very difficult to saturate the thread pool with standard models and a capacity of 10, so you might want to start with a Capacity of only 5. Much work has gone into making the VSE execution path extremely "tight" and it is generally only when logging, XML parsing or call-outs to other systems get involved that the VSE request/response cycle is not CPU bound.

Of course, if you really do want to model the original service think time, then the Think Scale setting will be 100% and the worker threads will put the response on the DelayedResponseQueue and then returned to the pool. But this document is about maximizing throughput... you want a Think Scale of 0.

LISA load test clients

If you are using LISA to actually drive the load on the VSE (and there are many good reasons to do this) then you must ensure that your client side does not become the bottleneck too quickly. Unless you have hand-crafted Java load clients, no matter which framework you use, it gets in the way eventually unless you spread them over several physical machines.

There are a few simple things you can do in your LISA staging doc and test case to ensure you won't totally slow down after a few vusers.

If you must have a report in the staging doc, make it a LoadTest report. The default LISA report will kill you under load, it simply collects too much data and hits the reports DB too hard. It's a great report for functional testing but it's no good in performance tests.
Do not use global data sets in a performance test. They are effectively "remote singletons" and every instance will line up to access these remotely. When your vuser calls the singleton it also (potentially) serializes the entire TestExec state across the wire in case the data set refers to testexec properties. At the very least there is a small subset of the testexec that is serialized as part of the call, so do not use global data sets.
Use TestRunner to launch the test. There is less test event overhead. If you use Workstation and stage either locally or remotely make sure the Events tab in the GUI is set for the terse filter; otherwise your test will spend more time sending events to the coordinator and workstation than it does hitting the server.
Make sure you give your simulator VMs plenty of headroom. Out of the box they get 512m of heap: it is usually a good idea to do -Xms1024m -Xmx1024m.

Make sure you are using the Hotspot Server VM. This is the out of the box setting so you should be using it. Verify by looking in the various log files, early in the piece:

 vse_log.log:2009-08-31 19:50:11,641 [main] INFO 				      
 com.itko.lisa.test.Environment - JVM info: 
 java.vm.info=mixed mode,
 java.vm.name=Java HotSpot(TM) Server VM,
 java.vm.specification.name=Java Virtual Machine Specification,
 java.vm.specification.vendor=Sun Microsystems Inc.,
 java.vm.specification.version=1.0,
 java.vm.vendor=Sun Microsystems Inc.,
 java.vm.version=10.0-b22}{java.version=1.6.0_06}

Design your LISA test such that it take "a while" (but not too long) to execute. A test with a single step and several hundred vusers pounding it will spend most of the CPU cycles on setup and teardown, not generating the client load. Usually the easiest way to achieve this is to use a Counter data set to loop the test and finish at the end of the data set (say 1 to 10,000 in increments of 1).

Tuning Garbage Collection

JVM garbage collection has been observed to cause VSE performance issues, with responses sometimes taking several seconds. To troubleshoot and resolve this, add the following to VSE's vmoptions file. These options apply only to Sun's / Oracle's JVM.

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Bounce VSE, and watch the output (stdout). You will see lines like this:

3603.403: [GC [PSYoungGen: 121111K->26065K(132608K)] 533555K->441076K(553280K),0.0761500 secs] [Times: user=0.19 sys=0.00, real=0.11 secs]

3603.481: [Full GC [PSYoungGen: 26065K->0K(132608K)] [PSOldGen: 415010K->388919K(425792K)] 441076K->388919K(558400K) [PSPermGen: 55456K->55456K(59456K)], 3.6979719 secs] [Times: user=4.51 sys=0.09, real=4.70 secs]

In this case, the OldGen garbage collection was causing VSE to slow down significantly, which you can see by comparing it to the output when only YoungGen collection happened to the output when OldGen collection happened. Switching garbage collection strategies may help. Add these lines to your vmoptions file:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:MaxGCPauseMillis=500

Additional Information

Make sure the VS is running in Most Efficient mode as other modes might have impact on the performance.