While many folks think about, talk about and even worry about load testing their product, they end up not testing and hide behind a familiar set of excuses. Load testing is really hard to do efficiently. It is expensive. By the time we’re done, it won’t be valid. We don’t have time. Even Dilbert dodges load testing.
Usually, load testing means setting up a load test environment, instrumenting for measurement, generating a load or replaying logged events and finally gathering results for analysis and comparison. Lather, rinse, repeat often.
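To make that loop concrete, here is a minimal sketch of the generate, measure and gather steps in Python. It is an illustration under stated assumptions, not a real load testing tool: `fake_request` is a hypothetical stand-in for one call to your system, and the request count and concurrency are made-up numbers you would replace with your own.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i):
    """Hypothetical stand-in for one call to the system under test."""
    time.sleep(0.001)  # pretend the server took about a millisecond
    return i


def timed_call(target, arg):
    """Instrument a single call: return its wall-clock latency in seconds."""
    start = time.perf_counter()
    target(arg)
    return time.perf_counter() - start


def run_load(target, total_requests, concurrency):
    """Generate load: issue total_requests calls from `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda i: timed_call(target, i), range(total_requests)))


def summarize(latencies):
    """Gather results into a few numbers that can be compared across runs."""
    ordered = sorted(latencies)
    # Simple nearest-rank style approximation of the 95th percentile.
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize(run_load(fake_request, total_requests=200, concurrency=10)))
```

Swapping `fake_request` for a function that actually hits your test environment is the easy part. Everything discussed below is about why the environment and the load behind that call are hard to get right.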
The problems typically start immediately. A load test environment needs to be a reasonable approximation of your production environment, and setting that up gets very expensive, quickly. I wrote a few weeks back that cloud computing environments can be instrumental in cutting the need to purchase tons of gear for temporary testing tasks.
Oh, by the way, it is not simple at all to generate a useful load. You can use a few client machines to replay some scripts, but realistic scripts are hard to write, especially dynamic scripts that drive fancy AJAX-based front ends.
Enter the practice of replaying logged events. It is a great idea on the surface: just take real-world traffic logs and replay them. Heck, you can even grab the events from the production environment – nothing like the real thing, right? Until you realize that these logged events typically require the state of your system to be consistent with when the events actually occurred. Is it at all practical to snapshot your production environment so that its state is synchronized to the point in time that matches your log data? Sometimes that is very difficult to do correctly, if at all: the data set can be very large, taking the snapshot may really burden your production environment, and it may sever in-flight transactions or simply not work at all.
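The replay mechanics themselves are the simple part, as this rough Python sketch shows. The log format (`<epoch_seconds> <METHOD> <path>`), the `send` callback and the `speedup` knob are all assumptions made for illustration; your logs and client will look different. Note what the sketch does not do: it makes no attempt to put the system into the state those events expect, which is exactly the hard problem described above.

```python
import time


def parse_log(lines):
    """Parse lines of the assumed form '<epoch_seconds> <METHOD> <path>'."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        timestamp, method, path = line.split(maxsplit=2)
        events.append((float(timestamp), method, path))
    return sorted(events)  # replay in time order


def replay(events, send, speedup=1.0, sleep=time.sleep):
    """Call send(method, path) for each event, keeping the logged gaps between events.

    A speedup greater than 1 compresses the gaps to drive a heavier load
    than the one that was recorded.
    """
    previous = None
    for timestamp, method, path in events:
        if previous is not None:
            sleep((timestamp - previous) / speedup)
        send(method, path)
        previous = timestamp


if __name__ == "__main__":
    sample = ["100.0 GET /cart", "100.5 POST /checkout", "102.0 GET /orders"]
    replay(parse_log(sample), send=lambda m, p: print(m, p), speedup=10.0)
```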
Finally, of course, even if you are able to grab a synchronized production snapshot, move it to your load testing environment and find some way to crank up the front end loads, you’re not ready yet. There is the landmine of replaying events against a production data set – whoops. Do you have customer emails in your data, and does everyone now get messages from your load test? Are there financial transactions? You might not want to replay those. Sure, you can isolate the environment so email doesn’t get out or transactions don’t take place, but every time you alter the environment it behaves a little less like production, and your load test may miss something important.
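One common way to do that isolation is to swap the outbound pieces for stand-ins that record instead of act. The Python sketch below assumes your application takes its mail sender as a parameter; `RecordingMailer` and `confirm_order` are hypothetical names invented for this example, not part of any real library.

```python
class RecordingMailer:
    """Hypothetical stand-in for a real mail sender: records messages, sends nothing."""

    def __init__(self):
        self.outbox = []

    def send(self, to, subject, body):
        self.outbox.append({"to": to, "subject": subject, "body": body})


def confirm_order(order, mailer):
    """Hypothetical application code that would normally email a real customer."""
    mailer.send(order["email"], "Order confirmed", f"Order {order['id']} is on its way.")


if __name__ == "__main__":
    mailer = RecordingMailer()
    confirm_order({"id": 42, "email": "customer@example.com"}, mailer)
    print(len(mailer.outbox), "message(s) captured, none delivered")
```

The catch is the one already mentioned: the stand-in returns instantly, while a real mail server adds latency and can fail, so the load you measure is no longer quite the load production would see.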
In my next post, I’ll share a practical idea that addresses these challenges and lets you get your load testing done efficiently.