Teatime: Testing Application Performance

Welcome back to Teatime! This is a weekly feature in which we sip tea and discuss some topic related to quality. Feel free to bring your tea and join in with questions in the comments section.

Tea of the week: Still on a chai kick from last week, today I’m sipping on Firebird’s Child Chai from Dryad Teas. I first was introduced to Dryad Tea at a booth at a convention; I always love being able to pick out teas in person, and once I’d found some good ones I started ordering online regularly. It’s a lovely warm chai, with a great kick to it.

Today’s topic: Testing Application Performance

This is the first in a two-part miniseries about performance testing. The first time I gave this talk, it was a high-level overview of performance testing, and when I asked (as I usually do) if anyone had topic requests for next week, they all wanted to know about jMeter. So I spent a week learning a little bit of jMeter, and created a Part 2.

This talk, however, remains high-level. I’m going to cover three main topics:

Performance Testing
Load Testing
Volume Testing

I have a bit of a tradition when I talk about performance testing: I always illustrate my talks with pictures of adorable bunnies. After all, bunnies are fast, but also cute and non-threatening. Who could be scared of perf tests when they’re looking at bunnies?

White Angora Bunny Rabbit — Aww, lookit da bunny!

Performance Testing

Performance testing is any testing that assesses the performance of the application. It’s really a super-group over the other two categories that way, in that load testing and volume testing are types of performance testing. However, when used without qualifiers, we’re typically talking about measuring the response time under a typical load, to determine how fast the application will perform in the average, everyday use case.

You can measure the entire application, end to end, but it’s often valuable to instead test small pieces of functionality in isolation. Typically, we do this the same way a user would: we make an HTTP request (for a web app), or a series of HTTP requests, and measure how long it took to come back. Ideally, we do this a lot of times, to simulate a number of simultaneous users of our site. For a desktop application, instead of adding “average load”, we are concerned about the average hardware: we run the application on an “average” system and measure how long it takes to, say, repaint the screen after a button click.

But what target do you aim for? The Nielsen Normal Group outlined some general guidelines:

One tenth of a second response time feels like the direct result of a user’s action. When the user clicks a button, for example, the button should animate within a tenth of a second, and ideally, the entire interaction should complete within that time. Then the user feels like they are in control of the situation: they did something and it made the computer respond!
One second feels like a seamless interaction. The user did something, and it made the computer go do something complicated and come back with an answer. It’s less like moving a lever or pressing a button, and more like waiting for an elevator’s door to close after having pressed the button: you don’t doubt that you made it happen, but it did take a second to respond.
Ten seconds and you’ve entirely lost their attention. They’ve gone off to make coffee. This is a slow system.

Load Testing

Load testing is testing the application under load. In this situation, you simulate the effects of a number of users all using your system at once. This generally ends up having one of two goals: either you’re trying to determine what the maximum capacity of your system is, or you’re trying to figure out if the system gracefully degrades when it exceeds that maximum capacity. Either way, you typically start with a few users and “ramp up” the number of users over a period of time. You should figure out before you begin what the maximum capacity you intend to have is, so you know if you’re on target or need to do some tuning.

Like the previous tests, this can be done at any layer; you can fire off a ton of requests at your API server, for example, or simulate typical usage of a user loading front-end pages that fire requests at the API tier. Often, you’ll do a mix of approaches: you’ll generate load using API calls, then simulate a user’s degraded experience as though they were browsing the site.

There’s an interesting twist on load testing where you attempt to test for sustained load: what happens to your application over a few days of peak usage rather than having a few hours and then a downtime in between? This can sometimes catch interesting memory leaks and so forth that you wouldn’t catch in a shorter test.

Volume Testing

While load testing handles the case of a large amount of users at once, volume testing is more focused on the data: what happens when there’s a LOT of data involved. Does the system slow down finding search results when there’s millions or billions of records? Does the front-end slow to a crawl when the API returns large payloads of data? Does the database run out of memory executing complex query plans when there’s a lot of records?

Do you load test at your organization? How have you improved your load testing over the years? What challenges do you face?