At my current shop we are relying very heavily on automated regression testing: unit, integration, system, anything we can automate. We also measure our code coverage during test runs as one way of monitoring the tests' thoroughness. The tests themselves are also peer-reviewed.
All of this is great in that you can get immense warm-fuzzies when you check in even a large change to existing code and all the tests still pass. Huzzah!
But then comes the pain. Tests that work when you run them locally don’t work when you run them in the context of running all the tests. Tests that work running all the tests locally don’t work on the official build machines. Tests that work on most build machines fail on some build machines. Sometimes.
Purists will say that these problems are caused by poorly-constructed tests; that every test should run entirely in isolation with no side-effects from any other part of the codebase, the operating system, the hardware, etc. This is facile, simplistic, wrong-headed, and just plain dumb.
Now that I’ve committed to the insult, let me explain why…
Firstly: the developer is at the mercy of their testing toolset. We use the MSTest framework. It turns out that there are actually three different physical mechanisms for running tests: within Visual Studio, via the MSBuild system, and via a standalone shell executable. All three behave slightly differently, particularly in terms of how AppDomains and other CLR fundamentals are handled. And there isn't really a way to unify all three without basically writing and maintaining your own testing framework.
Secondly: timing, timing, timing! When your tests are dealing with autonomous processes of any sort (the testing code itself, background threads, external databases or services), they will run slightly differently on different execution platforms. This is actually a Very Good Thing, as it will reveal many latent race conditions. (And in this world of highly-distributed asynchronous systems, aren't these the biggest bugaboo?) However, once revealed, these race conditions can be very difficult to resolve.
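The classic shape of such a race is a test that sleeps for "long enough" before asserting: fine on your fast dev box, flaky on a loaded build server. Here is a hedged Python sketch (the names `background_worker`, `results`, and `done` are invented for illustration) showing the robust alternative, which waits on an explicit completion signal instead of wall-clock time:

```python
import threading

results = []
done = threading.Event()

def background_worker():
    # Stands in for any autonomous work: a DB write, a service call, etc.
    results.append("row")
    done.set()  # signal completion explicitly instead of relying on timing

threading.Thread(target=background_worker).start()

# Fragile version (don't do this): time.sleep(0.05) then assert.
# Whether 0.05s is "long enough" depends on the machine and its load.

# Robust version: block until the worker signals, with a generous timeout
# that only matters when something is genuinely broken.
assert done.wait(timeout=5.0), "worker never signalled"
assert results == ["row"]
print("ok")
```

The robust version passes on any machine, fast or slow, because it synchronizes on the event the test actually cares about rather than on elapsed time.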
Thirdly: all automated testing environments are artificial to some degree and not necessarily representative of the actual production execution environment. And there isn’t always a lot that can be done about that. For instance, build servers are typically running enterprise versions of operating systems rather than standard user versions (apparently they need this to support some of the management features, etc. that corporate IT is all on about). As a developer, lots of this stuff can be out of your hands, either organizationally or in terms of your available budget.
Fourthly: Everything is just so darned parallel! Test frameworks like to run multiple tests at the same time. But computer systems inherently have singleton resources that don't play nicely in this model: disks, registries, files, databases, web hosts, etc. Establishing and maintaining test data in these singletons when multiple tests are doing the same thing is problematic, to say the least. Now, you can spend tons of time and effort mocking all of this out in your tests, but in my view that just waters down your tests. And anyway, unless that is how your production environment is actually structured, you're just removing your tests further from reality.
What it boils down to is that automated tests will force you — unwillingly and violently at times — to deal with all the environmental assumptions your design and implementations are making. This is good, overall. But still painful.
Automated testing: the dentistry of the software industry…