I’ve been looking into code metrics again. I’ve been interested in them on and off over the years, and I happen to currently be working at a shop where we are very data-driven in our development processes. This is an interesting change of pace for me compared to most shops I’ve been at where metrics are not only not utilized, but actively scorned.
(I’m going to talk just about code metrics here. There are also process/project metrics like burn-down rate, earned value, etc. but here I’m more interested in metrics that describe our primary work product — code.)
Michael Feathers gave an interesting talk which, among other things, showed some examples of how he uses codebase metrics. Go there now; you can start at minute 25 if you’re pressed for time…
It’s pretty easy to see how metrics can be valuable, especially for a large codebase that is evolving over years. For small projects, maybe not so much. But I think there’s a lot of power in being able to trend information like:
- Areas of constant change (instability)
- Complexity of subsystems or even more granular parts
- Size — how much stuff do we have to worry about?
- Unit testing size compared to raw size
- Unit testing code coverage over time for components
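The “areas of constant change” idea, for instance, can be approximated by counting how often each file shows up in version-control history. A minimal sketch, assuming log output in the shape of `git log --name-only` (hardcoded here for illustration; in practice you’d capture it from the repository):

```python
from collections import Counter

# Sample output shaped like `git log --name-only --pretty=format:`,
# hardcoded for illustration. Blank lines separate commits.
sample_log = """\
src/billing.py
src/util.py

src/billing.py

src/billing.py
tests/test_billing.py
"""

# Count how many commits touched each file -- a rough instability signal.
churn = Counter(line for line in sample_log.splitlines() if line.strip())

for path, commits in churn.most_common():
    print(f"{commits:3d}  {path}")
```

Files at the top of this list are candidates for a closer look, especially when the churn correlates with complexity or defect reports.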
Without data to support you, it is very difficult to argue convincingly to non-developers that effort should be spent to rework problematic sections of code, or to spend time refactoring or adding additional tests. And, let’s face it, this is a perennial complaint from developers: we are forced to compromise quality to meet delivery dates and not given enough time to fix things later. Tools that help us alleviate this problem are most welcome, I think. Certainly they have a better chance of success than just complaining about the situation and hoping product/project management eventually gets tired of hearing it and gives in…
And, a real mind-bender: Without data, how sure can you really be about your intuitive, gut-feel for where the problems actually are in your codebase? Wouldn’t it be nice to have a metrics system you could use to investigate your theories about the codebase in something like an objective manner?
One interesting point that Mr. Feathers makes is that not everything revealed through metrics is a problem. A high degree of coupling for a particular component is not necessarily bad in and of itself. If you can correlate that to high rates of change, or a large number of problem reports, then you’ve got something more interesting.
The point is, metrics are just data, not information. They require thoughtful interpretation, rather than just blind reaction. Gathering complexity metrics and declaring that all components must be within certain bounds is pedantic. But taking a look at things that fall outside certain boundaries is reasonable, and keeping an eye on them over time is prudent.
Which brings me around to lines of code. LOC is, essentially, a measure of volume. It is perhaps the first metric for code that anyone ever thought of, and arguments about metrics always seem to wind back to the beleaguered LOC measurement. In my experience, most arguments against looking at LOC as something interesting or valuable boil down to the fact that there is too much variability in the number for it to be meaningful. Here are a few things that cause variability:
- Expressiveness of the programming language
- Whitespace usage by different programmers
- Commenting practices
- Code formatting practices
All of these are legitimate sources of variation. To me, though, they don’t invalidate LOC as a metric; rather, they indicate the need for multiple flavors of LOC, such as:
- Raw LOC, including whitespace
- Instruction count (as Visual Studio provides for IL-based languages)
- Ratio of comments to code
- Ratio of unit test code to product code
- Function points (this is also a measure of volume)
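A few of these flavors are simple enough to sketch directly. Here’s a toy version for Python source (the `#`-comment convention and the blank/comment/code split are assumptions of this sketch; an instruction count like Visual Studio’s IL-based one obviously works differently):

```python
def loc_flavors(source: str) -> dict:
    """Compute a few simple LOC flavors for a chunk of source text."""
    lines = source.splitlines()
    raw = len(lines)                              # raw LOC, including whitespace
    blank = sum(1 for l in lines if not l.strip())
    comments = sum(1 for l in lines if l.strip().startswith("#"))
    code = raw - blank - comments                 # non-blank, non-comment lines
    return {
        "raw": raw,
        "code": code,
        "comment_ratio": comments / code if code else 0.0,
    }

sample = """\
# add two numbers
def add(a, b):

    return a + b
"""
print(loc_flavors(sample))  # → {'raw': 4, 'code': 2, 'comment_ratio': 0.5}
```

Even a crude counter like this is enough to start trending the ratios over time, which is where the signal is.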
All of these are interesting in different ways, and they play off of each other. What’s more, we might use different LOC metrics to correlate with other metrics. For instance, when correlating with complexity measures, an instruction-count LOC measurement might be most appropriate, while when correlating with defect rates, you might want to compare against either commenting or unit test ratios.
But like I said above, all uses of metrics require some level of interpretation by someone who knows the codebase and can see the relationships between them. I think this is true not just of LOC-based measurements, but of all code metrics.
As a final note, here are what I think are some of the keys to successfully using codebase metrics:
- Keep all metrics anonymous — do not allow them to be filtered, sorted, or otherwise associated with individual developers. If developers feel threatened by metrics they will either game them or sabotage the effort.
- Always publish definitions of each metric you use so everyone knows exactly how it is measured.
- Always measure the same metric the same way over time. You may end up introducing metrics with slight variations in meaning over time, but that’s ok.
- Do not use metrics to automatically trigger any action other than performing further investigation.
- Beware of “best practice” threshold values. Always calibrate to your domain, team, and product.
- Make your system explorable. Developers know how to query databases, so give them full access to the raw data and see what interesting information might be mined.