Counting Code

The notion of counting lines of code is often dismissed as unhelpful in telling us anything substantive about the code. I think there are a whole bunch of reasons behind this attitude (fodder for other posts), but one of the main practical problems with counting lines of code is this: exactly what are you counting? In some ways, as long as you always count the same thing it doesn’t really matter. But that’s not a very viscerally satisfying approach.

Raw lines of code — equivalent to counting the number of newline characters in a file — is the easiest and most common method used. Unfortunately, this count contains a lot of noise in the form of whitespace, individual formatting preferences, syntactic junk dictated by the language, etc. It also includes comments and it seems there is a lot of disagreement about whether comments are noise or signal.
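As a minimal sketch of that definition (Python is my choice here for illustration, and raw_line_count is a hypothetical name, not part of any tool mentioned in this post), the raw count is nothing more than a newline count:

```python
def raw_line_count(text: str) -> int:
    """Return the raw line count: the number of newline characters."""
    return text.count("\n")


# Four newline characters, so four raw lines, blank and comment lines included.
sample = "int a = 1;\n\n// note\nint b = 2;\n"
print(raw_line_count(sample))  # 4
```

One quirk of counting newlines literally: a file whose last line has no trailing newline comes out one line short.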

I consider comments to be more signal than noise. Bad or inaccurate comments can lean toward the noise end of the spectrum, but I think on balance most comments lean toward the signal side of things. If nothing else, comments indicate that the writer of the code was concerned with documenting what they wrote.

I implemented a code metrics program at my current shop. We gather a lot of different values, including three different flavors of lines-of-code:

  1. Raw Lines of Code. This is used as a baseline. It is a reasonable measure of the overall volume of the code, which can correlate (both positively and negatively) with how much effort went into creating it.
  2. Semantic Lines of Code. This count excludes many sources of noise and I think captures the intellectual content of the code. (More details on how this is counted below.)
  3. Lines of Comments. This reflects the amount of effort that was put into describing what the code does and its API and is derived from the semantic lines of code.

It’s important to note that none of these values is as interesting in isolation as it is when taken together with the others. For instance, the lines of comments don’t tell us a whole lot, but the ratio of lines of comments to semantic lines of code does. The absolute values don’t necessarily have much meaning either. A function that is 20 semantic lines of code is neither good nor bad; it’s just 20 semantic lines of code. The number only has meaning when compared with other values of the same metric: another function has 100 lines of code, thus it is clearly “bigger” in some meaningful (but not precise) way. As long as you don’t read too much into these numbers, they can be used to help understand your code base.
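To make the ratio concrete, here is a small sketch in Python (my choice of language; comment_ratio is a hypothetical helper, and treating a function with no semantic lines as a ratio of 0.0 is my assumption). The inputs 1 and 3 are the counts for the Hello example further down in this post:

```python
def comment_ratio(comment_lines: int, semantic_lines: int) -> float:
    """Ratio of comment lines to semantic lines; 0.0 when nothing was counted."""
    if semantic_lines <= 0:
        return 0.0  # assumption: avoid dividing by zero for empty input
    return comment_lines / semantic_lines


# One comment line out of three semantic lines.
print(round(comment_ratio(1, 3), 2))  # 0.33
```

As with the counts themselves, the ratio is only useful for comparing one piece of code against another.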

Semantic Lines of Code

To calculate the semantic lines of code:

  • Discard empty lines
  • Discard lines that contain only syntactic markers like { and }
  • Discard lines with only using or include statements, or other glue statements
  • Discard comment lines that are used solely for separation or syntactic marking
    • Discard XML comment start and end markers such as <summary>
    • Discard horizontal ruler comments

With this scheme, code like this:

using System.IO;

//-----------------------------------------------
/// <summary>
/// This is a totally pointless function
/// </summary>
public void Hello(string str)
{
    Console.WriteLine("Hello {0}", str);
}

would be counted as if it were this:

/// This is a totally pointless function
public void Hello(string str)
    Console.WriteLine("Hello {0}", str);
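The rules above can be sketched roughly like this. This is not the program I described; it is a simplified illustration in Python, and every name and regular expression in it (classify, count_lines, the patterns) is an assumption made for the example. It only understands // and /// comments, treats an empty comment line as a separator, and ignores block comments entirely.

```python
import re

# Lines made up only of braces, optionally followed by ";" (e.g. "{", "}", "};").
MARKER_ONLY = re.compile(r"^[{}]+;?$")
# Glue statements: C# "using Some.Namespace;" and C/C++ "#include ...".
GLUE = re.compile(r"^(using\s+(static\s+)?[\w.]+\s*;|#\s*include\b.*)$")
# Horizontal ruler comments: the comment body is only ruler characters.
RULER = re.compile(r"^[-=*#_~]+$")
# Comment bodies that contain nothing but XML tags, e.g. "<summary>".
XML_TAGS_ONLY = re.compile(r"^(\s*</?\w+[^<>]*>\s*)+$")


def classify(line: str) -> str:
    """Return "discard", "comment", or "code" for a single source line."""
    s = line.strip()
    if not s or MARKER_ONLY.match(s) or GLUE.match(s):
        return "discard"
    if s.startswith("//"):
        body = s.lstrip("/").strip()
        if not body or RULER.match(body) or XML_TAGS_ONLY.match(body):
            return "discard"
        return "comment"
    return "code"


def count_lines(source: str) -> dict:
    """Count raw lines, semantic lines, and the comment lines among them."""
    counts = {"raw": source.count("\n"), "semantic": 0, "comments": 0}
    for line in source.splitlines():
        kind = classify(line)
        if kind != "discard":
            counts["semantic"] += 1
        if kind == "comment":
            counts["comments"] += 1
    return counts


SAMPLE = '''using System.IO;

//-----------------------------------------------
/// <summary>
/// This is a totally pointless function
/// </summary>
public void Hello(string str)
{
    Console.WriteLine("Hello {0}", str);
}
'''

print(count_lines(SAMPLE))  # {'raw': 10, 'semantic': 3, 'comments': 1}
```

Note that the comment count is taken from the lines that survive the semantic filter, which matches the idea that lines of comments are derived from the semantic lines of code.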

Again, what we’re looking at here is a measure of overall size, not complexity, maintainability, etc. Code size might contribute to our understanding of these things, but not in isolation. We need other metrics to round out our picture of the code.

About jeffkotula

Software engineer, musician, and so on...
This entry was posted in Coding, Programming languages, Software.