Counting lines of code

Monday, August 17, 2009

I've been using the CodeCount tool to count lines of Fortran code. Here are some of the gruesome details of what that entails -- for posterity's sake.

In part of my study I'm measuring defect densities of various climate models. Defect density is the number of defects divided by the size of the project measured in lines of code (most often reported per 1000 lines of code). Thus, I need to be able to count lines of code. Fortran. Often mixed versions. In this blog post I'll describe one of the limitations I've come across in using the CodeCount tool.
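For concreteness, the defect density calculation can be sketched in Python. The function name and the figures in the usage line are made up for illustration; they are not measurements from any of the climate models in the study.

```python
def defect_density(defects, lines_of_code, per=1000):
    """Defects per `per` lines of code (per 1000 lines by default)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    # Multiply before dividing so the result is scaled to `per` lines.
    return defects * per / lines_of_code


# Hypothetical project: 24 defects found in 120,000 lines of code.
print(defect_density(24, 120_000))
```

Whichever line count goes in the denominator (physical or logical) changes the result, which is why the choice of counting rule matters here.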

The following table summarises the default behaviour of the CodeCount tool on a snippet of Fortran. The Lines column contains the lines of the Fortran and preprocessor code being analysed. Note, this isn't a working piece of code in any way but that doesn't matter to the CodeCount tool. It's just a collection of lines I used to test the tool's behaviour. Anyhow, the Type column specifies how CodeCount categorised the line: comment (comm), blank line (blank), executable (exec), data declaration (decl), or compiler directive (comp). The Physical Line and Logical Line columns specify whether CodeCount counts these lines towards the physical and logical line counts, respectively.

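Lines                           | Type  | Physical Line? | Logical Line?
--------------------------------|-------|----------------|--------------
!! this is a comment            | comm  | no             | no
(blank line)                    | blank | no             | no
#if defined foo                 | exec  | yes            | yes
#ifdef key_squares              | exec  | yes            | yes
#include "SetNumberofcells.h"   | comp  | yes            | yes
#else                           | exec  | yes            | no
#endif                          | exec  | yes            | yes
SUBROUTINE A(Sqr_Grid)          | decl  | yes            | yes
USE Sqr_Type                    | exec  | yes            | no
IMPLICIT NONE                   | decl  | yes            | yes
IF (assoc(cur_grid)) THEN       | exec  | yes            | yes
Type(grid), Pointer :: Sqr_Grid | decl  | yes            | yes
WRITE(*,*) &                    | exec  | yes            | no
'Hello'                         | exec  | yes            | yes
ENDIF                           | exec  | yes            | yes
END SUBROUTINE A                | data  | yes            | no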

The physical line count is just a count of non-blank, non-comment lines. The logical line count tries to be a bit smart by counting lines in more abstract terms (I imagine a philosopher-computer scientist in some windowed office somewhere chin-stroking and asking, "What is a line of code?"). Anyhow, CodeCount computes logical line count by ignoring lines with continuation characters (e.g. "&") and certain other statements (e.g. "USE", "CASE", "END IF", "ELSE") and by counting each statement in a multi-statement line as a separate line. The full specification is in the CodeCount source if you're interested.
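The counting rules just described can be sketched in a few lines of Python. This is an illustration only, not CodeCount's actual implementation: the function and constant names are hypothetical, the skip list holds just the four example statements named above, and stripping non-word characters before matching is an assumption made so that "#else" is treated like "ELSE". It also assumes free-form source with "!" comments.

```python
import re

# Statements named in the post as excluded from the logical count.
# Partial list (assumption); the full one is in the CodeCount source.
SKIP_IF_FIRST_WORD = {"USE", "CASE"}
SKIP_IF_WHOLE_STATEMENT = {"ELSE", "END IF"}


def count_lines(source):
    """Return (physical, logical) line counts for free-form Fortran text."""
    physical = 0
    logical = 0
    for raw in source.splitlines():
        line = raw.strip()
        # Blank lines and "!" comment lines count towards neither total.
        if not line or line.startswith("!"):
            continue
        physical += 1
        # A line ending in a continuation character is not a logical line;
        # the statement is counted on the line that completes it.
        if line.endswith("&"):
            continue
        # Naive split: a ";" inside a string literal would be miscounted.
        for statement in line.split(";"):
            # Collapse non-word characters so "#else" reads as "ELSE".
            words = re.sub(r"\W+", " ", statement).strip().upper()
            if not words:
                continue
            if words in SKIP_IF_WHOLE_STATEMENT:
                continue
            if words.split(" ", 1)[0] in SKIP_IF_FIRST_WORD:
                continue
            logical += 1
    return physical, logical


sample = "\n".join([
    "!! comment", "", "#else", "USE Sqr_Type",
    "WRITE(*,*) &", "'Hello'", "a = 1; b = 2", "ENDIF", "END IF",
])
print(count_lines(sample))  # (7, 4)
```

In the sample, "a = 1; b = 2" is one physical line but two logical lines, while "#else", "USE", the continued "WRITE" line and "END IF" add to the physical count only.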

So the question I could ask is: do I use the logical or physical line count? It's a small question but, oh, I went there. The logical line count is appealing in that it seems likely to be more robust across different coding styles, and maybe gets more at the essence of what the size of a program is (whatever that means; see chin-stroking philosopher above for more information).

Unfortunately the CodeCount tool is too smart (or too stupid) in the way that it counts logical lines. It doesn't gracefully handle preprocessor statements or certain Fortran dialects. You can see this in two rows of the table above: the "#else" row, which is not counted as a logical line, and the "ENDIF" row, which is.

As far as I can make out, as long as a line contains only "ELSE" (other than non-word characters) CodeCount counts this line only as a physical line, not a logical line. So, it counts preprocessor lines as logical lines, except in the case of "#else", which it ignores. Should preprocessor lines be counted as lines of code? I don't know, maybe. Probably, in fact. If so, then we should count all of them as logical lines. Unfortunately, from the bit of digging I've done I can't see how to get CodeCount to consider "#else" as a logical line without messing with the code. No thanks.

But, alas, there's more. CodeCount counts an "ENDIF" as a logical line as you can see, but I don't think it should. See, as mentioned, it's built so that it does not count an "END IF" as a logical line. Now, I'm totally new to Fortran but most references I've come across close an IF block with an END IF, but I've seen one or two references to closing an IF block with an ENDIF. And in fact, some of the code I'm analysing uses exactly that syntax. So, CodeCount will have a slightly inflated logical line count if I use it for these source files.

Again, to fix this problem I'd have to resort to hacking the source if I want CodeCount to treat "ENDIF" the same as "END IF". And, since I'm so new to Fortran, I don't even know the extent of the differences between the various dialects, so even if I were to decide hacking the source was a good idea, I wouldn't ever be sure I'd fixed it completely. (For instance, I just found out there are also "ENDDO" statements, rather than "END DO" statements, in one of my sources!)
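To illustrate what a spelling-tolerant check might look like, here is a small Python sketch that treats "END IF", "ENDIF", "END DO" and "ENDDO" alike. It is a hypothetical helper, not part of CodeCount, and it only covers the two block types mentioned in this post; named constructs and trailing comments are not handled.

```python
import re

# Optional whitespace between END and the block keyword, any letter case.
# Covers only IF and DO, the two cases mentioned in the post (assumption).
BLOCK_END = re.compile(r"^\s*END\s*(IF|DO)\s*$", re.IGNORECASE)


def is_block_end(statement):
    """True if the statement closes an IF or DO block in either spelling."""
    return BLOCK_END.match(statement) is not None


for text in ["END IF", "ENDIF", "enddo", "END DO", "END SUBROUTINE A"]:
    print(text, is_block_end(text))
```

A rule like this would still need extending for every other dialect difference that turns up, which is the open-ended part of the problem.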

In short: I've been sticking to using physical line counts.

2 comments:

Carolyn said...

You've thought about this more than I have - I was just using the physical, non-blank lines of the file.

jon said...

Cool. what tool are you using to count lines?

Anyhow, it's a minor question, sure, but I thought I'd blog about it so that others could see (and to get in the habit of blogging again, more on this later).

Lines of code = a very strange metric.
