Jon Pipitone

Some results from Forcheck

Friday, July 17, 2009

In this post I'll describe some of what I've found by using Forcheck to analyse climate modelling code. I am currently evaluating it and Fortranlint as tools for the static analysis portion of my study. tl;dr: forcheck took some time to configure, but it was configurable in every way I've needed it to be (with a few minor exceptions) and the results seem to be exactly like what I was hoping for.

I decided to start by analysing the NASA ModelE climate model because I could easily understand the build/configuration system and navigate my way around the code easily enough. Some of the other models I have the source to seem a bit trickier. The analysis I'm about to discuss isn't of the entire model source code, but only of one particular configuration of modules (an ocean-atmosphere coupled configuration though, so it includes many of the source modules).

In any case, I'll give you the goods upfront. I'll show you the big long summary of problems forcheck found, but first let me explain the format. Here is an example item:

    2x[ 84 I] no path to this statement

The "2x" means that this issue was found two times. 84 is the unique issue identifier that can be used to look up the issue in the forcheck documentation. I means that this issue is an informative type issue, as opposed to a W warning, or E error. The rest of the line contains a short description of the issue. Got it? Good.
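The item format described above is regular enough to parse mechanically. Here is a minimal Python sketch, assuming every summary line follows the "count x [id severity] description" layout shown; the names SummaryItem and parse_item are made up for illustration and are not part of forcheck.

```python
import re
from typing import NamedTuple


class SummaryItem(NamedTuple):
    count: int        # how many times the issue was found ("2x")
    issue_id: int     # forcheck's issue identifier (84)
    severity: str     # "I" informative, "W" warning, "E" error
    description: str  # short description of the issue


# Layout assumed from the example item: count, "x", then "[id severity]".
ITEM = re.compile(r"^\s*(\d+)x\[\s*(\d+)\s+([IWE])\]\s*(.*)$")


def parse_item(line):
    """Parse one summary line; raise ValueError if it does not fit the layout."""
    m = ITEM.match(line)
    if m is None:
        raise ValueError("not a summary item: %r" % line)
    count, issue_id, severity, description = m.groups()
    return SummaryItem(int(count), int(issue_id), severity, description)


print(parse_item("    2x[ 84 I] no path to this statement"))
```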

Here's the big list:

1564x[344 I] implicit conversion of constant (expression) to higher accuracy
635x[681 I] not used
265x[675 I] named constant not used
144x[323 I] variable unreferenced
144x[699 I] implicit conversion of real or complex to integer
125x[ 94 E] syntax error
108x[109 I] lexical token contains non-significant blank(s)
107x[319 W] not locally allocated, specify SAVE in the module to retain data
96x[557 I] dummy argument not used
78x[345 I] implicit conversion to less accurate data type
65x[316 W] not locally defined, specify SAVE in the module to retain data
65x[665 I] eq.or ineq. comparison of floating point data with constant
38x[313 I] possibly no value assigned to this variable
35x[342 I] eq.or ineq. comparison of floating point data with zero constant
34x[644 I] none of the entities, imported from the module, is used
27x[124 I] statement label unreferenced
27x[315 I] redefined before referenced
27x[341 I] eq. or ineq. comparison of floating point data with integer
22x[125 I] format statement unreferenced
21x[  1 I] (MESSAGE LIMIT REACHED FOR THIS STATEMENT OR ARGUMENT LIST)
21x[514 E] subroutine/function conflict
19x[530 W] possible recursive reference
18x[674 I] procedure, program unit, or entry not referenced
18x[598 E] actual array or character variable shorter than dummy
10x[340 I] equality or inequality comparison of floating point data
8x[325 I] input variable unreferenced
7x[347 I] non-optimal explicit type conversion
7x[565 E] number of arguments inconsistent with specification
6x[582 E] data-type length inconsistent with specification
6x[668 I] possibly undefined: dummy argument not in entry argument list
5x[312 E] no value assigned to this variable
5x[691 I] data-type length inconsistent with specification
4x[556 I] argument unreferenced in statement function
4x[621 I] input/output dummy argument (possibly) not (re)defined
4x[384 I] truncation of character variable (expression)
3x[383 I] truncation of character constant (expression)
3x[343 I] implicit conversion of complex to scalar
3x[454 I] possible recursive I/O attempt
3x[570 E] type inconsistent with specification
2x[568 E] type inconsistent with first occurrence
2x[573 E] data type inconsistent with specification
2x[ 84 I] no path to this statement
2x[651 I] already imported from module
2x[617 I] conditionally referenced argument is not defined
2x[214 E] not saved
2x[236 E] storage allocation conflict due to multiple equivalences
2x[700 E] object undefined
2x[307 E] variable not defined
1x[115 E] multiple definition of statement label, this one ignored
1x[145 I] implicit conversion of scalar to complex
1x[228 W] size of common block inconsistent with first declaration
1x[230 I] list of objects in named COMMON inconsistent with first declaration
1x[250 I] when referencing modules implicit typing is potentially risky
1x[667 E] undefined: dummy argument not in entry argument list
1x[676 I] none of the objects of the common block is used
1x[616 E] input or input/output argument is not defined

number of error messages:           200
number of warnings:                 192
number of informative messages:    3415
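
As a sanity check, the per-issue counts in the list can be summed by severity and compared with these totals. A small Python sketch, using only the four W lines copied from the list above; the helper name totals_by_severity is made up for illustration.

```python
import re
from collections import Counter

# Only the count and the severity letter are needed for the tally.
ITEM = re.compile(r"^\s*(\d+)x\[\s*\d+\s+([IWE])\]")


def totals_by_severity(lines):
    """Sum the occurrence counts of summary lines, grouped by severity letter."""
    totals = Counter()
    for line in lines:
        m = ITEM.match(line)
        if m:
            totals[m.group(2)] += int(m.group(1))
    return totals


warning_lines = [
    "107x[319 W] not locally allocated, specify SAVE in the module to retain data",
    "65x[316 W] not locally defined, specify SAVE in the module to retain data",
    "19x[530 W] possible recursive reference",
    "1x[228 W] size of common block inconsistent with first declaration",
]

print(totals_by_severity(warning_lines)["W"])  # 192, matching "number of warnings"
```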

I've only taken a peek at a few of these issues in detail. Some of them are nonsense and can be disregarded right off the bat. For instance, the 125 syntax errors? Well, most of them come from the fact that the source files contain the cpp macros __FILE__ or __LINE__ and I haven't figured out yet how to make forcheck expand them (or I haven't worked out the ModelE Makefile magic to get cpp to do it instead).
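
One possible workaround, sketched here in Python, is to expand just those two macros in a copy of each source file before handing it to the analyser. This is an assumption about what would help, not a forcheck or ModelE feature: the function name and the sample Fortran lines are hypothetical, and the sketch ignores details a real preprocessor handles (macros inside string literals or comments, and lines growing past the fixed-format column limit).

```python
import re


def expand_cpp_location_macros(text, filename):
    """Replace __LINE__ and __FILE__ the way cpp would: the line number as an
    integer, the file name as a double-quoted string."""
    out = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = re.sub(r"\b__LINE__\b", str(lineno), line)
        line = re.sub(r"\b__FILE__\b", '"%s"' % filename, line)
        out.append(line)
    return "\n".join(out)


# Hypothetical fixed-format lines, not taken from ModelE.
sample = "      call report(__FILE__,__LINE__)\n      n = __LINE__"
print(expand_cpp_location_macros(sample, "SNOW.f"))
```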

Looking at the most frequent issues now. The most frequent is casting a constant to a higher accuracy. Here's an example from the file QUESDEF.f:

frac1=+1.

where frac1 is defined as a REAL*8. I wouldn't think that casting unknowingly to a higher accuracy would pose much of a problem... but what do I know. Can anyone think of some examples where it would be?
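
One case where it does matter: a constant without a D exponent or kind suffix is a default real, which is rounded to that precision before being widened. The value 1. is exactly representable, so frac1=+1. is harmless, but something like 0.1 is not. The Python sketch below imitates this, assuming the default real is IEEE single precision (the usual case, but compiler-dependent); as_single is a made-up helper.

```python
import struct


def as_single(x):
    """Round a double-precision value to the nearest IEEE single-precision
    value, then return it widened back to double."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 1.0 survives the round trip exactly, so "frac1=+1." loses nothing.
print(as_single(1.0) == 1.0)   # True

# 0.1 does not: the widened value keeps the single-precision rounding error.
print(as_single(0.1) == 0.1)   # False
print(as_single(0.1))          # 0.10000000149011612
```

So a REAL*8 variable assigned 0.1 rather than 0.1D0 would be off in roughly the ninth significant digit.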

The next most frequent issue is the "not used" issue. The issue here is that a variable has been declared but then never gets used before it goes out of scope. There are two examples of this in the file OCNFUNTAB.f, a module with lookup table functions. From my cursory look, it seems that many of the functions are similarly structured: both in purpose and in terms of documentation and layout. My guess is that in this specific case this evolved from copying an existing function to use as a template for a new one, and forgetting to remove the unused declared variable. A code clone. This isn't always the case, of course. In another file with this issue, SNOW.f, it appears to be the result of commenting out code.

This issue is distinguished from the one that is two below it on the list, "variable unreferenced". The variable unreferenced issue refers specifically to when a variable is declared, and a value is set, but the variable is never accessed. This issue also occurs in SNOW.f, in this curious example:

ccc !!! ground properties should be passed as formal parameters !!!
k_ground =        3.4d0    !/* W K-1 m */    /* --??? */
c_ground =        1.d5     !/* J m-3 K-1 */  /* -- ??? */

k_ground is used later in the function, but c_ground never is.

Next up is the "implicit conversion of a real or complex to integer". Here are a few examples:

QUESDEF.f:     prather_limits = 0.
OCNFUNTAB.f:   JS=SS
RADIATION.f:   JMO=1+JJDAYS/30.5D0

I, of course, have no idea whether these cases are unintentional, or what the effects of these casts are. I would think that it is generally dangerous to cast unintentionally to a less precise number...
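
For reference, assigning a real to an integer in Fortran truncates toward zero rather than rounding. A short Python sketch of the effect (Python's int() truncates the same way); the helper name and the JJDAYS value are hypothetical, chosen only to show where truncation and rounding disagree.

```python
def fortran_real_to_int(x):
    """Mimic the implicit real-to-integer conversion: truncate toward zero."""
    return int(x)


print(fortran_real_to_int(0.0))    # 0, so "prather_limits = 0." loses nothing
print(fortran_real_to_int(2.9))    # 2
print(fortran_real_to_int(-2.9))   # -2

jjdays = 50  # hypothetical day count, not taken from RADIATION.f
value = 1 + jjdays / 30.5          # about 2.64
print(fortran_real_to_int(value))  # 2 (what the implicit conversion gives)
print(round(value))                # 3 (what rounding would give)
```

Whether truncation is the intended behaviour in each flagged line is exactly what a reviewer would need to check.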

Let's look at one last issue, the cryptic (to me), "lexical token contains non-significant blank(s)". Forcheck describes it by saying:

In a fixed format source form blanks are not significant. However, a blank in a name, literal constant, operator, or keyword might indicate a syntax error.

Okay, I still don't get it... but that's because I'm not at all familiar with Fortran (yet). Here's one example from RADIATION.f:

     DATA PRS1/      1.013D 03,9.040D 02,8.050D 02,7.150D 02,6.330D 02,
1      5.590D 02,4.920D 02,4.320D 02,3.780D 02,3.290D 02,2.860D 02,
2      2.470D 02,2.130D 02,1.820D 02,1.560D 02,1.320D 02,1.110D 02,
3      9.370D 01,7.890D 01,6.660D 01,5.650D 01,4.800D 01,4.090D 01,
4      3.500D 01,3.000D 01,2.570D 01,1.220D 01,6.000D 00,3.050D 00,
5      1.590D 00,8.540D-01,5.790D-02,3.000D-04/

Each value, "1.013D 03" for example, is flagged as an instance of this issue. In this particular case these are just ways of writing down double-precision numbers, but normally (I guess?) you wouldn't see the space, but instead a + or - sign. There might be a forcheck compiler option I can flip that will ignore this particular type of issue. I tried looking for other, possibly more significant instances of this issue. I found many that were "just" because of line continuations. That is, a line was intentionally wrapped and so this appeared as a space at the end of the line and the start of the next. Here are two examples:

SEAICE.f:      IF (ROICE.gt.0. and. MSI2.lt.AC2OIM) then
ODIAG_PRT.f:   SCALEO(LN_MFLX) = 1.D- 6 / DTS

I had no idea that fortran has such awful syntax for logical expressions. Yes, you read it correctly, the greater-than operator is .gt. and so on. In any case, in the above example in SEAICE.f, the fact that the and-operator is written as ". and." rather than ".and." is what raises this issue. In the ODIAG_PRT.f file, it is the fact that there is a space in the representation of the double-precision number.
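
To hunt for more instances by hand, a rough pattern search covers the two shapes seen so far. The Python sketch below is only an approximation built from the examples above, not how forcheck detects the issue; the regular expressions and the blank_tokens helper are made up, and the pattern does not cover blanks elsewhere in a literal or in names and keywords.

```python
import re

# A real literal with a blank inside its exponent part: "1.013D 03", "1.D- 6".
EXPONENT_BLANK = re.compile(r"\d*\.\d*[DdEe](?:\s+[+-]?\s*\d+|[+-]\s+\d+)")

# A dotted operator, possibly with blanks between the dots and the keyword.
OPERATOR = re.compile(
    r"\.\s*(?:and|or|not|eqv|neqv|eq|ne|gt|ge|lt|le)\s*\.", re.IGNORECASE
)


def blank_tokens(line):
    """Return the tokens in a source line that contain an embedded blank."""
    hits = [m.group(0) for m in EXPONENT_BLANK.finditer(line)]
    hits += [
        m.group(0)
        for m in OPERATOR.finditer(line)
        if any(ch.isspace() for ch in m.group(0))
    ]
    return hits


print(blank_tokens("IF (ROICE.gt.0. and. MSI2.lt.AC2OIM) then"))  # ['. and.']
print(blank_tokens("SCALEO(LN_MFLX) = 1.D- 6 / DTS"))             # ['1.D- 6']
print(blank_tokens("5      1.590D 00,8.540D-01,5.790D-02"))       # ['1.590D 00']
```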

This concludes my look at the forcheck results for now. I should note a few things about how I configured forcheck to actually analyse the code. The most important thing to know is that forcheck can be configured to analyse the syntax according to the quirks of various commercial compilers. For this example I chose to use the Absoft Pro Fortran 90/95 V9 compiler emulation mode because the documentation for ModelE suggested this compiler works well. I had to slightly customise the compiler configuration to enable cpp preprocessing (off by default). I also had to configure forcheck with two cpp "defines" that specified the compiler and target architecture because there were a few places in the source code that had conditional compilation rules.

As mentioned, I didn't analyse the complete source code of the model. Not every module is used for every run configuration. I simply looked at one of the run configuration files that comes with the model, and configured forcheck to analyse only those modules that were specified there. For example, there is a module each for a variety of different resolution configurations: RES_M53.f, RES_M24T.f, etc., as well as several different versions of what look like physics modules: CLOUDS.f and CLOUDS2.f for instance. The run configuration specifies only one resolution module and only one version of the clouds module and so that's what I analysed.

I point all of this out only to be explicit about the limitations of what I'm reporting here. This is just one slice through the code and, as you would expect, the static analysis report is often misleading. Nevertheless, I think this will make for some interesting starting points for discussion about code quality issues.

Climate wars

Monday, July 6, 2009

Climate Wars is a three-part feature on CBC's Ideas radio show about climate change, how we are and are not dealing with it, and the potential for social and political breakdown because of that. For those of you interested in climate modelling, there is lots of fun stuff about it in here too.

The first part in the series was replayed this evening and I was reminded about how good this documentary is. 'Good' might not be the best word here. How about totally fucking scary? Yes, that.

If you have not already heard it, I suggest you listen to it. I urge you to listen to it. It is that kind of documentary. It is three hours in total so it is quite a commitment, but I think after listening to the first part you'll find the time.

You will not come out happy, and you will likely be uncomfortable whilst listening to it. That's a very very good thing. But not something to try to ignore, push aside, or minimise. If, after listening to it, you'd like someone to chat with about it, come find me.

CBC content:

Listen to Part 1 of Climate Wars

Listen to Part 2 of Climate Wars

Listen to Part 3 of Climate Wars

Direct links to mp3s: Part 1, Part 2, Part 3.

Static analysis software for Fortran

Friday, June 26, 2009

I started with the list of Fortran tools on fortran.com:

ftnchek. A free static analyser for Fortran 77 programs only, though there is an effort underway to update ftnchek for Fortran 90.
Forcheck. Not free. Apparently the oldest and most comprehensive static analysis tool. Can handle up to Fortran 95 and some Fortran 2003. There is a trial offered.
Cleanscape FortranLint. Not free. Can handle up to Fortran 95. Interestingly, there is a testimonial for it from NCAR. A trial version is also offered.
PlusFORT. Not free. The folks that make this, Polyhedron Software, also distribute Forcheck.
Understand 2.0. This isn't in the same class as the others, as it only computes code metrics. It also does some nifty code visualisations. A trial version is available. I fed it the ModelE code months ago, and it seemed to handle it without trouble.

There are also many projects referenced that are now defunct as far as I can tell:

QA-FORTRAN from Programming Research Ltd. This is the software Les Hatton used in his T Experiments paper.
Forwarn, from Quibus.
FOR_STUDY from Cobalt-Blue.

I also came across a 1999 technical report from the Council for the Central Laboratory of the Research Councils (now the Science and Technology Facilities Council) that summarises the tools available then and lists their static analysis features. Much of this is still useful.

Note: these are all separate analysis tools. I didn't look at the analysis that the various Fortran compilers can do. Polyhedron Software has a very detailed looking comparison chart.

tl;dr: Forcheck, Cleanscape FortranLint.

Currently reading

Friday, June 12, 2009

I haven't posted paper summaries in a while but here's a selection of what I've been reading:

On software quality:

On scientific software:

On empirical studies:

One-page summary of my research plan

Thursday, June 11, 2009

I'm exploring the software quality of climate modelling software. I'm investigating what software quality means for climate modellers, and how we can go about measuring quality and benchmarking it. For example, some of the broader questions that motivate me are: How can we determine the software quality of a climate model? How can we compare the quality of one model to another? How can we compare software quality to commercial products or to other computational science software applications? What do the climate modellers themselves mean when they attempt to build good quality software?

These are big questions. To start to answer them I am going to do two small things. Firstly, I am going to inspect the climate modelling software itself. I will use fault density (i.e. statically identifiable errors and "misuses" of the programming language) as well as bug density (i.e. reported and fixed defects) to benchmark the software quality of several climate models. This analysis will carefully consider these statistics across the various defect dimensions (for example, pre- and post-release and defect type). See my blog post on counting defects for more details.
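
Both measures are densities, that is, a count normalised by code size. A minimal Python sketch, assuming the common convention of defects per thousand lines of code (KLOC); the function name and the numbers are hypothetical and are not measurements of any model.

```python
def defects_per_kloc(defect_count, lines_of_code):
    """Defect density: defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000.0)


# Hypothetical example: 30 defects found in 60,000 lines of code.
print(defects_per_kloc(30, 60_000))  # 0.5
```

The same function works for fault density and bug density; only the source of the count changes (static analysis reports versus reported and fixed defects).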

I believe the more interesting questions about how climate modellers view software quality cannot be answered through their code or bug reports alone. So, I'd also like to interview some of the climate scientists that have built the software. Specifically, I'd like to ask them for the stories behind a selection of defects that they found and fixed during development or after a release. The intention here is that by asking questions about the circumstances of a defect and about the judgements made to find and solve it I may start to piece together an understanding of their specific notion of software quality. See my blog post that describes this part of the study in more detail, and my blog post that describes some of the questions I can ask about defects.

Thoughts? Comments? Questions?

The shape of the playing field

Wednesday, June 10, 2009

Ever since my first pitch of the defect density study I've been trying to work out what the bigger research questions are here. I'm not really content with just collecting the defect density results unless I can see how the results fit into a larger story.

My first instinct was to find out more about defects themselves and to see what other people have done with them in their studies. What is the relationship between defect densities and software quality? How do other people understand software quality and measure it? What can we really learn about software quality from looking at defect densities? And to what use can I put these results once I have them? I'm starting to get a picture of defect densities and their usefulness, and it is not nearly as good of a tool as I had thought, but it is still worth evaluating.

The title of my talk last week was one possible framing of a much bigger question: why do climate modellers trust the code they write? As in Daniel Hook's presentation at SE-CSE '09, trustworthiness seems like an appropriate way to frame the discussion about software quality when it comes to climate models. Why? Because, coarsely, in computational science pursuits like climate modelling there are not always hard and fast rules to distinguish correct and incorrect results. As Hook says, there are no perfect oracles against which results can be checked (the oracle problem), and even if oracles existed, the approximations and measurement errors inherent in modelling can make it tricky to distinguish any introduced error coming from faulty code (the tolerance problem).

So, how then do the climate modellers know if they're on the right track when constructing their models? We know they employ a wide suite of sophisticated tests to tease out flaws in the conceptual model (validation) and errors in their implementation of that model (verification). My understanding is that underlying some of the validation work are judgement calls, gut checks, and tacit heuristics used to distinguish whether a model is doing the right thing. For example, climate modellers might ask of a model output, "is it raining where it ought to be raining?" The answer to this question isn't well-defined, but it can be answered with a lot of background knowledge and familiarity with the climate processes. This is partly the oracle problem at play. The model output is the result of a scientific experiment and not something we could hope to give a complete description of beforehand. I'm not saying validation is all guesswork -- not even close -- but just that there are unformalisable elements to model validation that, I don't think, we're used to thinking about when we discuss traditional software testing. We are used to thinking about software as having more explicit and testable requirements[1].

On the verification side, the tolerance problem entails that, even if we ignore the conceptual problems with the model, it is still not a straightforward matter to be certain if the code is correct. Uncertainties in the data, truncation error in approximations, and round-off error in computations can all hide real errors resulting from flaws in the model implementation.
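
A toy illustration of the tolerance problem in Python: because exact comparison of floating point results is unreliable, tests compare within a tolerance, and any error smaller than that tolerance passes unnoticed. The "buggy" value below is hypothetical, constructed only to sit inside the default tolerance of math.isclose.

```python
import math

expected = 0.3
computed = 0.1 + 0.2  # correct code, but subject to round-off error

# Exact comparison rejects a correct result...
print(computed == expected)              # False
# ...so tests fall back on a tolerance (default relative tolerance 1e-09).
print(math.isclose(computed, expected))  # True

# A hypothetical faulty implementation whose error is smaller than the
# tolerance passes the same check.
buggy = expected * (1 + 5e-10)
print(math.isclose(buggy, expected))     # True
```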

Asking why climate modellers trust the code they write is one way of trying to understand what climate modellers are doing when they attempt to write good quality code. Given that they have such a radically different notion of requirements and correctness, how is their notion of software quality different? If you can't always write unit tests against the bulk of your work, and you can't always explicitly write down rules for correctness, what then do you mean by good code? I think it's important to start with these questions because the answers inform other questions about the usefulness of defect densities and of quality benchmarking. With a firmer idea of what quality actually is for climate modellers, we can then work on how best to measure it or benchmark it.

To summarise, the primary question is:

What does software quality mean for climate modellers?

The software quality folks have come up with an impressive list of attributes of software quality known as the "ilities". Maybe a more specific version of the above question asks about which quality attributes are most important for climate modellers.

I think there are two other companion questions that need to be asked:

How do climate modellers judge a piece of code against these quality attributes?

What practices do climate modellers follow to achieve high quality software (in terms of the identified quality attributes)?

If software quality was a game of football, the first question asks about the shape of the field and the rules of the game, the second asks about where the goal posts are, and the third asks about the playbook. Ahem.

So, how do I go about answering these questions?

I could ask the climate modellers directly. This assumes that they know the answers explicitly. I'm not sure I could answer the same questions for myself.

I could also look at defects. I've defined a defect before as "something worth fixing". Can we say this means a defect is a part of the software, created or omitted, that indicates a lack of satisfaction of the important quality attributes? If so, then looking carefully at a defect and its circumstances, and in particular asking the climate modellers questions about reported defects, might provide some of the basis for answering the above three questions. Or at least the basis from which to ask more intelligent questions.

That is, investigating why and when a piece of climate modelling software falls short might be the very place to look for exposed notions of quality, quality goals, and the practices used to manage them.

Would interviewing scientists about defects give the complete story? Certainly not. For at least these reasons:

I'd only be able to consider a sampling of the defects, and only interview a sampling of the modellers the defects related to.
As noted in earlier posts, some defects may go unreported. Put another way, the selection of reported defects depends on the type of testing that is done, and not necessarily on the nature of the defect itself. That is, defects are not found if no one goes looking for them.
Refining that point a bit: the defects that are found may only be associated with the subset of the quality attributes that are the least well managed. That is, software may show fewer defects related to quality attributes for which there is a well-functioning process in place. These attributes would not appear to be as well represented and thus may not seem important when, in fact, they are.

[1] I feel pretty strange talking with such authority. Please jump in if you know better.

A framework for counting problems and defects

Monday, June 8, 2009

Last week I came across this technical report from SEI:

Software Quality Measurement: A Framework for Counting Problems and Defects

Abstract. This report presents mechanisms for describing and specifying two software measures–software problems and defects–used to understand and predict software product quality and software process efficacy. We propose a framework that integrates and gives structure to the discovery, reporting, and measurement of software problems and defects found by the primary problem and defect finding activities....

I haven't yet read through the report thoroughly, though the bits I have read seem immensely sensible. This report doesn't attempt anything too grand. It simply lays out clear definitions, and provides a set of questions to ask yourself when going about trying to understand and count problems and defects. Cool.

In fact, I think these questions will also be great to use when following up with climate modellers about specific bugs. Here they are:

Identification: What software product or software work product is involved?
Finding Activity: What activity discovered the problem or defect?
Finding Mode: How was the problem or defect found?
Criticality: How critical or severe is the problem or defect?
Problem Status: What work needs to be done to dispose of the problem?
Problem Type: What is the nature of the problem? If a defect, what kind?
Uniqueness: What is the similarity to previous problems or defects?
Urgency: What urgency or priority has been assigned?
Environment: Where was the problem discovered?
Timing: When was the problem reported? When was it discovered? When was it corrected?
Originator: Who reported the problem?
Defects Found In: What software artifacts caused or contain the defect?
Changes Made To: What software artifacts were changed to correct the defect?
Related Changes: What are the prerequisite changes?
Projected Availability: When are changes expected?
Released/Shipped: What configuration level contains the changes?
Applied: When was the change made to the baseline configuration?
Approved By: Who approved the resolution of the problem?
Accepted By: Who accepted the problem resolution?

I might add more why and how questions to this list: why did the bug go unnoticed? why is it important to have fixed this bug, at that time? how was the bug fixed? why is the fix appropriate?
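
For keeping interview notes consistent, the questions could be captured as a simple record type. The Python sketch below covers only a subset of the attributes listed above plus two of the added "why" questions; the class, field names and sample values are hypothetical, not something defined by the SEI report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DefectRecord:
    # A few of the framework's attributes.
    identification: str      # which software product is involved
    finding_activity: str    # what activity discovered the defect
    criticality: str         # how critical or severe it is
    problem_type: str        # nature of the problem
    defects_found_in: List[str] = field(default_factory=list)
    changes_made_to: List[str] = field(default_factory=list)
    # Extra follow-up questions for the interview.
    why_unnoticed: str = ""
    why_fix_appropriate: str = ""


def unanswered(record):
    """Names of the fields still empty, i.e. questions left to ask."""
    return [name for name, value in vars(record).items() if value in ("", [])]


record = DefectRecord("hypothetical model", "code review", "minor", "logic")
print(unanswered(record))
```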