On static analysis
Monday, August 31, 2009
Abstract of my study for the AGU
Here's the current draft of the abstract. I've found it a little tricky to write an abstract for work that I haven't yet completed but I've given it a go. I've gotten some excellent feedback from some of my colleagues (big up to: Steve, Neil, Jono, and Jorge) as to how to frame the problem and my "results" (in quotations because I don't yet have concrete results).
Feedback on clarity, wording, grammar, framing of the problem and results, etc. is very much welcome.
On the software quality of climate models
A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities -- an established software engineering technique for studying software quality -- of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.
Workshops at PowerShift Canada
Monday, August 17, 2009
- A climate science backgrounder
- Climate modelling 101
- An insider's perspective on the IPCC
- Communicating the science of climate change
- Developing Canada's GHG inventory
Counting lines of code
| Lines | Type | Physical Line? | Logical Line? |
| !! this is a comment | comm | no | no |
| (blank line) | blank | no | no |
| #if defined foo | exec | yes | yes |
| #ifdef key_squares | exec | yes | yes |
| #include "SetNumberofcells.h" | comp | yes | yes |
| #else | exec | yes | no |
| #endif | exec | yes | yes |
| SUBROUTINE A(Sqr_Grid) | decl | yes | yes |
| USE Sqr_Type | exec | yes | no |
| IMPLICIT NONE | decl | yes | yes |
| IF (assoc(cur_grid)) THEN | exec | yes | yes |
| Type(grid), Pointer :: Sqr_Grid | decl | yes | yes |
| WRITE(*,*) & | exec | yes | no |
| 'Hello' | exec | yes | yes |
| ENDIF | exec | yes | yes |
| END SUBROUTINE A | data | yes | no |
The physical line count is just a count of non-blank, non-comment lines. The logical line count tries to be a bit smart by counting lines in more abstract terms (I imagine a philosopher-computer scientist in some windowed office somewhere chin-stroking and asking, "What is a line of code?"). Anyhow, CodeCount computes logical line count by ignoring lines with continuation characters (e.g. "&") and certain other statements (e.g. "USE", "CASE", "END IF", "ELSE") and by counting each statement in a multi-statement line as a separate line. The full specification is in the CodeCount source if you're interested.
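As a rough illustration of the two counting rules described above, here is a minimal Python sketch. It is not CodeCount's implementation: the function name `count_lines` is hypothetical, the skip list contains only the four statements named above (CodeCount's real list is longer), and it assumes free-form source where comments start with "!".

```python
import re

# Statements excluded from the logical count. Assumption: only the four
# named in the text; "END IF" is matched with a space, as written there.
SKIPPED = re.compile(r"^(USE|CASE|END\s+IF|ELSE)\b", re.IGNORECASE)


def count_lines(source):
    """Return (physical, logical) line counts for free-form Fortran text."""
    physical = 0
    logical = 0
    for raw in source.splitlines():
        line = raw.strip()
        # Blank lines and whole-line comments count for neither measure.
        if not line or line.startswith("!"):
            continue
        physical += 1
        # Naive comment stripping: this breaks on "!" inside a string literal.
        code = line.split("!", 1)[0].rstrip()
        # A line ending in a continuation character is not a logical line.
        if code.endswith("&"):
            continue
        # Each statement on a multi-statement line counts separately.
        for statement in code.split(";"):
            statement = statement.strip()
            if statement and not SKIPPED.match(statement):
                logical += 1
    return physical, logical
```

Calling `count_lines` on the text of a source file returns the pair of counts; anything beyond these simple rules (preprocessor directives, fixed-form comments) would need extra handling.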
Some results from Forcheck
Friday, July 17, 2009
1564x[344 I] implicit conversion of constant (expression) to higher accuracy
635x[681 I] not used
265x[675 I] named constant not used
144x[323 I] variable unreferenced
144x[699 I] implicit conversion of real or complex to integer
125x[ 94 E] syntax error
108x[109 I] lexical token contains non-significant blank(s)
107x[319 W] not locally allocated, specify SAVE in the module to retain data
96x[557 I] dummy argument not used
78x[345 I] implicit conversion to less accurate data type
65x[316 W] not locally defined, specify SAVE in the module to retain data
65x[665 I] eq.or ineq. comparison of floating point data with constant
38x[313 I] possibly no value assigned to this variable
35x[342 I] eq.or ineq. comparison of floating point data with zero constant
34x[644 I] none of the entities, imported from the module, is used
27x[124 I] statement label unreferenced
27x[315 I] redefined before referenced
27x[341 I] eq. or ineq. comparison of floating point data with integer
22x[125 I] format statement unreferenced
21x[ 1 I] (MESSAGE LIMIT REACHED FOR THIS STATEMENT OR ARGUMENT LIST)
21x[514 E] subroutine/function conflict
19x[530 W] possible recursive reference
18x[674 I] procedure, program unit, or entry not referenced
18x[598 E] actual array or character variable shorter than dummy
10x[340 I] equality or inequality comparison of floating point data
8x[325 I] input variable unreferenced
7x[347 I] non-optimal explicit type conversion
7x[565 E] number of arguments inconsistent with specification
6x[582 E] data-type length inconsistent with specification
6x[668 I] possibly undefined: dummy argument not in entry argument list
5x[312 E] no value assigned to this variable
5x[691 I] data-type length inconsistent with specification
4x[556 I] argument unreferenced in statement function
4x[621 I] input/output dummy argument (possibly) not (re)defined
4x[384 I] truncation of character variable (expression)
3x[383 I] truncation of character constant (expression)
3x[343 I] implicit conversion of complex to scalar
3x[454 I] possible recursive I/O attempt
3x[570 E] type inconsistent with specification
2x[568 E] type inconsistent with first occurrence
2x[573 E] data type inconsistent with specification
2x[ 84 I] no path to this statement
2x[651 I] already imported from module
2x[617 I] conditionally referenced argument is not defined
2x[214 E] not saved
2x[236 E] storage allocation conflict due to multiple equivalences
2x[700 E] object undefined
2x[307 E] variable not defined
1x[115 E] multiple definition of statement label, this one ignored
1x[145 I] implicit conversion of scalar to complex
1x[228 W] size of common block inconsistent with first declaration
1x[230 I] list of objects in named COMMON inconsistent with first declaration
1x[250 I] when referencing modules implicit typing is potentially risky
1x[667 E] undefined: dummy argument not in entry argument list
1x[676 I] none of the objects of the common block is used
1x[616 E] input or input/output argument is not defined
number of error messages: 200
number of warnings: 192
number of informative messages: 3415
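The summary lines above have a regular shape (a count, then a message number and severity letter in brackets, then the message text), so they are easy to tally by script. The sketch below is a hypothetical helper, not part of Forcheck, and it assumes the format is exactly as it appears in the listing.

```python
import re
from collections import Counter

# Matches lines such as "1564x[344 I] implicit conversion ..." or
# "2x[ 84 I] no path to this statement".
SUMMARY_LINE = re.compile(r"^\s*(\d+)x\[\s*(\d+)\s+([IWE])\]\s+(.*)$")


def tally(summary):
    """Sum the occurrence counts per severity letter (I, W or E)."""
    totals = Counter()
    for line in summary.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            count, _number, severity, _text = match.groups()
            totals[severity] += int(count)
    return totals
```

Lines that do not match the pattern, such as the "number of ..." totals, are simply skipped.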
frac1=+1.
ccc !!! ground properties should be passed as formal parameters !!!
k_ground = 3.4d0 !/* W K-1 m */ /* --??? */
c_ground = 1.d5 !/* J m-3 K-1 */ /* -- ??? */
k_ground is used later in the function, but c_ground never is.
QUESDEF.f: prather_limits = 0.
OCNFUNTAB.f: JS=SS
RADIATION.f: JMO=1+JJDAYS/30.5D0
I, of course, have no idea whether these cases are unintentional, or what the effects of these casts are. I would think that it is generally dangerous to cast unintentionally to a less precise number...
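To get a feel for what conversions like these can do, here is a small Python sketch. It assumes JMO is an integer (as Fortran's implicit typing of names beginning with I through N would make it) and uses a made-up value for JJDAYS. Fortran truncates toward zero when a real is assigned to an integer, which Python's `int()` also does; the single-precision round trip uses `struct` to mimic storing a double in a default real. The helper names are hypothetical.

```python
import struct


def to_integer(x):
    """Mimic Fortran real-to-integer assignment: truncate toward zero."""
    return int(x)


def to_single(x):
    """Round a double-precision value to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


jjdays = 45  # hypothetical day number, not taken from the model
jmo = to_integer(1 + jjdays / 30.5)  # 2.475... loses its fractional part

print(jmo)
print(to_integer(-2.7))        # truncation, not rounding or floor
print(to_single(0.1) == 0.1)   # precision is lost going to single
```

Whether a given truncation matters depends on what the author intended, which is exactly what a static analyser cannot know.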
In a fixed format source form blanks are not significant. However, a blank in a name, literal constant, operator, or keyword might indicate a syntax error.
Okay, I still don't get it... but that's because I'm not at all familiar with Fortran (yet). Here's one example from RADIATION.f:
DATA PRS1/ 1.013D 03,9.040D 02,8.050D 02,7.150D 02,6.330D 02,
1 5.590D 02,4.920D 02,4.320D 02,3.780D 02,3.290D 02,2.860D 02,
2 2.470D 02,2.130D 02,1.820D 02,1.560D 02,1.320D 02,1.110D 02,
3 9.370D 01,7.890D 01,6.660D 01,5.650D 01,4.800D 01,4.090D 01,
4 3.500D 01,3.000D 01,2.570D 01,1.220D 01,6.000D 00,3.050D 00,
5 1.590D 00,8.540D-01,5.790D-02,3.000D-04/
Each value, "1.013D 03" for example, is flagged as an instance of this issue. In this particular case these are just ways of writing down double-precision numbers, but normally (I guess?) you wouldn't see the space, but instead a + or - sign. There might be a Forcheck option I can flip that will ignore this particular type of issue. I tried looking for other, possibly more significant instances of this issue. I found many that were "just" because of line continuations. That is, a line was intentionally wrapped and so this appeared as a space at the end of the line and the start of the next. Here are two examples:
SEAICE.f: IF (ROICE.gt.0. and. MSI2.lt.AC2OIM) then
ODIAG_PRT.f: SCALEO(LN_MFLX) = 1.D- 6 / DTS
I had no idea that Fortran has such awful syntax for logical expressions. Yes, you read it correctly, the greater-than operator is .gt. and so on. In any case, in the above example in SEAICE.f, the fact that the and-operator is written as ". and." rather than ".and." is what raises this issue. In the ODIAG_PRT.f file, it is the fact that there is a space in the representation of the double-precision number.
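The two kinds of blank seen in these examples can be spotted with a pair of regular expressions. The sketch below is a hypothetical checker written for illustration only; it is not how Forcheck works, and it covers just these two patterns (a blank inside a real literal's exponent and a blank inside a dotted operator).

```python
import re

# Blank inside an exponent, e.g. "1.013D 03" or "1.D- 6".
EXPONENT_BLANK = re.compile(
    r"(\d+\.\d*|\.\d+)[DE](\s+[+-]?\d|[+-]\s+\d)", re.IGNORECASE
)

# Blank inside a dotted operator, e.g. ". and." or ".and .".
_OPS = r"(and|or|not|eq|ne|gt|ge|lt|le)"
OPERATOR_BLANK = re.compile(
    r"\.(\s+" + _OPS + r"\s*|" + _OPS + r"\s+)\.", re.IGNORECASE
)


def blank_in_token(line):
    """Return the kinds of embedded blank found in one source line."""
    found = []
    if EXPONENT_BLANK.search(line):
        found.append("exponent")
    if OPERATOR_BLANK.search(line):
        found.append("operator")
    return found
```

Because blanks are not significant in fixed-form source, every line this flags is still legal Fortran; the check only points at places worth a second look.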
Climate wars
Monday, July 6, 2009
Static analysis software for Fortran
Friday, June 26, 2009
- ftnchek. A free static analyser for Fortran 77 programs only, though there is an effort underway to update ftnchek for Fortran 90.
- Forcheck. Not free. Apparently the oldest and most comprehensive static analysis tool. Can handle up to Fortran 95 and some Fortran 2003. There is a trial offered.
- Cleanscape FortranLint. Not free. Can handle up to Fortran 95. Interestingly, there is a testimonial for it from NCAR. A trial version is also offered.
- PlusFORT. Not free. The folks that make this, Polyhedron Software, also distribute Forcheck.
- Understand 2.0. This isn't in the same class as the others, as it only computes code metrics. It also does some nifty code visualisations. A trial version is available. I fed it the ModelE code months ago, and it seemed to handle it without trouble.
- QA-FORTRAN from Programming Research Ltd. This is the software Les Hatton used in his T Experiments paper.
- Forwarn, from Quibus.
- FOR_STUDY from Cobalt-Blue.
Note: these are all separate analysis tools. I didn't look at the analysis that the various Fortran compilers can do. Polyhedron Software has a very detailed-looking comparison chart.
tl;dr: Forcheck, Cleanscape FortranLint.
