Jon Pipitone

Modeling the solutions to climate change

Tuesday, October 20, 2009

For the past couple of weeks a few of us in the software engineering group have been meeting to take up Steve's modeling challenge: we are attempting to model (visually, not computationally) the proposed solutions from several popular books. The idea is to do so so that it's possible (easy?) to compare the differences and similarities between them. Here is the homepage* for the project, which roughly tracks what we're up to. I'm going to summarise our progress so far.

To start off, we narrowed our focus down to just comparing the books by their take on wind power solutions. We began with David McKay's excellent book, Sustainable Energy -- without the hot air.

In our first few meeting we decided to just "shoot first and ask questions later". That is to say, we just collaborative built up a model of the chapters on wind power as we saw fit in the moment, without following any visual syntax and without worrying too much about what to include or what to ignore. The result looked like this:

At the bottom of that picture is our brainstorming about what other aspects to include (the left hand column), the types of perspectives/analysis that McKay uses and that may be useful to include a future exercise (middle column), and the types of differences we expect to see when comparing models (right column).

The next step would have been to come up with the same sort of model for another book, and then start to figure out how best to make the models comparable so that it is visually easy to see the differences and similarities between the various models.

We didn't do that. Instead, we decided to try making a more principled model. Actually, set of models. We decided to construct an entity-relationship (ER) model, and a goal model (i*) for two books and then see about how to go about making those models comparable.

We began with the entity-relationship model. Again, for McKay's book. McKay's book is fairly well segmented into chapters that have back-of-the-envelope-style analysis and others that have a more broad discussion of the actors and issues. In our first attempt shown above, we mainly only modeled the two chapters on wind-power analysis. But if we just stuck to those chapters for the ER and goal models we'd be left with very impoverished models that miss all of the important contextual bits that frame the wind-power discussion. We relaxed the restriction on our wind-power focus slightly so as to include parts of the book that discuss the context. In the case of McKay's book, chapter one covers this nicely.

After our first few meetings we've completed the ER domain model, as well as made a good start on the goal model.

For the wider context (chapter one), we built the following ER model:

This model is a bit of a monster, but I'm told that most models are like that. Other than the standard UML relationship syntax, we have coloured the nodes to represent whether the concept comes from the book directly (blue), or whether we included it because we felt it was implied or simply helpful for clarity (yellow).

Using the same process we created the following ER model for just the two chapters on wind:

As well, we've begun to go back over the first chapter and build up an i* goal model. Here it is so far:

Stay tuned for further updates on what we're up to. I'd suggest that at the moment these models should simply be taken as our first hack. We haven't done any work whatsoever to make them very readable or comparable, for instance.

* I feel like "homepage" is a rather outdated word now. Is that so?

Geoscientific Model Development

Sunday, October 18, 2009

I had an wonderful chat last week with Stephen Griffies from GFDL. It was a fascinating interview that I'll have to blog about over several posts because we just covered so much territory.

One especially interesting pointer Stephen gave me was to a new journal from European Geosciences Union titled Geoscientific Model Development. This is a journal that accepts articles about the nuts and bolts of building modelling software. It is apparently the only journal like it. Most of the other journals that climate scientists publish in will only accept papers on the "science" derived from the use of such models.

For those of us interested in how climate models are developed, this journal will likely be very relevant. What I find particular cool is the transparent peer-review process and open-discussion. This means for a particular article (say, this one on coupling software for earth-system modelling), you can read the paper and the current referee reviews (with the option to submit your own comments).

One issue with the journal Stephen mentioned is that it is currently not listed in any of the major scientific citation indices. Effectively this means that scientists do not get workplace "cred" for publishing in this journal. Thus, there is little motivation to publish even though, as Stephen put it, having a peer-reviewed publication to "rationalise" code and design decisions is essential to ensuring the scientific integrity of the models.

Talk: Climate Change & Psychological Barriers to Change

Tuesday, September 22, 2009

This week is Earthcycle at U of T: an environment week with many many great happenings (see the link for more info). In particular there is what looks to be a great lecture on Thursday discussing the recent report on psychology and climate change from the American Psychological Association.

Here's the full posting:

Thurs. Sept. 24
7:00 p.m. – 9:00 p.m.
Lecture Climate Change & Psychological Barriers to Change, with Dr. Judith Deutsch ( Science for Peace) & Prof. Danny Harvey ( U of T)
International Student Centre, Cumberland Room
33 St. George Street

This is, in part, a summary of a major conference by the Report by the American Psychological Association’s Task Force on the Interface Between Psychology and Global Climate Change titled “Psychology and Global Climate Change: Addressing a Multi-faceted Phenomenon and Set of Challenges.”

The study includes sections on concern for climate change, not feeling at risk, discounting the future, ethical concerns, population issues, consumption drivers, counter-consumerism movements, psychosocial and mental health impacts of climate change, mental health issues associated with natural and technological disasters, lessons from Hurricane Katrina, uncertainty and despair, numbness or apathy, guilt regarding environmental issues, heat and violence, displacement and relocation, social justice implications, media representations, anxiety, psychological benefits associated with responding to climate change, types of coping responses, denial, judgmental discounting, tokenism and the rebound effect, and belief in solutions outside of human control.

A copy of the report is available at http://www.apa.org/science

Dr. Judith Deutsch is a psychiatric social worker and President of Science for Peace.

Prof. Danny Harvey is with the Geography Department at UofT, a member of the IPCC, and an internationally renowned climate change expert.

Organized by Science for Peace
Facebook event page:
http://www.facebook.com/event.php?eid=131392308428&index=1

On static analysis

Monday, August 31, 2009

Last week I got serious about running a thorough static analysis (using Cleanscape's FortranLint) of one of the climate modelling packages I'm studying. It turns out to be trickier than I thought just to get the source code in a state to be analysed because of the complexity and "homebrewedness" of the configuration systems used.

What do I mean? Well, the models I'm studying are complex beasts. They are composed of many of sub-models, and those sub-models themselves are built from sub-sub-models. For example, a global climate model may be composed of an atmosphere model, an ocean model, and a land model. These sub-models are often functioning models in their own right and can often be run separately. And as I say, the sub-models are also built up from various models. The ocean model may have a sea-ice model, a biogeochemical model, and an ocean dynamics model. There may also be different versions of these sub- or sub-sub models being actively developed.

There are also piles and piles of configuration options for each of these components (the models, the sub-models, the sub-sub-models).

Thus, the climate model code shouldn't really be thought of in the singular sense. It's not source code for a climate model, but for an almost infinite number of different climate models depending on which sub-, or sub-sub-models are included in a particular build, and which configuration options are used.

A word on configuration options. The configuration system for some of the climate models I'm looking at are very complex (as you might expect). They include a generous helping of C preprocessor (CPP) instructions to include or remove chunks of code or other files in order to get just the right bits of functionality. As well, there are many makefiles and home-brewed scripts to assemble and ready the appropriate source files for compilation (e.g. move only the files land ice model version 2 files, not version 1 files, and rename them like so, etc..). Of course, there are also plenty of run-time configuration options slurped in from configuration data files (but since that happens after compilation it's not a concern to me when doing static analysis).

The upshot of all of this is that the source code for a climate model isn't shipped in a state that can be run through static analysis. In order for the static analysis tool to do it's job, it needs to be handed the source code in a ready-to-compile state. After all, the static analysis tool is an ultra-picky compiler that doesn't actually do any compilation but instead just spits out warnings about the structure of the code.

(I'm simplifying slightly: both of the static analysis tools I've looked at (FortranLint and Forcheck) both offer the ability to handle some preprocessing statements. Forcheck implemented it's own limited CPP-style preprocessor, and FortranLint will just call cpp for you on the file. Thus, it is possible to hand the static analysis tool code that isn't exactly in a compilable state, but you still need to configure the static analysis tool to do all the preprocessing... and that essentially duplicates the work that's being done by the homebrewed scripts and Makefiles).

The trouble is that getting a snapshot of the code that's ready for compilation isn't a trivial task. The homebrewed scripts and makefiles do a lot of magic as I described above. Somewhere in that magic -- and often not in one nice, distinct stage -- the code gets compiled. That is, no where in the process is there a folder of preprocessed, ready-to-compile files: configuration and compilation are bound up together.

Ideally I'd like to be able to run the configuration/compilation scripts up to the point in which they produce the ready-to-compile code, then run my static analysis tools over the code, and then continue on with the compilation process so that I can be sure that the code I'm analysing is exactly the code is able to be compiled into a working model. That would be the ultimate validation that I'm analysing the correct code, right? (If I were to use the built in preprocessing facilities of the static analysis tools I can never be sure that I've exactly duplicated the work done in the configuration scripts).

Unfortunately, this separation of configuration and compilation can't be done with out deeply understanding and re-writing the configuration scripts. hmmm... That's one option. It's more messy than I'd like it to be, but I might need to do it to remove any doubts about the validity of my results.

The other option I've come up with is a bit more cavalier, but still might be justifiable. It goes like this: redirect all calls to the compiler in the makefiles to a script that simply copies the target file to another location first before doing the actual compilation. The idea here is to intercept right at the point of compilation in order to take a snapshot of only those files that are compiled and when their in their proper configured and preprocessed state.

In fact, since I don't care about actually compiling the model, the stand-in compiler script could simply output an empty file instead of the actual compiled file. (Outputting an empty file is necessary in order to make other steps of the makefile happy and believe some real work was done.) Of course, replacing the compiler with something that doesn't actually do any compilation also requires that another programs in the makefiles that expect real work to have been done (i.e. the archiving tool, ar) must also be redirected to dummy scripts.

The result would be a folder full of ready-to-compile source files that should, in theory, all be able to be compiled together to make the climate model, and thus ready to be fed to the static analysis tool.

Also, in theory, and with less of a deeper understanding of the climate models, I should be able to compile the files I get from this process into a binary file that I can compare to the binary produced by the unadulterated configuration/compilation process in order to validate this hack.

Where I'm at: I tried putting this process in place last week with one of the models. I successfully got a nice pile of source files to analyse. I'm now just dealing with configuring the static analysis tool to handle external dependencies, but I should know soon whether this idea will work or not.

Abstract of my study for the AGU

I'm submitting an abstract of my study for the Methodologies of Climate Model Confirmation and Interpretation" session at the American Geophysical Union's Fall Meeting in December. This session (either poster or paper) is aimed at exploring the "methodological issues surrounding the confirmation, evaluation, and interpretation of climate and integrated assessment models".

Here's the current draft of the abstract. I've found it a little tricky to write an abstract for work that I haven't yet completed but I've given it a go. I've gotten some excellent feedback from some of my colleagues (big up to: Steve, Neil, Jono, and Jorge) as to how to frame the problem and my "results" (in quotations because I don't yet have concrete results).

On the software quality of climate models
A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities -- an established software engineering technique for studying software quality -- of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.

Feedback on clarity, wording, grammar, framing of the problem and results, etc... are very much welcome.

Workshops at PowerShift Canada

Monday, August 17, 2009

PowerShift Canada is a weekend-long youth conference on climate change taking place October 23-26, 2009, in Ottawa. It's modelled after the US PowerShift conferences. Over 1000 highschool and university students and other youth will assemble for hands-on workshops and lectures, and then a full day of lobbying action.

I attended the US PowerShift conference and it was a lot of fun and very inspiring. I'm doing a bit of work with the programming committee for PowerShift Canada. We're looking for speakers and facilitators to run workshops and give talks. Specifically, I'd like to ask you all for ideas on who to invite to speak on the following topics:

A climate science backgrounder
Climate modelling 101
An insider's perspective on the IPCC
Communicating the science of climate change
Developing Canada's GHG inventory

I'm also working on fleshing out workshops on Health and Community, as well as more practical skills workshops (e.g. how to be involved in non-violent civil disobedience, how to facilitate a group meeting, how to cope with activist burn-out, etc.).

If you have any suggestions of potential speakers, if you'd like to speak yourself, or if you'd like to suggest workshop topics, send me email at jon.programming@powershiftcanada.org.

Counting lines of code

I've been using the CodeCount tool to count lines of Fortran code. Here'r some of the gruesome details of what that entails -- for posterity's sake.

In part of my study I'm measuring defect densities of various climate models. Defect density is the number of defects divided by size of the project measured in lines of code (and most often per 1000 lines of code). Thus, I need to be able to count lines of code. Fortran. Often mixed versions. In this blog post I'll describe one of the limitations I've come across in using the CodeCount tool.

The following table summarises the default behaviour of the CodeCount tool on a snippet of Fortran. The Lines column contains the lines of the Fortran and preprocessor code being analysed. Note, this isn't a working piece of code in any way but that doesn't matter to the CodeCount tool. It's just a collection of lines I used to test the tools behaviour. Anyhow, the Type column specifies how CodeCount categorised the line: comment (comm), blank line (blank), executable (exec), data declaration (decl), or compiler directive (comp). The Physical Line and Logical Line columns specify whether CodeCount counts these lines towards the physical and logical line counts, respectively.

Lines	Type	Physical Line?	Logical Line?
!! this is a comment	comm	no	no
	blank	no	no
#if defined foo	exec	yes	yes
#ifdef key_squares	exec	yes	yes
#include "SetNumberofcells.h"	comp	yes	yes
#else	exec	yes	no
#endif	exec	yes	yes
SUBROUTINE A(Sqr_Grid)	decl	yes	yes
USE Sqr_Type	exec	yes	no
IMPLICIT NONE	decl	yes	yes
IF (assoc(cur_grid)) THEN	exec	yes	yes
Type(grid), Pointer :: Sqr_Grid	decl	yes	yes
WRITE(,) &	exec	yes	no
'Hello'	exec	yes	yes
ENDIF	exec	yes	yes
END SUBROUTINE A	data	yes	no

The physical line count is just a count of non-blank, non-comment lines. The logical line count tries to be a bit smart by counting lines in more abstract terms (I imagine a philosopher-computer scientist in some windowed office somewhere chin-stroking and asking, "What is a line of code?"). Anyhow, CodeCount computes logical line count by ignoring lines with continuation characters (e.g. "&") and certain other statements (e.g. "USE", "CASE", "END IF", "ELSE") and by counting each statement in a multi-statement line as a separate line. The full specification is in the CodeCount source if you're interested.

So the question I could ask is: do I use the logical or physical line count? It's a small question but, oh, I went there. The logical line count is appealing in that it seems likely to be more robust across different coding styles, and maybe gets more at the essence of what the size of a program is (whatever that means; see chin-stroking philosopher above for more information).

Unfortunately the CodeCount tool is too smart (or too stupid) in the way that it counts logical lines. It doesn't gracefully handle pre-processor statements or certain Fortran dialects. This you can see from the table above in the two places I've highlighted in red.

As far as I can make out, as long as a line contains only "ELSE" (other than non-word characters) CodeCount counts this line only as a physical line, not a logical line. So, it counts preprocessor lines as logical lines, except in the case of "#else", which it ignores. Should preprocessor lines be counted as lines of code? I don't know, maybe. Probably, in fact. If so, then we should count all of them as logical lines. Unfortunately, from the bit of digging I've done I can't see how to get CodeCount to consider "#else" as a logical line without messing with the code. No thanks.

But, alas, there's more. CodeCount counts an "ENDIF" as a logical line as you can see, but I don't think it should. See, as mentioned, it's built so that it does not count an "END IF" as a logical line. Now, I'm totally new to Fortran but most references I've come across close an IF block with an END IF, but I've seen one or two references to closing an IF block with an ENDIF. And in fact, some of the code I'm analysing uses exactly that syntax. So, CodeCount will have a slightly inflated logical line count if I use it for these source files.

Again, to fix this problem I'd have to resort to hacking the source if I want CodeCount. And, since I'm so new to Fortran I don't even know the extent to which there are differences in the various dialects so even if I were to decide hacking the source was a good idea, I wouldn't ever be sure I'd fixed it completely. (For instance, I just found out there are also "ENDDO" statements, not "END DO", statements in one of my sources!)

In short: I've been sticking to using physical line counts.