Validity and soundness in scientific software

Wednesday, November 4, 2009

In today's workshop on Software Engineering for Science we spent quite a bit of time discussing the different levels of correctness of scientific software. I was surprised since I had thought some of this was pretty basic stuff. After a bit of reflection I wonder if it isn't because we don't have common terms for these ideas.

To be clear, I'm referring to verification and validation. These activities are summed up by the questions, "Are we building the right thing?" (validation) and "Are we building the thing right?" (verification). Another way of looking at this is that verification is the act of checking that software meets its specifications, whereas validation is checking that software meets its requirements.

This comes up when you talk about scientific software since in many cases the software is supposed to enact a theory or mathematical model. Validation checks that the mathematical model is accurate, whereas verification checks that the software implements the mathematical model accurately.

Clearly we have words for "verification" and "validation", though I don't remember these words being used much today, or at all. The fact that they aren't commonly used and that we needed to discuss the distinction between these activities is curious to me.

But more so, whilst we have the words to discuss the activities we don't seem to have adjectives to refer to the software itself. (Do we? Tell me if we do.) I suppose we could use the terms "verified software" and "validated software". "Verified" is overloaded though. I immediately want to ask "by whom?", as if the term refers to software inspected and given a stamp of approval by an outside agency. "Validated software" seems okay though.

Borrowing from formal logic, could we refer to the "soundness" and "validity" of software?

Privilege

This deserves a much more in-depth discussion which I'm not going to go into here. But I wanted to just take a moment to publicly recognise how privileged I feel, and am, in school. Of course it's not just in being at school that I'm privileged... it's the country I live in, the socio-economic class I am part of, the people I know, my ethnicity, and so on. And school is a whole other level of privilege.

Today leaving the CASCON conference with two of my colleagues I thought again about how damned lucky I am to be a student here. This is truly a luxurious life. I spent today sitting around a table in a warm room talking with other students and professors about whatever the hell interested us at the moment. We talked while we ate our free lunch. (I repeat, we had a free lunch!) After that we went into another room and talked some more. Again, we talked about whatever interested us. At some point we paused to have tea and stretch. Then we returned to talking until we had had enough. A few of us went home together and spent the entire trip discussing ideas for tomorrow. It was a day of ideas.

And that was a day of work. Ah-mazing. When I'm not at a conference I get to spend an entire day at a sunny desk, reading, talking to people, making notes to myself, and generally working on whatever projects I please.

I feel so so lucky and grateful to be here. It's a fullness of feeling which I'm not sure I can explain all that well. The flip side is that I also feel upset at myself for the times when I take this life for granted. I find it easy to do. Take it for granted, I mean. There are times when, to the exclusion of other feelings, I feel worried about my future, or about a deadline, or how my research project might turn out, etc... But, peanuts! I am a king!

I'm not sure why, but I feel compelled to acknowledge and mention this right now. Maybe just as a reminder for myself. But I'd appreciate hearing any thoughts you have on this topic; so use the comments.

CSER poster session

Monday, November 2, 2009

This week I attended the poster session at the CSER gathering. This was a great thing to do for a few reasons. Just creating the poster helped me pull together some of my thoughts and results so far. In the same vein, just having to pitch my study and explain what I've been up to helped to clarify my thoughts or bring up new questions. Then, of course, there's the feedback and criticism I get from the attendees, and the new questions they raise (intentionally or otherwise). It's also just fun and validating to have people listen to what I've been up to and engage in a discussion about it... makes me feel like I'm doing something worth talking about.

My poster was in the form of nine "slides". Here they are, with a bit of explanation about each of the slides.





My study is, as you know, still underway. What I'm presenting here are the method and some preliminary results. I wanted to present at CSER because I wanted to hear what other people would say about some of my findings so far, and whether anyone would have suggestions of where to go next.



As a motivation for my study, consider how the computational scientist qua climatologist goes about trying to learn about the climate. In order to test their theories of the climate, they would like to run experiments. Since they cannot run experiments on the climate they instead build a computer simulation of the climate (a climate model) according to their theories and then run their experiments on the model.

At every step, approximations and errors are introduced. Moreover, the experiments that they run cannot all be replicated in the real world, so there is no "oracle" they can use to check their results against. (I've talked about this before.) All of this might lead you to ask: why do climate modelers trust their models? Or...


... for us as software researchers, we might ask: why do they trust their software? That is, irrespective of the validity of their theories, why do they trust their implementation of those theories in software?

The second question should actually read "What does software quality mean to climate modelers*?"

As I see it, you can try to answer the trust question by looking at the code or development practices, deciding if they are satisfactory and, if they are, concluding that the scientists trust their software because they are building it well and it is, in some objective sense, of high quality.

Or you can answer this question by asking the scientists themselves why they trust their software -- what plays into their judgment of good quality software. In this case the emphasis in the question is slightly different: "Why do climate modelers trust their software?"

The second, and to some extent third, research questions are aimed here.

* Note how I alternate between using "climate scientist" and "climate modeler" to reference the same group of people.


My approach to answering these questions is to do a defect density analysis (I'm not sure why I called it "repository analysis" on my slides. Ignore that) of several climate models. Defect density is an intuitive and standard software engineering measure of software quality.

The standard way to compute defect density is to count the number of reported defects for a release per thousand lines of code in that release. There are lots of problems with this measure, but one is that it is subject to how good the developers are at finding and reporting bugs. A more objective measure of quality may be their static fault density. So I did this type of analysis as well.
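To make the measure concrete, here's a minimal sketch in Python. This is just an illustration of the arithmetic, not the study's actual tooling, and the numbers are invented.

def defect_density(defect_count, lines_of_code):
    """Reported defects per thousand lines of code (KLOC)."""
    return defect_count / (lines_of_code / 1000.0)

# e.g. 120 reported defects against a 400,000-line release:
print(defect_density(120, 400000))  # 0.3 defects/KLOC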

Finally, I interviewed modelers to gather their stories of finding and fixing bugs as a way to understand their view and decision-making around software quality.

There are five different modeling centres participating in various aspects of this study.




A very general definition of a defect is: anything worth fixing. Deciding what is worth fixing is left up to the people working with the model, so we can be sure we are only counting relevant defects.

Many of the modeling centres I've been in contact with use some sort of bug tracking system. That makes counting defects easy enough (the assumption being that if there is a ticket written up about an issue, and the ticket is resolved, then it was worth fixing and we'll call it a defect).

Another way to identify defects is to look through the check-ins of the version control repository and decide if the check-in was a fix for a defect simply by looking at the comment itself. Sure, it's not perfect, but it might be a more reliable measure across modeling centres.
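To give a flavour of that heuristic, here's a rough Python sketch. The keyword list and ticket pattern are only illustrative guesses, not the exact rules behind the counts below, and the example comments are made up.

import re

FIX_WORDS = re.compile(r'\b(fix(e[sd])?|bug|defect|fault|correct(s|ed)?)\b', re.IGNORECASE)
TICKET_REF = re.compile(r'#\d+|ticket\s*\d+', re.IGNORECASE)

def looks_like_defect_fix(comment):
    # Flag a check-in comment as a defect fix if it mentions a fix-like word
    # or references a ticket number.
    return bool(FIX_WORDS.search(comment) or TICKET_REF.search(comment))

comments = [
    "fix sign error in sea-ice albedo calculation (#1042)",   # hypothetical
    "add new aerosol scheme",                                 # hypothetical
]
print([c for c in comments if looks_like_defect_fix(c)])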


Presented here is the defect density for an arbitrary version of the model from each of the modeling centres. For perspective, along the x-axis of the chart I've labeled two ranges "good" and "average" according to Norman Fenton's online book on software metrics. I've included a third bar, the middle one, that shows the defect density when you consider only those check-in comments which can be associated with tickets (i.e. there is a reference in the comment to a ticket marked as a defect).

The top, "all defects", bar is the count of check in comments that look like defect fixes. I have included in the count all of the comments made 6 months before and after the release date. You can see that bar is divided into two parts. The left represents the pre-release defects, and the right represents the post-release defects.

As yet, the main observation I have is that all of the models have a "low" defect density however you count defects (tickets, or check-in comments).

It's also apparent that the modeling centres use their ticketing systems to varying degrees, and that they have different habits about referencing tickets in their check-in comments.





I ran the FLINT tool over a single configuration of each of (currently only two) climate models. The major faults I've found are about implicit type conversion and declaration. As well, there is a significant (but small) portion of faults that suggests dead code. Of course, because I'm analysing only a single configuration of the model, I can't be sure that this code is really dead. I've inspected the code where some of these faults occur and I've found instances of both dead code and of code that isn't really dead in other configurations.

One example of dead code I found came from a module that had a collection of functions to perform analysis on different array types. The analysis was similar for each function, with a few changes to each function to handle the particularities of its array type. The dead code found in this module was variables that were declared and set but never referenced. My guess from looking at the regularities in the code is that because the functions were so similar, the developers just wrote one function and then copied it several times and tweaked it for each array type. In the process they forgot to remove code that didn't apply.


Unfortunately, I have as yet only been able to interview a couple of modellers specifically about defects they have found and fixed. I have done a dozen or so interviews with modelers and other computational scientists to talk about how they do their development and about software quality in general. So this part of the study is still a little lightweight, and very preliminary.

In any case, when I've done the interviews I ask the modelers to go through a couple of bugs that they've found and fixed. I roughly asked them these questions.

Everyone I've talked to is quite aware that their models have bugs. This, they accept as a fact of life. Partly this is a comment on the nature of a theory being an approximation, but they include software bugs here too. Interestingly, they still believe that, depending on the bug, they can extract useful science from the model. One interviewee described how in the past, when computer time was more costly, if scientists found bugs partway through a 6-month model run they might let the run continue and publish the results, but include a note about the bug they found and an analysis of its effect.



The other observation I have is connected to the last statement on the previous slide, as well as this slide.

Once the code has reached a certain level of stability, but before the code is frozen for a release of the model, scientists in the group will begin to run in-depth analysis on it. Both bug fixes and feature additions are code changes that have the potential to change the behaviour of the model, and so invalidate the analysis that has already been done on the model. This is why I say that some bugs can be treated as "features" of a sort: just an idiosyncrasy of the model. Similarly, a new feature might be rejected as a "bug" if it's introduced too late in the game.

In general, the criticality of a defect is in part judged on when it is found (like any other software project I suppose). I've identified several factors that I've heard the modellers talk about when they consider how important a defect is. I've roughly categorised these factors into three groups: concerns that depend on the project timeline (momentum), concerns arising from high-level design and funding goals (design/funding), and the more immediate day-to-day concerns of running the model (operational). Very generally, these concerns have more weight at different stages in the development cycle which I tried to represent on the chart.

Describing these concerns in detail probably involves a separate blog post.

Morning discussion for the WSRCC

Monday, October 26, 2009

This morning Jorge and I attempted to attend the Workshop on Software Research on Climate Change via a Skype phone call. But Skype wasn't cooperating. So, we had our own mini-workshop. The purpose of the workshop is to respond to the challenge, "how can we apply our research strengths to make significant contributions to the problems of mitigation and adaptation of climate change?" But we interpreted the question as, "What can software researchers do to make significant contributions.... ?" As a result, we considered some alternatives that are probably out of scope for the workshop.
  • Drop out of research.  We recognise climate change is an urgent problem and that many scientific research projects have very indirect, uncertain, and long-term payoffs. For the most part, the problem of climate change is fairly well analysed and many solutions are known, but in need of political organisation in order to carry them out. Perhaps really what is needed is for more people to "roll up their sleeves" and join a movement or organisation that's fighting towards this. 
  • Engage in action research/participatory research. If you decide to stay in research then we propose that you ground your studies by working on problems that you can be sure real stakeholders have. In particular, we suggest that you start with a stakeholder that is directly involved in solving the problem (e.g. activists, scientists, journalists, politicians) and that you work with them throughout your study. At the most basic level, they act as a reality-check for your ideas, but we think that the best way to make this relationship work is through action research: joining their organisation to solve their problems, becoming directly involved in the solutions yourself. Finding publishable results is an added bonus which is secondary to the pressing need.
  • Elicit the requirements of real-world stakeholders. As you can see from the last point, we're concerned that as software researchers we lack a good understanding of the problems holding us (society) back from dealing with climate change effectively. So, we suggest a specific research project that surveys all the actors to figure out their needs and the places where software research can contribute. This project would involve interviewing activists, scientists, journalists, politicians, and citizens to build a research roadmap.
  • Green metrics: dealing with accountability in a carbon market.  This idea is more vague, but simply a pointer to an area where we think software research may have some applicability. Assuming there is a compliance requirement for greenhouse gas pollution (e.g. a cap and trade system), then we will need to be able to accurately measure carbon emissions on all levels: from industry to homes.  
  • Software for emergencies. Like the last point, this one is rather vague. The idea is this: in doomsday future scenarios of climate change, the world is not a peaceful place. Potentially more decision-making is done by people in emergency situations. This context shift might change the rules for interface design: in peacetime, say, a user might be unwilling to double-click on a link, or might be willing to spend time browsing menus, but in a disaster scenario their preferences may change. So, how exactly do a user's preferences change in an emergency, and how might we design software to adjust to them?
  • Make video-conferencing actually easy. This was our experience all through the day. If we ever want to maintain our personal connections without traveling we need to solve this problem. You'd think that we had already solved it, as we have the basic technology already in place. We have Skype, but it is just too flaky to rely on for important gatherings. Or, maybe, hotels and conference centres can't deal with the bandwidth demands. Or, maybe conference organisers don't make remote attendance a priority.

    Even getting us through the basic technological obstacles may not be enough for a rich conference participation.  Simply having a video and audio feed doesn't compare to face-to-face conversations.  Maybe it never will, but certainly we can do better?

Position papers from the 1st Intl. Workshop on Software Research and Climate Change

Sunday, October 25, 2009

Tomorrow the First International Workshop on Software Research and Climate Change is being held as part of the Onward! 2009 conference in Florida. Jorge and I are going to attempt to attend the workshop remotely, so wish us luck. I'll be blogging about the experience tomorrow.

To begin, and as a refresher, I thought I'd post a single sentence summary of each of the position papers submitted for this workshop. Position papers were solicited from participants and were to respond to the challenge stated on the opening page of the workshop. In summary, the challenge is: how do we apply our expertise in software research to save our butts from certain destruction due to climate collapse. Or, as Steve puts it, "how can we apply our research strengths to make significant contributions to the problems of mitigation and adaptation of climate change."

In answer to that challenge, the position papers suggest software research should...

"Data Centres vs. Community Clouds", Gerard Briscoe and Ruzanna Chitchya

... tackle the energy inefficiency of cloud computing by investigating decentralised models where consumer machines also become providers and coordinators of computing resources.

"Optimizing Energy Consumption in Software Intensive systems", Arjan de Roo, Hasan Sozer and Mehmet Aksit

... provide the tools and design patterns for building software systems that meet both their energy-consumption requirements and their functional design requirements.

"Modeling for Intermodal Freight Transportation Policy Analysis", J. Scott Hawker

... improve three aspects of decision-making tools (like, say, an intermodal freight transportation policy analysis model): make them easier to use and interact with (HCI-wise); deal with the complexity of the models and the troubles with integrating various existing implementations; as well as (my favourite), make sure the software is built well since most of the folks doing the building are not trained.

"Computing Education with a Cause", Lisa Jamba

... investigate how to involve computer science students in research "toward improving health outcomes related to climate change" as part of the university curriculum.

"Some Thoughts on Climate Change and Software Engineering Research", Lin Liu, He Zhang, and Sheikh Iqbal Ahamed

... investigate how to navigate and integrate knowledge from many different disciplines and perspectives so as to help people communicate and work together; build decision-support, analysis and educational tools for people, companies, and government; build tools for incorporating environmental non-functional requirements into software construction.

"Refactoring Infrastructure: Reducing emissions and energy one step at a time", Chris Parnin and Carsten Görg.

... use insights from software refactoring to develop refactoring techniques for physical infrastructure (energy grid, water supply, etc.).

"In search for green metrics", Juha Taina and Pietu Pohjalainen

... establish a "framework for estimating or measuring the effects of a software systems' effect on climate change."

"Enabling Climate Scientists to Access Observational Data", David Woollard, Chris Mattmann, Amy Braverman, Rob Raskin, and Dan Crichton

... build systems to help climate scientists locate, transfer, and transform observational data from disparate sources.

"Context-aware Resource Sharing for People-centric Sensing", Jorge Vallejos, Matthias Stevens, Ellie D’Hondt, Nicolas Maisonneuve, Wolfgang De Meuter, Theo D’Hondt, and Luc Steels.

... investigate how to use our everyday hand-held devices as sensors to provide fine-grained environmental data.

"Language and Library Support for Climate Data Applications", Eric Van Wyk, Vipin Kumar, Michael Steinbach, Shyam Boriah, and Alok Choudhary

... build language extensions and libraries to make climate data analysis easier and more computationally efficient.

Modeling the solutions to climate change

Tuesday, October 20, 2009

For the past couple of weeks a few of us in the software engineering group have been meeting to take up Steve's modeling challenge: we are attempting to model (visually, not computationally) the proposed solutions from several popular books. The idea is to do this in a way that makes it possible (easy?) to compare the differences and similarities between them. Here is the homepage* for the project, which roughly tracks what we're up to. I'm going to summarise our progress so far.

To start off, we narrowed our focus down to just comparing the books by their take on wind power solutions. We began with David McKay's excellent book, Sustainable Energy -- without the hot air.

In our first few meetings we decided to just "shoot first and ask questions later". That is to say, we just collaboratively built up a model of the chapters on wind power as we saw fit in the moment, without following any visual syntax and without worrying too much about what to include or what to ignore. The result looked like this:


At the bottom of that picture is our brainstorming about what other aspects to include (the left-hand column), the types of perspectives/analysis that McKay uses and that may be useful to include in a future exercise (middle column), and the types of differences we expect to see when comparing models (right column).

The next step would have been to come up with the same sort of model for another book, and then start to figure out how best to make the models comparable so that it is visually easy to see the differences and similarities between the various models.

We didn't do that. Instead, we decided to try making a more principled model. Actually, a set of models. We decided to construct an entity-relationship (ER) model and a goal model (i*) for two books and then see how to go about making those models comparable.

We began with the entity-relationship model. Again, for McKay's book. McKay's book is fairly well segmented into chapters that have back-of-the-envelope-style analysis and others that have a broader discussion of the actors and issues. In our first attempt, shown above, we modeled mainly the two chapters on wind-power analysis. But if we just stuck to those chapters for the ER and goal models we'd be left with very impoverished models that miss all of the important contextual bits that frame the wind-power discussion. So we relaxed the restriction on our wind-power focus slightly to include parts of the book that discuss the context. In the case of McKay's book, chapter one covers this nicely.

After our first few meetings we've completed the ER domain model, as well as made a good start on the goal model.

For the wider context (chapter one), we built the following ER model:


This model is a bit of a monster, but I'm told that most models are like that. In addition to the standard UML relationship syntax, we have coloured the nodes to represent whether the concept comes from the book directly (blue), or whether we included it because we felt it was implied or simply helpful for clarity (yellow).

Using the same process we created the following ER model for just the two chapters on wind:

As well, we've begun to go back over the first chapter and build up an i* goal model. Here it is so far:


Stay tuned for further updates on what we're up to. I'd suggest that at the moment these models should simply be taken as our first hack. We haven't done any work whatsoever to make them very readable or comparable, for instance.

* I feel like "homepage" is a rather outdated word now. Is that so?

Geoscientific Model Development

Sunday, October 18, 2009

I had a wonderful chat last week with Stephen Griffies from GFDL. It was a fascinating interview that I'll have to blog about over several posts because we just covered so much territory.

One especially interesting pointer Stephen gave me was to a new journal from the European Geosciences Union titled Geoscientific Model Development. This is a journal that accepts articles about the nuts and bolts of building modelling software. It is apparently the only journal like it. Most of the other journals that climate scientists publish in will only accept papers on the "science" derived from the use of such models.

For those of us interested in how climate models are developed, this journal will likely be very relevant. What I find particularly cool is the transparent peer-review process and open discussion. This means that for a particular article (say, this one on coupling software for earth-system modelling), you can read the paper and the current referee reviews (with the option to submit your own comments).

One issue with the journal Stephen mentioned is that it is currently not listed in any of the major scientific citation indices. Effectively this means that scientists do not get workplace "cred" for publishing in this journal. Thus, there is little motivation to publish even though, as Stephen put it, having a peer-reviewed publication to "rationalise" code and design decisions is essential to ensuring the scientific integrity of the models.

Talk: Climate Change & Psychological Barriers to Change

Tuesday, September 22, 2009

This week is Earthcycle at U of T: an environment week with many many great happenings (see the link for more info). In particular there is what looks to be a great lecture on Thursday discussing the recent report on psychology and climate change from the American Psychological Association.

Here's the full posting:

Thurs. Sept. 24
7:00 p.m. – 9:00 p.m.
Lecture Climate Change & Psychological Barriers to Change, with Dr. Judith Deutsch ( Science for Peace) & Prof. Danny Harvey ( U of T)
International Student Centre, Cumberland Room
33 St. George Street

This is, in part, a summary of the report by the American Psychological Association’s Task Force on the Interface Between Psychology and Global Climate Change titled “Psychology and Global Climate Change: Addressing a Multi-faceted Phenomenon and Set of Challenges.”

The study includes sections on concern for climate change, not feeling at risk, discounting the future, ethical concerns, population issues, consumption drivers, counter-consumerism movements, psychosocial and mental health impacts of climate change, mental health issues associated with natural and technological disasters, lessons from Hurricane Katrina, uncertainty and despair, numbness or apathy, guilt regarding environmental issues, heat and violence, displacement and relocation, social justice implications, media representations, anxiety, psychological benefits associated with responding to climate change, types of coping responses, denial, judgmental discounting, tokenism and the rebound effect, and belief in solutions outside of human control.

A copy of the report is available at http://www.apa.org/science

Dr. Judith Deutsch is a psychiatric social worker and President of Science for Peace.

Prof. Danny Harvey is with the Geography Department at UofT, a member of the IPCC, and an internationally renowned climate change expert.

Organized by Science for Peace
Facebook event page:
http://www.facebook.com/event.php?eid=131392308428&index=1

On static analysis

Monday, August 31, 2009

Last week I got serious about running a thorough static analysis (using Cleanscape's FortranLint) of one of the climate modelling packages I'm studying. It turns out to be trickier than I thought just to get the source code in a state to be analysed because of the complexity and "homebrewedness" of the configuration systems used.

What do I mean? Well, the models I'm studying are complex beasts. They are composed of many sub-models, and those sub-models themselves are built from sub-sub-models. For example, a global climate model may be composed of an atmosphere model, an ocean model, and a land model. These sub-models are often functioning models in their own right and can often be run separately. And, as I say, the sub-models are also built up from various models. The ocean model may have a sea-ice model, a biogeochemical model, and an ocean dynamics model. There may also be different versions of these sub- or sub-sub-models being actively developed.

There are also piles and piles of configuration options for each of these components (the models, the sub-models, the sub-sub-models).

Thus, the climate model code shouldn't really be thought of in the singular sense. It's not source code for a climate model, but for an almost infinite number of different climate models depending on which sub-, or sub-sub-models are included in a particular build, and which configuration options are used.

A word on configuration options. The configuration systems for some of the climate models I'm looking at are very complex (as you might expect). They include a generous helping of C preprocessor (CPP) instructions to include or remove chunks of code or other files in order to get just the right bits of functionality. As well, there are many makefiles and home-brewed scripts to assemble and ready the appropriate source files for compilation (e.g. move only the land ice model version 2 files, not the version 1 files, and rename them like so, etc.). Of course, there are also plenty of run-time configuration options slurped in from configuration data files (but since that happens after compilation it's not a concern to me when doing static analysis).

The upshot of all of this is that the source code for a climate model isn't shipped in a state that can be run through static analysis. In order for the static analysis tool to do its job, it needs to be handed the source code in a ready-to-compile state. After all, the static analysis tool is an ultra-picky compiler that doesn't actually do any compilation but instead just spits out warnings about the structure of the code.

(I'm simplifying slightly: the static analysis tools I've looked at (FortranLint and Forcheck) both offer the ability to handle some preprocessing statements. Forcheck implemented its own limited CPP-style preprocessor, and FortranLint will just call cpp for you on the file. Thus, it is possible to hand the static analysis tool code that isn't exactly in a compilable state, but you still need to configure the static analysis tool to do all the preprocessing... and that essentially duplicates the work that's being done by the homebrewed scripts and Makefiles.)

The trouble is that getting a snapshot of the code that's ready for compilation isn't a trivial task. The homebrewed scripts and makefiles do a lot of magic, as I described above. Somewhere in that magic -- and often not in one nice, distinct stage -- the code gets compiled. That is, nowhere in the process is there a folder of preprocessed, ready-to-compile files: configuration and compilation are bound up together.

Ideally I'd like to be able to run the configuration/compilation scripts up to the point at which they produce the ready-to-compile code, then run my static analysis tools over the code, and then continue on with the compilation process so that I can be sure that the code I'm analysing is exactly the code that gets compiled into a working model. That would be the ultimate validation that I'm analysing the correct code, right? (If I were to use the built-in preprocessing facilities of the static analysis tools I could never be sure that I've exactly duplicated the work done in the configuration scripts.)

Unfortunately, this separation of configuration and compilation can't be done without deeply understanding and re-writing the configuration scripts. Hmmm... that's one option. It's messier than I'd like it to be, but I might need to do it to remove any doubts about the validity of my results.

The other option I've come up with is a bit more cavalier, but still might be justifiable. It goes like this: redirect all calls to the compiler in the makefiles to a script that simply copies the target file to another location before doing the actual compilation. The idea here is to intercept right at the point of compilation in order to take a snapshot of only those files that are compiled, and when they're in their properly configured and preprocessed state.

In fact, since I don't care about actually compiling the model, the stand-in compiler script could simply output an empty file instead of the actual compiled file. (Outputting an empty file is necessary in order to make other steps of the makefile happy and believe some real work was done.) Of course, replacing the compiler with something that doesn't actually do any compilation also requires that other programs in the makefiles that expect real work to have been done (i.e. the archiving tool, ar) must also be redirected to dummy scripts.
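To sketch what that stand-in compiler might look like (in Python, and only as an illustration -- the snapshot directory and the assumption that the real compiler is invoked with "-o target" are placeholders):

#!/usr/bin/env python
# Stand-in "compiler": snapshot each Fortran source argument, then write
# empty output files so the rest of the makefile believes real work was done.
import os
import shutil
import sys

SNAPSHOT_DIR = os.path.expanduser("~/model_snapshot")   # hypothetical location

def main(argv):
    os.makedirs(SNAPSHOT_DIR, exist_ok=True)
    sources = [a for a in argv if a.endswith((".f", ".F", ".f90", ".F90"))]
    for src in sources:
        shutil.copy(src, SNAPSHOT_DIR)                          # keep the configured source
        open(os.path.splitext(src)[0] + ".o", "w").close()      # fake an object file
    if "-o" in argv:                                            # fake an explicit target too
        open(argv[argv.index("-o") + 1], "w").close()

if __name__ == "__main__":
    main(sys.argv[1:])

The makefiles' compiler variable (FC, or whatever they use) would be pointed at this script, and ar would be redirected to a similar no-op.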

The result would be a folder full of ready-to-compile source files that should, in theory, all be able to be compiled together to make the climate model, and thus ready to be fed to the static analysis tool.

Also, in theory, and without a deeper understanding of the climate models, I should be able to compile the files I get from this process into a binary that I can compare against the binary produced by the unadulterated configuration/compilation process in order to validate this hack.

Where I'm at: I tried putting this process in place last week with one of the models. I successfully got a nice pile of source files to analyse. I'm now just dealing with configuring the static analysis tool to handle external dependencies, but I should know soon whether this idea will work or not.

Abstract of my study for the AGU

I'm submitting an abstract of my study for the "Methodologies of Climate Model Confirmation and Interpretation" session at the American Geophysical Union's Fall Meeting in December. This session (either poster or paper) is aimed at exploring the "methodological issues surrounding the confirmation, evaluation, and interpretation of climate and integrated assessment models".

Here's the current draft of the abstract. I've found it a little tricky to write an abstract for work that I haven't yet completed but I've given it a go. I've gotten some excellent feedback from some of my colleagues (big up to: Steve, Neil, Jono, and Jorge) as to how to frame the problem and my "results" (in quotations because I don't yet have concrete results).

On the software quality of climate models

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities -- an established software engineering technique for studying software quality -- of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.

Feedback on clarity, wording, grammar, framing of the problem and results, etc... are very much welcome.

Workshops at PowerShift Canada

Monday, August 17, 2009

PowerShift Canada is a weekend-long youth conference on climate change taking place October 23-26, 2009, in Ottawa. It's modelled after the US PowerShift conferences. Over 1000 highschool and university students and other youth will assemble for hands-on workshops and lectures, and then a full day of lobbying action.

I attended the US PowerShift conference and it was a lot of fun and very inspiring. I'm doing a bit of work with the programming committee for PowerShift Canada. We're looking for speakers and facilitators to run workshops and give talks. Specifically, I'd like to ask you all for ideas on who to invite to speak on the following topics:
  • A climate science backgrounder
  • Climate modelling 101
  • An insider's perspective on the IPCC
  • Communicating the science of climate change
  • Developing Canada's GHG inventory
I'm also working on fleshing out workshops on Health and Community, as well as more practical skills workshops (e.g. how to be involved in non-violent civil disobedience, how to facilitate a group meeting, how to cope with activist burn-out, etc.).

If you have any suggestions of potential speakers, if you'd like to speak yourself, or if you'd like to suggest workshop topics, send me email at jon.programming@powershiftcanada.org.

Counting lines of code

I've been using the CodeCount tool to count lines of Fortran code. Here'r some of the gruesome details of what that entails -- for posterity's sake.

In part of my study I'm measuring defect densities of various climate models. Defect density is the number of defects divided by the size of the project measured in lines of code (most often per 1000 lines of code). Thus, I need to be able to count lines of code. Fortran. Often mixed versions. In this blog post I'll describe one of the limitations I've come across in using the CodeCount tool.

The following table summarises the default behaviour of the CodeCount tool on a snippet of Fortran. The Lines column contains the lines of the Fortran and preprocessor code being analysed. Note, this isn't a working piece of code in any way, but that doesn't matter to the CodeCount tool; it's just a collection of lines I used to test the tool's behaviour. Anyhow, the Type column specifies how CodeCount categorised the line: comment (comm), blank line (blank), executable (exec), data declaration (decl), or compiler directive (comp). The Physical Line and Logical Line columns specify whether CodeCount counts these lines towards the physical and logical line counts, respectively.

Lines                            | Type  | Physical Line? | Logical Line?
---------------------------------+-------+----------------+--------------
!! this is a comment             | comm  | no             | no
(blank line)                     | blank | no             | no
#if defined foo                  | exec  | yes            | yes
#ifdef key_squares               | exec  | yes            | yes
#include "SetNumberofcells.h"    | comp  | yes            | yes
#else                            | exec  | yes            | no
#endif                           | exec  | yes            | yes
SUBROUTINE A(Sqr_Grid)           | decl  | yes            | yes
USE Sqr_Type                     | exec  | yes            | no
IMPLICIT NONE                    | decl  | yes            | yes
IF (assoc(cur_grid)) THEN        | exec  | yes            | yes
Type(grid), Pointer :: Sqr_Grid  | decl  | yes            | yes
WRITE(*,*) &                     | exec  | yes            | no
'Hello'                          | exec  | yes            | yes
ENDIF                            | exec  | yes            | yes
END SUBROUTINE A                 | data  | yes            | no

The physical line count is just a count of non-blank, non-comment lines. The logical line count tries to be a bit smart by counting lines in more abstract terms (I imagine a philosopher-computer scientist in some windowed office somewhere chin-stroking and asking, "What is a line of code?"). Anyhow, CodeCount computes logical line count by ignoring lines with continuation characters (e.g. "&") and certain other statements (e.g. "USE", "CASE", "END IF", "ELSE") and by counting each statement in a multi-statement line as a separate line. The full specification is in the CodeCount source if you're interested.
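Just to make the distinction concrete, here's a toy counter in Python. It is not CodeCount's algorithm: it only knows about '!'-style comments, '&' continuations, and a few of the ignored keywords, and (unlike CodeCount, as discussed below) it treats "END IF" and "ENDIF" alike.

import re

NON_LOGICAL = re.compile(r'^(USE\b|CASE\b|ELSE\b|END\s*IF\b|#else\b)', re.IGNORECASE)

def count_lines(lines):
    physical = logical = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('!'):
            continue                  # blank or comment: counts as neither
        physical += 1
        if stripped.endswith('&') or NON_LOGICAL.match(stripped):
            continue                  # continued statement or ignored keyword
        logical += 1
    return physical, logical

physical, logical = count_lines(open("SNOW.f").readlines())
print(physical, logical)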

So the question I could ask is: do I use the logical or physical line count? It's a small question but, oh, I went there. The logical line count is appealing in that it seems likely to be more robust across different coding styles, and maybe gets more at the essence of what the size of a program is (whatever that means; see chin-stroking philosopher above for more information).

Unfortunately the CodeCount tool is too smart (or too stupid) in the way that it counts logical lines. It doesn't gracefully handle pre-processor statements or certain Fortran dialects. You can see this in the table above in two places: the #else row and the ENDIF row.

As far as I can make out, as long as a line contains only "ELSE" (other than non-word characters) CodeCount counts this line only as a physical line, not a logical line. So, it counts preprocessor lines as logical lines, except in the case of "#else", which it ignores. Should preprocessor lines be counted as lines of code? I don't know, maybe. Probably, in fact. If so, then we should count all of them as logical lines. Unfortunately, from the bit of digging I've done I can't see how to get CodeCount to consider "#else" as a logical line without messing with the code. No thanks.

But, alas, there's more. CodeCount counts an "ENDIF" as a logical line as you can see, but I don't think it should. See, as mentioned, it's built so that it does not count an "END IF" as a logical line. Now, I'm totally new to Fortran but most references I've come across close an IF block with an END IF, but I've seen one or two references to closing an IF block with an ENDIF. And in fact, some of the code I'm analysing uses exactly that syntax. So, CodeCount will have a slightly inflated logical line count if I use it for these source files.

Again, to fix this problem I'd have to resort to hacking the source if I want to use CodeCount. And, since I'm so new to Fortran, I don't even know the extent to which there are differences in the various dialects, so even if I were to decide hacking the source was a good idea, I wouldn't ever be sure I'd fixed it completely. (For instance, I just found out there are also "ENDDO", not "END DO", statements in one of my sources!)

In short: I've been sticking to using physical line counts.

Some results from Forcheck

Friday, July 17, 2009

In this post I'll describe some of what I've found by using Forcheck to analyse climate modelling code. I am currently evaluating it and Fortranlint as tools for the static analysis portion of my study. tl;dr: forcheck took some time to configure, but it was configurable in every way I've needed it to be (with a few minor exceptions) and the results seem to be exactly what I was hoping for.

I decided to start by analysing the NASA modelE climate model because I could easily understand the build/configuration system and navigate my way around the code easily enough. Some of the other models I have the source to seem a bit trickier. The analysis I'm about to discuss isn't on the entire model source code, but only on one particular configuration of modules (an ocean-atmosphere coupled configuration though, so it includes many of the source modules).

In any case, I'll give you the goods upfront. I'll show you the big long summary of the problems forcheck found, but first let me explain the format. Here is an example item:
    2x[ 84 I] no path to this statement
The "2x" means that this issue was found two times. 84 is the unique issue identifier that can be used to look up the issue in the forcheck documentation. I means that this issue is an informative type issue, as opposed to a W warning, or E error. The rest of the line contains a short description of the issue. Got it? Good.

Here's the big list:
1564x[344 I] implicit conversion of constant (expression) to higher accuracy
635x[681 I] not used
265x[675 I] named constant not used
144x[323 I] variable unreferenced
144x[699 I] implicit conversion of real or complex to integer
125x[ 94 E] syntax error
108x[109 I] lexical token contains non-significant blank(s)
107x[319 W] not locally allocated, specify SAVE in the module to retain data
96x[557 I] dummy argument not used
78x[345 I] implicit conversion to less accurate data type
65x[316 W] not locally defined, specify SAVE in the module to retain data
65x[665 I] eq.or ineq. comparison of floating point data with constant
38x[313 I] possibly no value assigned to this variable
35x[342 I] eq.or ineq. comparison of floating point data with zero constant
34x[644 I] none of the entities, imported from the module, is used
27x[124 I] statement label unreferenced
27x[315 I] redefined before referenced
27x[341 I] eq. or ineq. comparison of floating point data with integer
22x[125 I] format statement unreferenced
21x[ 1 I] (MESSAGE LIMIT REACHED FOR THIS STATEMENT OR ARGUMENT LIST)
21x[514 E] subroutine/function conflict
19x[530 W] possible recursive reference
18x[674 I] procedure, program unit, or entry not referenced
18x[598 E] actual array or character variable shorter than dummy
10x[340 I] equality or inequality comparison of floating point data
8x[325 I] input variable unreferenced
7x[347 I] non-optimal explicit type conversion
7x[565 E] number of arguments inconsistent with specification
6x[582 E] data-type length inconsistent with specification
6x[668 I] possibly undefined: dummy argument not in entry argument list
5x[312 E] no value assigned to this variable
5x[691 I] data-type length inconsistent with specification
4x[556 I] argument unreferenced in statement function
4x[621 I] input/output dummy argument (possibly) not (re)defined
4x[384 I] truncation of character variable (expression)
3x[383 I] truncation of character constant (expression)
3x[343 I] implicit conversion of complex to scalar
3x[454 I] possible recursive I/O attempt
3x[570 E] type inconsistent with specification
2x[568 E] type inconsistent with first occurrence
2x[573 E] data type inconsistent with specification
2x[ 84 I] no path to this statement
2x[651 I] already imported from module
2x[617 I] conditionally referenced argument is not defined
2x[214 E] not saved
2x[236 E] storage allocation conflict due to multiple equivalences
2x[700 E] object undefined
2x[307 E] variable not defined
1x[115 E] multiple definition of statement label, this one ignored
1x[145 I] implicit conversion of scalar to complex
1x[228 W] size of common block inconsistent with first declaration
1x[230 I] list of objects in named COMMON inconsistent with first declaration
1x[250 I] when referencing modules implicit typing is potentially risky
1x[667 E] undefined: dummy argument not in entry argument list
1x[676 I] none of the objects of the common block is used
1x[616 E] input or input/output argument is not defined

number of error messages: 200
number of warnings: 192
number of informative messages: 3415
I've only taken a peek at a few of these issues in detail. Some of them are nonsense and can be disregarded right off the bat. For instance, the 125 syntax errors? Well, most of them come from the fact that the source files contain the cpp macros __FILE__ or __LINE__ and I haven't figured out yet how to make forcheck expand them (or I haven't worked out the ModelE Makefile magic to get cpp to do it instead).

Looking at the most frequent issues now. The most frequent is casting a constant to a higher accuracy. Here's an example from the file QUESDEF.f:
frac1=+1.
where frac1 is defined as a REAL*8. I wouldn't think that casting unknowingly to a higher accuracy would pose much of a problem... but what do I know. Can anyone think of some examples where it would be?

The next most frequent issue is the "not used" issue. The issue here is that a variable has been declared but then never gets used before it goes out of scope. There are two examples of this in the file OCNFUNTAB.f, a module with lookup table functions. From my cursory look, it seems that many of the functions are similarly structured, both in purpose and in terms of documentation and layout. My guess is that in this specific case this evolved from copying an existing function to use as a template for a new one, and forgetting to remove the unused declared variable. A code clone. This isn't always the case, of course. In another file with this issue, SNOW.f, it appears to be the result of commenting out code.

This issue is distinguished from the one that is two below it on the list, "variable unreferenced". The variable unreferenced issue refers specifically to when a variable is declared, and a value is set, but the variable is never accessed. This issue also occurs in SNOW.f, in this curious example:
ccc !!! ground properties should be passed as formal parameters !!!
k_ground = 3.4d0 !/* W K-1 m */ /* --??? */
c_ground = 1.d5 !/* J m-3 K-1 */ /* -- ??? */
k_ground is used later in the function, but c_ground never is.

Next up is the "implicit conversion of a real or complex to integer". Here'r a few examples:
QUESDEF.f:     prather_limits = 0.
OCNFUNTAB.f: JS=SS
RADIATION.f: JMO=1+JJDAYS/30.5D0
I, of course, have no idea whether these cases are unintentional, or what the effects of these casts are. I would think that it is generally dangerous to cast unintentionally to a less precise number...

Let's look at one last issue, the cryptic (to me) "lexical token contains non-significant blank(s)". Forcheck describes it by saying:
In a fixed format source form blanks are not significant. However, a blank in a name, literal constant, operator, or keyword might indicate a syntax error.
Okay, I still don't get it... but that's because I'm not at all familiar with Fortran (yet). Here's one example from RADIATION.f:
     DATA PRS1/      1.013D 03,9.040D 02,8.050D 02,7.150D 02,6.330D 02,
1 5.590D 02,4.920D 02,4.320D 02,3.780D 02,3.290D 02,2.860D 02,
2 2.470D 02,2.130D 02,1.820D 02,1.560D 02,1.320D 02,1.110D 02,
3 9.370D 01,7.890D 01,6.660D 01,5.650D 01,4.800D 01,4.090D 01,
4 3.500D 01,3.000D 01,2.570D 01,1.220D 01,6.000D 00,3.050D 00,
5 1.590D 00,8.540D-01,5.790D-02,3.000D-04/
Each value, "1.013D 03" for example, is flagged as an instance of this issue. In this particular case these are just ways of writing down double-precision numbers, but normally (I guess?) you wouldn't see the space, but instead a + or - sign. There might be a forcheck compiler option I can flip that will ignore this particular type of issue. I tried looking for other, possibly more significant instances of this issue. I found many that were "just" because of line continuations. That is, a line was intentionally wrapped and so this appeared as a space at the end of the line and the start of the next. Here are two examples:
    SEAICE.f:   IF (ROICE.gt.0. and. MSI2.lt.AC2OIM) then
ODIAG_PRT.f: SCALEO(LN_MFLX) = 1.D- 6 / DTS
I had no idea that fortran has such awful syntax for logical expressions. Yes, you read it correctly, the greater-than operator is .gt. and so on. In any case, in the above example in SEAICE.f, the fact that the and-operator is written as ". and." rather than ".and." is what raises this issue. In the ODIAG_PRT.f file, it is the fact that there is a space in the representation of the double-precision number.

This concludes my look at the forcheck results for now. I should note a few things about how I configured forcheck to produce these results. The most important thing to know is that forcheck can be configured to analyse the syntax according to the quirks of various commercial compilers. For this example I chose to use the Absoft Pro Fortran 90/95 V9 compiler emulation mode because the documentation for ModelE suggested this compiler works well. I had to slightly customise the compiler configuration to enable cpp preprocessing (off by default). I also had to configure forcheck with two cpp "defines" that specified the compiler and target architecture because there were a few places in the source code that had conditional compilation rules.

As mentioned, I didn't analyse the complete source code of the model. Not every module is used for every run configuration. I simply looked at one of the run configuration files that comes with the model, and configured forcheck to analyse only those modules that were specified there. For example, there is a module for each of a variety of different resolution configurations: RES_M53.f, RES_M24T.f, etc., as well as several different versions of what look like physics modules: CLOUDS.f and CLOUDS2.f, for instance. The run configuration specifies only one resolution module and only one version of the clouds module, and so that's what I analysed.

I point all of this out only to be explicit about the limitations of what I'm reporting here. This is just one slice through the code and, as you would expect, the static analysis report is often misleading. Nevertheless, I think this will make for some interesting starting points for discussion about code quality issues.