So here's a new topic. Whilst spending last summer at the Hadley Centre, Steve made the preliminary observation that the defect density of the Hadley Centre's climate model appears to be surprisingly low compared to defect rates for comparably sized projects. Does this observation hold up under scrutiny? What if we control carefully for project size and for the size of the user and developer base? If we compared the kinds of defects found in other projects to those found in the Hadley GCM, I suspect we'd find classes of defects that the Hadley scientists rarely consider to be defects at all (superficial bugs such as GUI defects, for instance). So what exactly do scientists consider to be defects? Can these be characterised? If we compared defect density between projects over only similar classes of defects, would we still see the lower defect rate in the climate model? How do other GCMs and climate models compare?
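To pin down what a class-restricted comparison might look like, here's a minimal sketch in Python. The project names, defect classes, counts, and code sizes are all made up for illustration; the point is just that defect density can be computed over only the classes of defects that both communities actually track.

```python
# Hypothetical defect counts per class, e.g. exported from each project's
# bug tracker. These are made-up numbers, not real data from any project.
defects = {
    "hadley_gcm": {"numerics": 12, "build": 5, "gui": 0},
    "eclipse":    {"numerics": 3,  "build": 40, "gui": 120},
}

# Project sizes in thousands of lines of code (also made up).
kloc = {"hadley_gcm": 800, "eclipse": 1500}

def defect_density(project, classes=None):
    """Defects per KLOC, optionally restricted to a set of defect classes."""
    counts = defects[project]
    total = sum(n for cls, n in counts.items()
                if classes is None or cls in classes)
    return total / kloc[project]

# Raw density vs. density over only the classes both projects report.
shared = {"numerics", "build"}
for project in defects:
    print(project,
          round(defect_density(project), 3),
          round(defect_density(project, shared), 3))
```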
Regardless of the outcome of a more rigorous look at climate model defects, there are larger questions of software quality to explore. Namely, what is the underlying cause of any differences in defect density (whether the GCM's defect density turns out to be better, worse, or comparable)? If the defect density is lower for climate models, one hypothesis is that climate scientists are both the users and the developers of their software, so they may be more likely to catch defects early on. But then we'd expect to see similar defect density patterns in open source software. Another hypothesis is that climate models are inherently more "resistant" to defects because of the powerful constraints imposed on them by the physical systems they simulate (e.g. conservation of mass and energy) and the extreme numerical sensitivity of the models. Or maybe the folks at Hadley have a great software engineering process that others need to learn from.
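To make that second hypothesis concrete, here's a toy illustration (my own sketch, not anything from the Hadley model) of how a conservation constraint acts as a built-in test: a slip in the numerics tends to violate global mass conservation, which a cheap check catches immediately, long before anyone inspects the output fields by eye.

```python
import numpy as np

def step(density, wind, dt=1.0):
    """One advection step on a 1-D periodic grid (simple upwind scheme)."""
    flux = wind * density
    return density - dt * (flux - np.roll(flux, 1))

density = np.random.default_rng(0).uniform(0.9, 1.1, size=100)
mass_before = density.sum()
density = step(density, wind=0.1)
mass_after = density.sum()

# A coding slip in the flux calculation would typically show up here
# as a conservation violation far larger than round-off error.
assert abs(mass_after - mass_before) < 1e-10 * mass_before
```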
Thoughts? What am I missing?
There is a mountain of literature on scientific software quality, defect density, and related topics. I'm just starting to work through it. Here's a glimpse of where I am:
- L. Hatton, "The t experiments: errors in scientific software," Computational Science & Engineering, IEEE, vol. 4, no. 2, pp. 27-38, 1997.
- L. Hatton and A. Roberts, "How accurate is scientific software?" Software Engineering, IEEE Transactions on, vol. 20, no. 10, pp. 785-797, 1994.
- L. Hatton, "Reexamining the fault density component size connection," Software, IEEE, vol. 14, no. 2, pp. 89-97, 1997.
- and many more by Les Hatton
- T. Zimmermann, R. Premraj, and A. Zeller, "Predicting defects for Eclipse," in PROMISE '07: Proceedings of the Third International Workshop on Predictor Models in Software Engineering. Washington, DC, USA: IEEE Computer Society, 2007, pp. 9+.
- D. E. Stevenson, "A critical look at quality in large-scale simulations," Computing in Science & Engineering, vol. 1, no. 3, pp. 53-63, 1999.
- S. Shackley, P. Young, S. Parkinson, and B. Wynne, "Uncertainty, complexity and concepts of good science in climate change modelling: Are GCMs the best tools?" Climatic Change, vol. 38, no. 2, pp. 159-205, February 1998.