- It will force me to get my hands into the code and bug reports. My hope is that even a basic familiarity with these things will help me understand issues of quality for computational scientists. It may also give me a bit of currency in discussions with scientists if I have some understanding of the details.
- The results may be useful to the individual climate modelling groups, as a gauge for quality within their group.
- Doing the study furthers a dialogue between computer scientists, computational scientists, and climate modelling groups.
- I might end up with something useful!
- As I say in point #1, I might actually gain some insight about computational science software quality. ;-)
- Aside from comparing defect densities, I might find other ways to use this data for benchmarking. For instance, at the software quality workshop at ICSE, Elmar Juergens spoke about how, in judging quality, absolute values suck (that might have been exactly what he said) and trend analysis is much better. He was speaking from the point of view of process improvement, but it raises an interesting idea: if we redefine software quality as "a good software development process" (whatever that means), maybe we could use aspects of quality trends as points of comparison between projects. A small sketch of that idea follows this list.
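To make the trend idea concrete, here is a minimal sketch. It is my own illustration, not anything Juergens presented: the per-release defect counts and the `defect_trend` helper are entirely made up, and the point is only that the slope of the trend, rather than the absolute count, becomes the basis for comparison.

```python
# Sketch: compare the *trend* in per-release defect counts between two
# (hypothetical) projects instead of comparing absolute defect densities.
from statistics import linear_regression  # Python 3.10+

def defect_trend(defects_per_release):
    """Slope of a least-squares line through per-release defect counts.
    A negative slope suggests the counts are trending downward over time."""
    releases = list(range(len(defects_per_release)))
    slope, _intercept = linear_regression(releases, defects_per_release)
    return slope

# Entirely invented numbers, just to show the comparison:
project_a = [120, 95, 80, 60]   # defects logged against releases 1..4
project_b = [40, 55, 70, 90]

print(defect_trend(project_a))  # negative: trending down
print(defect_trend(project_b))  # positive: trending up
```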
The way the folks in [1] count defects is simply to count bug reports, or to count check-in comments containing "fixed", "bug", or other keywords that suggest a bug fix. Some papers count defects before a release is made (pre-release) and others count defects against a release (post-release). What makes a bug pre- or post-release is a matter of opinion: some papers go by how it's marked in the bug database, others set a threshold of days before and after a release date and categorise bugs accordingly. Some papers explicitly mention that defects are counted only if they have been fixed (i.e. just reporting a defect isn't enough), whereas others aren't clear about this. Finally, some papers only count defects logged against certain areas of the software (for instance, an installation problem might not be counted but a UI problem would be). Phew.
I'm sure there are more dimensions I haven't considered!
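To make those counting heuristics concrete, here is a minimal sketch, my own and not lifted from any of the papers in [1]: it flags check-in comments by keyword and then sorts the resulting defects into pre- and post-release buckets using a day threshold around a release date. The keyword list, the 90-day window, and all the commit messages and dates are assumptions made up for illustration.

```python
import re
from datetime import date

# Keywords that suggest a check-in comment describes a bug fix
# (the keyword-matching approach described above).
FIX_PATTERN = re.compile(r"\b(fix(ed|es)?|bug|defect)\b", re.IGNORECASE)

def is_fix(commit_message):
    """Heuristically decide whether a check-in comment is a bug fix."""
    return bool(FIX_PATTERN.search(commit_message))

def classify(defect_date, release_date, window_days=90):
    """Label a defect pre- or post-release using a day-threshold rule.
    Defects outside the window around the release are ignored ('other')."""
    delta = (defect_date - release_date).days
    if -window_days <= delta < 0:
        return "pre-release"
    if 0 <= delta <= window_days:
        return "post-release"
    return "other"

# Hypothetical check-ins: (commit message, commit date), plus one release date.
commits = [
    ("Fixed array bounds bug in radiation scheme", date(2009, 5, 2)),
    ("Add new diagnostics output", date(2009, 5, 10)),
    ("fix restart crash reported in issue 142", date(2009, 7, 20)),
]
release = date(2009, 6, 1)

counts = {"pre-release": 0, "post-release": 0, "other": 0}
for message, when in commits:
    if is_fix(message):
        counts[classify(when, release)] += 1
print(counts)  # {'pre-release': 1, 'post-release': 1, 'other': 0}
```

Every one of the dimensions above (fixed-only vs. reported, which areas of the software count, how wide the window is) would change what this little script reports, which is exactly why the papers are hard to compare.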
[1] A sampling: Koru et al., 2007; Fenton & Ohlsson, 2000; Kaaniche & Kanoun, 1996; Zimmermann et al., 2007