D. E. Stevenson, "A critical look at quality in large-scale simulations," Computing in Science & Engineering, vol. 1, no. 3, pp. 53-63, 1999.

Here's a wandering article that has lots of great thoughts but never seems to pull them together tightly enough for me to come up with any solid, unified take-aways. Stevenson sets out to describe and tackle the friction, and danger, created by the differing ideas of simulation quality held by management and scientists. He's coming at this problem from the perspective of ASCI (Accelerated Strategic Computing Initiative) science -- that is, the folks charged with "predicting, with confidence, the behaviour of nuclear weapons through comprehensive, science-based simulations." After spending some time discussing the disconnect in the understanding of simulation quality and the resulting problems that this disconnect creates, Stevenson takes a step back and looks at the general modelling and simulation endeavour itself. He explores what modelling and simulation are, why we do them and what we should hope to gain from doing them, and how these two things ought to inform our notion of quality. He also provides us with some observations on why building high-quality simulations is probably difficult, discusses what validation and verification are, and distinguishes between two types of quality. Phew.
Here's Stevenson's summary of the article:
"Software engineering is meant to produce software by a manufacturing paradigm, but this paradigm simply cannot deal with the scientific issues. This article examines the successes and failures of software engineering. I conclude that process does not develop software, people and their tools do. Second, software metrics are not meaningful when the software's purpose is to guarantee the world's safety in the nuclear era. Finally, the quality of simulations must be based on the quality of insights gained from the revealed science."
Okay, some key points to mention. On the topic of what modelling and V&V are, Stevenson introduces some clear terminology. He defines validation and verification in terms of three different types of systems: observational (the world out there -- e.g. the climate), theoretical (our model/theory of the workings of the world -- e.g. the equations that describe climate processes), and calculational (the implementation of the theoretical model -- e.g. the climate model code). Validation checks that the theoretical system properly explains the observational system, and verification checks that the calculational system correctly implements the theoretical system. Stevenson then uses the term validation in a broader sense, stating that "complete validation of the observational-theoretical-calculational systems requires that we compute the right numbers for the right reasons."
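To pin the vocabulary down for myself (this sketch is mine, not Stevenson's), here's how the three systems and the two checks might look for a trivially small example -- a dropped ball rather than a climate or a weapon; the "measurements" are invented:

```python
# A toy mapping of Stevenson's three systems onto a dropped ball
# (my own illustration -- the measurements below are made up).
#
# Observational system:  measured fall times of a ball dropped from a height.
# Theoretical system:    constant-acceleration model, t = sqrt(2h/g).
# Calculational system:  a small numerical integrator implementing that model.

import math

G = 9.81  # gravitational acceleration, m/s^2

def fall_time_theory(height):
    """Theoretical system: closed-form prediction of the fall time."""
    return math.sqrt(2.0 * height / G)

def fall_time_simulation(height, dt=1e-4):
    """Calculational system: step-by-step integration of the same model."""
    pos, vel, t = height, 0.0, 0.0
    while pos > 0.0:
        vel += G * dt
        pos -= vel * dt
        t += dt
    return t

# Verification: does the calculational system correctly implement the
# theoretical one? Compare the code against the analytic solution.
h = 10.0
assert abs(fall_time_simulation(h) - fall_time_theory(h)) < 1e-3

# Validation: does the theoretical system properly explain the observational
# one? Compare predictions against (here, invented) measurements.
observed_fall_times = {10.0: 1.44, 20.0: 2.03}  # height (m) -> measured time (s)
for height, measured in observed_fall_times.items():
    assert abs(fall_time_theory(height) - measured) < 0.05
```

Stevenson's broader demand -- computing the right numbers for the right reasons -- is then the requirement that both checks hold at once, not just the second.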
On the nature of quality, Stevenson points out that our reasons for modelling need to inform our notions of quality, and that the divide between management and science occurs because this isn't happening. We model, and validate those models, in order to gain insight into the nature of whatever it is that we're modelling (i.e. the observational system). Insight is the essential purpose of science, and simulations are just tools to gain insight. Insight and modelling are the products of science. But, from a manufacturing perspective (read: a management perspective), insight isn't essential for building a model and a simulation. A model can just be seen as a specification (not as a product itself), and a simulation as a final product. Thus, from an engineering management position, validation and insight take a back seat -- the real problem is one of manufacturing. And so scientists and management are looking at the same process at cross-purposes.
If insight is the end goal of simulation computing, then the quality of computing can be measured by the quality of insight. Since insight leads to knowledge, we can judge the quality of insight by the quality of knowledge we get from a project. How do we judge the quality of knowledge we get? Well... frankly, this is where I lose track of the article a bit. Either it's because I'm just dense, Stevenson is intentionally vague, and/or he's put the real content in another paper of his:

D. E. Stevenson, "Science, computational science, and computer science: at a crossroads," in CSC '93: Proceedings of the 1993 ACM Conference on Computer Science. New York, NY, USA: ACM Press, 1993, pp. 7-14.

What he does say is that whilst we know what scientific and mathematical knowledge looks like (inductive and deductive, respectively), we don't really know what knowledge from computer science looks like. He references a few "principles" of computing knowledge from the paper I just mentioned: physical exactness (elimination of parameterisations), computability, and bounded errors (a priori or a posteriori error estimates). I'll have to read that paper before I can say much more about that...
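For what it's worth, the a priori / a posteriori distinction is one of the few concrete bits here, and a tiny sketch (mine, not Stevenson's) shows the difference: an a priori bound comes from theory before you run anything, while an a posteriori estimate is computed from the numerical solution itself.

```python
# A priori vs. a posteriori error information for forward Euler on
# y' = -y, y(0) = 1 (my own toy example, not from Stevenson's papers).

import math

def euler(f, y0, t_end, h):
    """Forward Euler; returns the approximation of y(t_end)."""
    y, t = y0, 0.0
    while t < t_end - 1e-12:
        y += h * f(t, y)
        t += h
    return y

f = lambda t, y: -y
T, h = 1.0, 0.01
y_h = euler(f, 1.0, T, h)

# A priori: known before any computation, from theory alone. The standard
# textbook bound for forward Euler is |error| <= (h*M / (2*L)) * (e^(L*T) - 1),
# with Lipschitz constant L = 1 and M = max|y''| = 1 for this problem.
a_priori_bound = (h / 2.0) * (math.exp(T) - 1.0)    # ~0.0086

# A posteriori: estimated after the fact from the computed solutions, here by
# step-doubling (for a first-order method, the difference between the h and
# h/2 answers is roughly half the error in the h answer).
y_h2 = euler(f, 1.0, T, h / 2.0)
a_posteriori_estimate = 2.0 * abs(y_h - y_h2)       # ~0.0019

true_error = abs(y_h - math.exp(-T))                # ~0.0018
print(a_priori_bound, a_posteriori_estimate, true_error)
```

The a priori number is guaranteed but pessimistic; the a posteriori one is tighter because it uses the run itself -- which is presumably why Stevenson is happy to count bounded errors of either kind as computing knowledge.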
Stevenson goes on to describe two kinds of quality, intrinsic and internal. I have to say I'm not quite sure I understand the distinction very well. Here's my take. Stevenson defines intrinsic quality as "the sum total of our faith in the system of models and machines." He says about internal quality, "each dimension [of insight and knowledge we receive from the simulation?], such as the mathematics or the physics, has its own idea of internal quality."
I think what he's doing here is making a distinction in quality that's analogous to the distinction between verification and validation, in that intrinsic quality applies to the match between observational and theoretical systems, and internal quality applies to theoretical and calculational systems. Intrinsic quality is an epistemological notion of a good modelling endeavour. It is what we're talking about when we ask: regardless of any possible implementation, what needs to be present in any model and implementation for the modelling effort to be a good one in terms of getting us insight and knowledge?
Internal quality looks at the quality issue from the other side. It assumes (or disregards) intrinsic quality, and focuses just on how good our model and implementation are in terms of the kinds of knowledge we already have. For a mathematician, or a scientist in general, internal quality may relate to the simplicity or elegance of the model. For a computer scientist or engineer, internal quality may relate to the simplicity or robustness of the code.
Stevenson's point is, I think, that ultimately we computer scientists don't have a clear justification for our measures of quality. If insight and knowledge are the end goal of modelling, we need to have a clear sense of intrinsic quality in our endeavour. Then we need to use this understanding to inform our measures of internal quality. Otherwise we're just measuring things because we can, not because they show us the way to better science.