Jon Pipitone

A collaborator, a story, something delicious

Wednesday, January 7, 2009

I met with Greg and Alecia this morning. Alecia, as you can see from her blog, is working to make web mapping accessible. That means providing all the information a sighted person can infer from a map to users regardless of their ability to see or use a mouse. Since there is an enormous amount of implicit information in a map (e.g. quick, what's the closest major centre to Winnipeg?), this problem is tough. I'm sure this isn't a complete description of the problem, but that's the idea.

There's overlap with my interests in that we're both looking at representing spatio-temporal data in interesting ways. Well, it'll sure be nice to have another person to bounce ideas off of.

But what, exactly, am I working on these days? This morning I bumbled my way through an explanation to Alecia. Greg pulled it together into a coherent and sexy story. I'm going to give you my own retelling of it:

Climate change -- the science is out there. There are plenty of datasets and models to show us how the climate behaves and how it's likely to behave in various future scenarios. And if you aren't truly scared shitless, then you're not reading the same books and websites I am. Or maybe it's because you don't have a compelling way to make sense of the science, to understand it and its implications on your life and future.

My hope is that I can do something about this. I'd like to make, or find and refine, something that ("even") a high school student could, and would want to, interact with to help make sense of our predicament.

That's where the interest in spatio-temporal visualisation comes in. Maybe what's needed is a nifty environment for exploring existing data, or a simplified climate model, or .... well, I dunno yet.

For the next few days I'm going to continue my exploration of visualisations, but I'm going to widen my search beyond just academic papers. I'll probably be turning up more than I can blog about, so I've created a del.icio.us account which you can follow, or check out some of the existing relevant del.icio.us bookmarks.

2 1/2 papers on engineering climate models

Tuesday, January 6, 2009

Here are two papers (plus another that goes into more depth) on the software engineering of the Community Climate System Model, and the Earth Systems Modelling Framework. I'm just starting to get into these projects, and these papers have been very helpful so far:

Overview of the Software Design of the Community Climate System Model by J. Drake et al.
This paper describes the dreamy world of the CCSM project, a seemingly well-oiled software engineering enterprise. This paper covers the design of the CCSM, plus an overview of the software development process.

The Architecture of the Earth System Modeling Framework by C. Hill et al.
This is a larger project that incorporates the CCSM. This paper just outlines the design (no process). For more details see: Design and Implementation of Components in the Earth System Modeling Framework by N. Collins et al.

Extensible programming

Monday, January 5, 2009

A couple of years ago I did some extra-curricular work with Greg Wilson and Miles Thibault related to the idea of extensible programming. As a warm up, I built a Scheme interpreter and implemented hygienic macros from scratch. Learned heaps. Two of Greg's students (golly, can't remember their names) appear to be taking up the extensible programming idea in earnest as their master's projects. For what it's worth, I'll summarise two of the ideas that came out of the project:

You can't have an extensible language without an extensible parser. Stating it this way makes it seem pretty obvious, but when we moved from Scheme to Java this was a bit of a revelation. In Scheme you can extend the language any way you want simply with a macro, but that's only because the language has an extremely constrained syntax. For Java you'd either have to have predefined grammatical constructs that extensions would fit into (like the JSE macro syntax), or have your language extensions also extend the parser to support themselves.
Fully qualified language keywords. One option I came up with to support grammar extensions was the notion of a fully qualified language keyword. In the same way that Java lets you fully qualify classes (e.g. org.w3c.Document), why not fully qualify keywords to allow the parser to handle ambiguous bits of syntax? For example:
import syntax com.pipitone.try;

...

java.lang.syntax.try {// standard try-catch
...
}
catch (FooException e) {...}
catch (BarException e2) {...}

com.pipitone.try {// custom, multi-exception try-catch
...
}
catch (FooException e1, BarException e2) { ... }
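
Here is a minimal sketch of how a parser might pick a grammar rule once keywords can be fully qualified. It is only an illustration under my own assumptions: `KeywordRegistry`, `importSyntax`, and `resolve` are hypothetical names, and the "rule" is just a descriptive string standing in for whatever an extension would really hand to the parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from a real parser or from the original project.
final class KeywordRegistry {
    // Fully qualified keyword -> description of the grammar rule that parses it.
    private final Map<String, String> rules = new HashMap<>();
    // Short keyword -> every fully qualified keyword imported under that short name.
    private final Map<String, List<String>> imports = new HashMap<>();

    /** Registers a rule under its fully qualified keyword and brings its short name into scope. */
    void importSyntax(String qualifiedKeyword, String rule) {
        rules.put(qualifiedKeyword, rule);
        String shortName = qualifiedKeyword.substring(qualifiedKeyword.lastIndexOf('.') + 1);
        List<String> sameShortName = imports.computeIfAbsent(shortName, k -> new ArrayList<>());
        if (!sameShortName.contains(qualifiedKeyword)) {
            sameShortName.add(qualifiedKeyword);
        }
    }

    /** Resolves a keyword as written in the source to the rule that should parse it. */
    String resolve(String keyword) {
        if (rules.containsKey(keyword)) {
            return rules.get(keyword); // already fully qualified, nothing to disambiguate
        }
        List<String> candidates = imports.get(keyword);
        if (candidates == null) {
            throw new IllegalArgumentException("unknown keyword: " + keyword);
        }
        if (candidates.size() > 1) {
            throw new IllegalArgumentException(
                "ambiguous keyword '" + keyword + "', qualify it as one of " + candidates);
        }
        return rules.get(candidates.get(0));
    }
}
```

With both `java.lang.syntax.try` and `com.pipitone.try` imported, a bare `try` is rejected as ambiguous while either qualified form resolves to its own rule, which is the same behaviour Java gives you for two imported classes that share a simple name.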

Interesting stuff, this. It's still exciting. If I weren't trying to save the world I'd probably be working on this project. ;-)

Exploratory spatio-temporal visualisation

The last paper I reviewed discussed and evaluated different visual querying techniques for general relational databases. Today's paper, Exploratory spatio-temporal visualisation: an analytical review by N. Andrienko et al., narrows the discussion to just spatio-temporal data and focuses less on querying specifically than on evaluating a variety of exploratory techniques specific to spatio-temporal data.

The paper begins with a long discussion intent on classifying the kinds of questions one can ask of spatio-temporal data. The authors point out that spatio-temporal data has three major dimensions (what, where, and when) and that a classification system naturally falls out of this observation. Questions about the data are usually posed about one dimension given the other two (e.g. when did x happen at y?) with answers needed at various levels of aggregation (e.g. at specific points, over a range of points, or overall; they refer to this as the "search level") and with various types of comparisons/relations done on the results (they're rather vague about this bit). At the end of their discussion the authors decide to explore only those questions that focus on the time dimension: given a time (or range) what happened where?; or when did what happen where. Here's their handy graphical representation of all of this:

The paper then turns to briefly surveying existing exploratory techniques: querying (they focus on dynamic querying and filtering), map animation, and other visualisations to explore changes in locations, events, or attributes.

With these techniques in hand the authors get to the real meat of the paper and evaluate how each technique serves to answer the two general kinds of questions. In the process of doing so they break down the two question types into more specific question types, detailing various search levels and cognitive operations. For instance, an elementary when -> what + where question might involve comparing behaviours over the same time interval (e.g. "compare the movements of stork X and Y") or at distinct intervals (e.g. "compare the migration behaviours of the stork X in the years 2001 and 2002"). Each of these question subtypes is explained and matched to appropriate exploratory techniques, often with references to existing implementations.
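
To make the two question types concrete for myself, here is a rough sketch of them as plain filters over what/where/when records. Everything in it is my own assumption rather than anything from the paper: the `Observation` and `SpatioTemporalQueries` names are hypothetical, "when" is reduced to a year, and "where" to a latitude band.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class and field names are illustrative, not taken from the paper.
final class Observation {
    final String what;  // the thing observed, e.g. a stork's identifier
    final double lat;   // where: latitude in degrees
    final double lon;   // where: longitude in degrees
    final int year;     // when: reduced to a year to keep the sketch small

    Observation(String what, double lat, double lon, int year) {
        this.what = what;
        this.lat = lat;
        this.lon = lon;
        this.year = year;
    }
}

final class SpatioTemporalQueries {
    /** when -> what + where: every observation made in the years [from, to], inclusive. */
    static List<Observation> during(List<Observation> data, int from, int to) {
        List<Observation> hits = new ArrayList<>();
        for (Observation o : data) {
            if (o.year >= from && o.year <= to) {
                hits.add(o);
            }
        }
        return hits;
    }

    /** what + where -> when: the years in which `what` was seen inside a latitude band. */
    static List<Integer> whenSeen(List<Observation> data, String what, double minLat, double maxLat) {
        List<Integer> years = new ArrayList<>();
        for (Observation o : data) {
            if (o.what.equals(what) && o.lat >= minLat && o.lat <= maxLat) {
                years.add(o.year);
            }
        }
        return years;
    }
}
```

Comparing stork X in 2001 and 2002 would then be two calls to `during` with different intervals, and the interesting (unsketched) part is how the two result sets get shown side by side.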

This paper is another (seemingly b/c what do I know?) decent survey of existing software and techniques. It presents a slightly more principled classification (the question types) and evaluation criteria but is fairly light-weight on the analysis (it all seems rather ad-hoc), and provides no empirical backup for many claims.

There is one thing I can take away from this paper: that visual querying is more than just a visual representation of the query question. It's also exploration. This paper implicitly assumes this with its highlighting of techniques which blur the visualisation of the query and the results, à la dynamic queries.

The other take away is dead obvious but I'll state it anyway: the types of questions you're asking tell you (or constrain, or inform) what exploratory/query techniques to use. Creating a novel SQL query visualiser might not be what climate scientists or grade 10 students need to answer their types of questions. It's worth mentioning all of this only because it highlights an important point for me: I need to get a handle on what questions folks can't answer easily (or aren't asking, but ought to).

Visual Query Systems for Databases

Wednesday, December 31, 2008

I've been reviewing the work done on visual querying systems to see what the state of the art is. A visual system for querying climate model datasets might turn out to be a compelling way to enable citizens and scientists to dig through and understand the climate. This idea, minus the visual bit, came up during our brainstorm and looks like a promising space for a thesis.

First up in my review is the paper "Visual Query Systems for Databases: A Survey" by T. Catarci et al. This paper seems like a good start because it surveys and classifies, from a user's point of view, visual query systems (VQS) used for "traditional databases". Whether or not this applies to climate datasets I don't know, but since I'm familiar with traditional databases it certainly gives me a known point of departure. Apart from classifying VQSs, this paper also attempts to define various kinds of users and then match them to the classes of VQSs identified earlier and, thankfully, attempts to back up its claims with empirical studies.

At the outset the authors break out visual query systems into those that visually represent the schema and query, and those that visually represent the query results. Many do both but often using different approaches. A variety of VQSs are classified according to what visual representations they use, organised into four broad classes: form-based (tabular), diagram-based, icon-based, and hybrids using a combination. On its own this classification system is unmotivated and lacks utility; the authors do not state any reason for choosing this classification other than that it differentiates systems by their "most distinguishing aspect". The classification only becomes relevant later on in the paper when the authors describe the kinds of users who would benefit from them.

Catarci et al. also break out the goals of using a query system into: understanding the database schema and formulating a query. They then classify each of the VQSs according to the various approaches they use to accomplish each of these goals. This is where the paper gets more interesting.

The goal of understanding the database schema -- or "understanding the reality of interest" in their words -- is broken down into three classes of approaches: top-down (i.e. exploring from general to specific parts of the schema); browsing (i.e. exploring a schema, or data, by walking along the relations in the schema); and schema simplification (i.e. incrementally transforming the schema to be closer to the user's mental model).

Approaches to formulating a query are classified as: by schema navigation (wandering around a visualisation of the schema, picking out the bits you want to know about); by subqueries (composing partial results of existing queries); by matching (either by supplying an example of the data you want to see (query-by-example), or by supplying a pattern to match results to); and by range selection (adjusting the range of values for a search condition on a preset query -- think dynamic queries).
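
Of these, matching is the easiest to sketch in a few lines. The following is only in the spirit of query-by-example, not the actual QBE language or anything from the paper: the `QueryByExample` class and its `match` method are hypothetical, rows are plain maps, and any field left out of the example row is treated as "don't care".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: an illustration of matching by example, not a real VQS API.
final class QueryByExample {
    /** Returns the rows that agree with every field filled in on the example row. */
    static List<Map<String, Object>> match(List<Map<String, Object>> rows, Map<String, Object> example) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean agrees = true;
            for (Map.Entry<String, Object> wanted : example.entrySet()) {
                // A field absent from the example never reaches this check, so it is unconstrained.
                if (!Objects.equals(wanted.getValue(), row.get(wanted.getKey()))) {
                    agrees = false;
                    break;
                }
            }
            if (agrees) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

An empty example matches every row, and each field the user fills in narrows the result. Range selection would replace the equality test with a pair of bounds per field.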

The second-to-last section (section 6) contains the most interesting part of this paper. In this section the authors discuss the usability of VQSs for different kinds of users. Right from the start they state that casual users will benefit most from visual query systems. Then they break down casual in various ways: by computer training (the focus is on non-professionals), frequency of interaction, variance of queries, complexity of queries, and familiarity with the domain. Each visual representation type (form-based, diagram-based, etc.) from earlier in the paper is discussed and classified in relation to the various user aspects. For instance, from the analysis we can conclude such things as: users unfamiliar with the domain are best served by icon-based systems since the visual metaphors of the icons are immediately understandable. Familiar users are best served by diagram-based systems because they can articulate more nuanced query concepts.

The analysis in this section is not incredibly thorough or principled. I suppose it does give us somewhat of a framework for understanding the usability of VQSs, though. We just don't know if it's valid, is all. The end of this section surveys the empirical usability studies done on VQSs, from which the authors find only partial support for their previous classifications.

So, overall, this paper does do what it sets out to do: it surveys many different visual query systems and chops them up in all sorts of classifications, from the type of visual representations they use to the sorts of users they best serve. It's just not clear those classifications are meaningful. Of course, they are intuitive so the classifications ought to make this paper useful as a reference or as an introduction to the field. This paper suggests to me that VQS usability isn't well understood (or wasn't 12 years ago). We'll see how this observation holds up after I review a few other papers.

Brainstorm: Software Engineering and Climate Change

Monday, December 15, 2008

A few of us in the software engineering group have met over the past few weeks to brainstorm software engineering research ideas for dealing with climate change. We are attempting to refine our ideas into research questions and projects, and this is what we've come up with so far.

Better understanding of climate processes

The general public doesn't have such a great understanding of some of the basic climate processes (Sterman & Sweeney, 2007). So, what about creating an environment to learn these sorts of things in? This is, of course, directly connected to raising awareness of climate change and enabling people to make better decisions (see below).

Simplified Climate Model: create a really simple climate model (even simpler than JCM [1]) that an 8 year old would want to use to explore various climate scenarios.
SimClimate: like SimCity, create a game that's played out in a world with a simulated climate, where players learn about climate processes implicitly through playing. Maybe you have to play out various scenarios, or maybe it's a massively multiplayer game where each player controls a region and has to work with others in the game. There are several existing examples of this sort of thing.
Climate laboratory: let users build simple climate models themselves, or tinker with existing models, and then run experiments on them. The focus with this idea is to get users to understand climate processes by directly manipulating them, rather than just tweaking various settings.

Raising awareness and making better decisions

The more information you have, the better decisions you can make...

Wiki decision: a wiki-like database linking decisions to explanations of the impacts of those decisions. For instance, in choosing a particular Ontario apple vs. a New Zealand apple, you might be able to follow the chain of events in the production, transportation and distribution. Maybe expert opinions are included or figures for fuel consumption.
Barcode Reader: for any product, a hand-held reader that provides all sorts of related environmental information (e.g. food miles, carbon footprint, pesticide use, etc.), like the Nutritional Information currently given on packages.

My bias: I'm really wary of ideas like these that may increase obsessive, paralyzing worrying about making the best decision. Sometimes more information is not what's needed. I'd want plenty of care and forethought to go into crafting such tools.

Access to climate data and resources

What can we do to provide both the public and scientists with easier and more useful access and analysis of existing climate data (both observations and model predictions)?

WikiClimateStats. In the same vein as StatsJam (watch the screencast), give people a method to easily explore and analyse climate data, and have discussions about it. As a tool for scientists we'd want this to help them share results and analysis (i.e. furthering open science).

Some questions:

What questions are climate scientists asking about their data?
What questions are the public asking? What questions would it be helpful to be able to ask? What does 'helpful' mean here?
What sort of query interface makes sense (SQL won't cut it for non-programmers)?

For climate modellers, sharing their "digital resources" is still troublesome:

No standards yet for model metadata, configuration data, experiment output, etc.
Data visualisation
No way to discover existing climate modules, or to see if existing runs satisfy your questions
Access rights (data isn't always public)

See the Earth System Curator wiki page on Use Cases for some ideas on what's needed.

Collaboration and coordination amongst climate modelling groups

Climate modelling groups don't often collaborate or coordinate with other modelling groups, and when they do it's done face-to-face rather than in a distributed fashion. For really complex climate models, development may need to go distributed. How can we assist?

Integrate awareness tools. Get scientists to use existing collaboration and awareness tools and see if it improves things.
What are scientists asking about their code?
How can we help scientists to share tacit knowledge?
Allow access to the decision making of other modelling groups.

Software Engineering challenges in building climate models

We know a little bit about this (see here).

Can we borrow software engineering practices from other disciplines, or from commercial outfits?
What questions are scientists asking about their code?
Where do they get this information currently?
Why is it hard to modularise their models? (As mentioned before, it's tough)

Ideas:

Build better FORTRAN tools. FORTRAN is the language of choice for scientific computing but there doesn't seem to be the same level of sophistication of tools for it as for "modern" languages. We could build a better IDE, debugger, refactoring tools, static analysis tools, etc.
Try standard tool-based code exploration. Would scientists write better code (or write it with less trouble) if they had access to some standard code exploration techniques? For example, profiling, code metrics, or reverse engineering tool analysis?
Teach 'em the basics. See Software Carpentry. If we teach scientists the basics of crafting software, does it help?

Validation and Verification

Document current methods of validation and verification.
Link climate model V&V to standard software engineering V&V
Link climate model V&V to philosophy of science.
Link to their software development practices and the choice of approximations used in the model.
Investigate current model benchmarks. What is a better benchmark?
Traceability (the ability to trace results from the simplifications of a model to their details in a more comprehensive model)
Seamless Assessment (the ability to vary resolution and time scales)

[1] Do me a favour: Start up JCM, click the 'mitigation' button/dropdown in the far right corner and choose "stabilise CO2 emissions", and then try to stop the global mean temperature from rising (bottom right chart) by playing with the CO2 emissions (top left chart). (Let your mouse hover over the CO2 emissions handle to get a read out.) Speak to me when you're no longer scared.

Why I'm here

Sunday, November 30, 2008

All through this past summer I've had my mind turned to the task of sorting out my future at the university. Do I return, and what direction would my research take? And throughout the summer, and long before I left school, I was stuck answering this question. On the one hand I was obviously interested in both the sorts of research questions that had come up over the year, and the freedom I was given in graduate school to explore them. On the other hand, I felt an immense pressure to make good use of my time and I was uneasy with how indirect or ineffectual the various research projects we had come up with seemed to be.

Here's my thinking. There are big troubles in the world. Big, planet-ending sorts of troubles and big species-ending troubles and big people-ending troubles. News of these troubles rakes at my brain and makes me angry and sad and upset. Responding to these troubles seems urgent. Not responding to them seems terribly sad. And so, I try to steer my life so I can be of some benefit to the world and respond to the troubles in it.

Of course, I'm a complex person and I'm not only interested in responding to these sorts of things. I'm also interested in extensible programming systems, how to capture data lineage in software workflows, and how different each millimeter feels as I roll my shoulders in the setu bandha asana, for example.

And so, I've been tugged between following interests of mine which are indirectly related to responding to urgent world issues (sure, if I'm doing yoga all the time I'll probably be a friendlier person to be around, and therefore of more benefit to the world) and doing things that are more directly related (growing food, for instance). I was at an impasse at the end of the winter term last year, so I left.

Over the past week I've given all of this some thought again. I've decided that the distinction between direct and indirect action is blurry at best, but what's clear is my intent. I've decided that even if my master's work doesn't have direct impact it doesn't preclude me from doing other work that does whilst I work on it. I've also decided that since I'm interested I might as well follow my nose since, like all academic research, it may have some unforeseeable impact in the future, or it may turn me on to more relevant research or work, or it just might equip me to better engage with the world. (Whatever that means.)

I'll also get the chance to inspire the people I work with by telling them idyllic stories of farm life; I'll be able to bring back some of the academic world to my hillbilly farming friends; and I'll have three new classy letters to add to the end of my name.

And that's why I'm here.

Stay tuned for more details on where my interests are taking me.