Visual Query Systems for Databases

Wednesday, December 31, 2008

I've been reviewing the work done on visual querying systems to see what the state of the art is. A visual system for querying climate model datasets might turn out to be a compelling way to enable citizens and scientists to dig through and understand the climate. This idea, minus the visual bit, came up during our brainstorm and looks like a promising space for a thesis.

First up in my review is the paper "Visual Query Systems for Databases: A Survey" by T. Catarci et al. This paper seems like a good start because it surveys and classifies, from a user's point of view, visual query systems (VQS) used for "traditional databases". Whether or not this applies to climate datasets I don't know, but since I'm familiar with traditional databases it certainly gives me a known point of departure. Apart from classifying VQSs, this paper also attempts to define various kinds of users and then match them to the classes of VQSs identified earlier and, thankfully, attempts to back up its claims with empirical studies.

At the outset the authors break out visual query systems into those that visually represent the schema and query, and those that visually represent the query results. Many do both, but often using different approaches. A variety of VQSs are classified according to what visual representations they use, organised into four broad classes: form-based (tabular), diagram-based, icon-based, and hybrids using a combination. On its own this classification system is unmotivated and lacks utility; the authors do not state any reasons for choosing this classification other than that it differentiates systems by "the most distinguishing aspect". The classification only becomes relevant later on in the paper when the authors describe the kinds of users who would benefit from each class.

Catarci et al. also break out the goals of using a query system into: understanding the database schema and formulating a query. They then classify each of the VQSs according to the various approaches they use to accomplish each of these goals. This is where the paper gets more interesting.

The goal of understanding the database schema -- or "understanding the reality of interest" in their words -- is broken down into three classes of approaches: top-down (i.e. exploring from general to specific parts of the schema); browsing (i.e. exploring a schema, or data, by walking along the relations in the schema); and schema simplification (i.e. incrementally transforming the schema to be closer to the user's mental model).
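To make the first two approaches a bit more concrete for myself, here is a minimal sketch of what top-down exploration and browsing might look like over a schema. It is only my own illustration, not anything from the paper: the SCHEMA dictionary, its entity names, and the functions neighbours and top_down are all hypothetical.

```python
from collections import deque

# Hypothetical schema (my own invention): each entity maps to the
# entities it is directly related to.
SCHEMA = {
    "Experiment": ["Model", "Run"],
    "Model": ["Component"],
    "Run": ["OutputFile"],
    "Component": [],
    "OutputFile": [],
}


def neighbours(schema, entity):
    """Browsing: the entities one step away along the schema's relations."""
    return sorted(schema.get(entity, []))


def top_down(schema, root, max_depth):
    """Top-down: walk from a general entity to more specific ones.

    Returns (depth, entity) pairs in breadth-first order, stopping at
    max_depth so the user only sees as much detail as they asked for.
    """
    seen = {root}
    queue = deque([(0, root)])
    visited = []
    while queue:
        depth, entity = queue.popleft()
        visited.append((depth, entity))
        if depth < max_depth:
            for related in neighbours(schema, entity):
                if related not in seen:
                    seen.add(related)
                    queue.append((depth + 1, related))
    return visited


if __name__ == "__main__":
    print(neighbours(SCHEMA, "Experiment"))
    for depth, entity in top_down(SCHEMA, "Experiment", 2):
        print("  " * depth + entity)
```

Raising max_depth reveals more of the schema one level at a time, which is roughly the general-to-specific idea; calling neighbours repeatedly on whatever entity you land on is the browsing idea.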

Approaches to formulating a query are classified as: by schema navigation (wandering around a visualisation of the schema, picking out the bits you want to know about); by subqueries (composing partial results of existing queries); by matching (either by supplying an example of the data you want to see (query-by-example), or by supplying a pattern to match results to); and by range selection (adjusting the range of values for a search condition on a preset query -- think dynamic queries).
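Two of these approaches, matching and range selection, are easy to sketch in a few lines. The following is just my own toy illustration under made-up assumptions: the ROWS table, its field names and values, and the functions query_by_example and query_by_range are hypothetical and not taken from the paper or from any real VQS.

```python
# Hypothetical toy table; the stations and values are invented purely
# for illustration and are not real observations.
ROWS = [
    {"station": "A", "year": 2000, "temp": 1.0},
    {"station": "A", "year": 2001, "temp": 2.0},
    {"station": "B", "year": 2000, "temp": 3.0},
    {"station": "B", "year": 2001, "temp": 4.0},
]


def query_by_example(rows, example):
    """Matching: return rows that agree with every filled-in example field.

    Fields set to None in the example are treated as blanks ("don't care"),
    much like leaving a cell empty in a form-based interface.
    """
    wanted = {key: value for key, value in example.items() if value is not None}
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in wanted.items())
    ]


def query_by_range(rows, field, low, high):
    """Range selection: keep rows whose field lies in [low, high].

    In a dynamic-query interface, low and high would be driven by a slider.
    """
    return [row for row in rows if low <= row[field] <= high]


if __name__ == "__main__":
    print(query_by_example(ROWS, {"station": "A", "year": None}))
    print(query_by_range(ROWS, "temp", 2.0, 3.0))
```

The point of the sketch is that neither function asks the user to write a query expression: one takes a partially filled-in row, the other takes two numbers that a slider could supply.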

The second-to-last section (section 6) contains the most interesting part of this paper. In this section the authors discuss the usability of VQSs for different kinds of users. Right from the start they state that casual users will benefit most from visual query systems. Then they break down casual in various ways: by computer training (the focus is on non-professionals), frequency of interaction, variance of queries, complexity of queries, and familiarity with the domain. Each visual representation type (form-based, diagram-based, etc.) from earlier in the paper is discussed and classified in relation to the various user aspects. For instance, from the analysis we can conclude such things as: users unfamiliar with the domain are best served by icon-based systems since the visual metaphors of the icons are immediately understandable. Familiar users are best served by diagram-based systems because they can articulate more nuanced query concepts.

The analysis in this section is not incredibly thorough or principled. I suppose it does give us somewhat of a framework for understanding the usability of VQSs, though. We just don't know if it's valid, is all. The end of this section surveys the empirical usability studies done on VQSs, in which the authors find only partial support for their previous classifications.

So, overall, this paper does do what it sets out to do: it surveys many different visual query systems and chops them up in all sorts of classifications, from the type of visual representations they use to the sorts of users they best serve. It's just not clear those classifications are meaningful. Of course, they are intuitive so the classifications ought to make this paper useful as a reference or as an introduction to the field. This paper suggests to me that VQS usability isn't well understood (or wasn't 12 years ago). We'll see how this observation holds up after I review a few other papers.

Brainstorm: Software Engineering and Climate Change

Monday, December 15, 2008

A few of us in the software engineering group have met over the past few weeks to brainstorm software engineering research ideas for dealing with climate change. We are attempting to refine our ideas into research questions and projects, and this is what we've come up with so far.

Better understanding of climate processes

The general public doesn't have such a great understanding of some of the basic climate processes (Sterman & Sweeney, 2007). So, what about creating an environment to learn these sorts of things in? This is, of course, directly connected to raising awareness of climate change and enabling people to make better decisions (see below).
  • Simplified Climate Model: create a really simple climate model (even simpler than JCM[1]) that an 8 year old would want to use to explore various climate scenarios.
  • SimClimate: like SimCity, create a game that's played out in a world with a simulated climate and players learn about climate processes implicitly through playing. Maybe you have to play out various scenarios, or maybe it's a massively multiplayer game where each player controls a region and has to work with others in the game. There are several existing examples of this sort of thing.
  • Climate laboratory: let users build simple climate models themselves, or tinker with existing models, and then run experiments on them. The focus with this idea is to get users to understand climate processes by directly manipulating them, rather than just tweaking various settings.

Raising awareness and making better decisions

The more information you have, the better decisions you can make...
  • Wiki decision: a wiki-like database linking decisions and an explanation of the impacts of those decisions. For instance, in choosing a particular Ontario apple vs. a New Zealand apple, you might be able to follow the chain of events in the production, transportation and distribution. Maybe expert opinions are included or figures for fuel consumption.
  • Barcode Reader: for any product, a hand-held reader that provides all sorts of related environmental information. E.g. food miles, carbon footprint, pesticide use, etc., much like the Nutritional Information currently given on packages.
My bias: I'm really wary of ideas like these that may increase obsessive, paralyzing worrying about making the best decision. Sometimes more information is not what's needed. I'd want plenty of care and forethought to go into crafting such tools.

Access to climate data and resources

What can we do to provide both the public and scientists with easier and more useful access and analysis of existing climate data (both observations and model predictions)?
  • WikiClimateStats. In the same vein as StatsJam (watch the screencast), give people a method to easily explore and analyse climate data, and have discussions about it. As a tool for scientists we'd want this to help them share results and analysis (i.e. furthering open science).
Some questions:
  • What questions are climate scientists asking about their data?
  • What questions are the public asking? What questions would it be helpful to be able to ask? What does 'helpful' mean here?
  • What sort of query interface makes sense (SQL won't cut it for non-programmers)?
For climate modellers, sharing their "digital resources" is still troublesome:
  • No standards yet for model metadata, configuration data, experiment output, etc.
  • Data visualisation
  • No way to discover existing climate modules, or to see if existing runs satisfy your questions
  • Access rights (data isn't always public)
See the Earth System Curator wiki page on Use Cases for some ideas on what's needed.

Collaboration and coordination amongst climate modelling groups

Climate modelling groups don't often collaborate or coordinate with other modelling groups, and when they do it's done face-to-face rather than in a distributed fashion. For really complex climate models, development may need to go distributed. How can we assist?
  • Integrate awareness tools. Get scientists to use existing collaboration and awareness tools and see if it improves things.
  • What are scientists asking about their code?
  • How can we help scientists to share tacit knowledge?
  • Allow access to the decision making of other modelling groups.

Software Engineering challenges in building climate models

We know a little bit about this (see here).
  • Can we borrow software engineering practices from other disciplines, or from commercial outfits?
  • What questions are scientists asking about their code?
  • Where do they get this information currently?
  • Why is it hard to modularise their models? (As mentioned before, it's tough)
Ideas:
  • Build better FORTRAN tools. FORTRAN is the language of choice for scientific computing but there doesn't seem to be the same level of sophistication of tools for it as for "modern" languages. We could build a better IDE, debugger, refactoring tools, static analysis tools, etc.
  • Try standard tool-based code exploration. Would scientists write better code (or write it with less trouble) if they had access to some standard code exploration techniques? For example, profiling, code metrics, or reverse engineering tool analysis.
  • Teach 'em the basics. See Software Carpentry. If we teach scientists the basics of crafting software, does it help?

Validation and Verification

  • Document current methods of validation and verification.
  • Link climate model V&V to standard software engineering V&V
  • Link climate model V&V to philosophy of science.
  • Link to their software development practices and the choice of approximations used in the model.
  • Investigate current model benchmarks. What is a better benchmark?
  • Traceability (the ability to trace results from the simplifications of a model to their details in a more comprehensive model)
  • Seamless Assessment (the ability to vary resolution and time scales)

[1] Do me a favour: Start up JCM, click the 'mitigation' button/dropdown in the far right corner and choose "stabilise CO2 emissions", and then try to stop the global mean temperature from rising (bottom right chart) by playing with the CO2 emissions (top left chart). (Let your mouse hover over the CO2 emissions handle to get a read out.) Speak to me when you're no longer scared.

Why I'm here

Sunday, November 30, 2008

All through this past summer I've had my mind turned to the task of sorting out my future at the university. Do I return, and what direction would my research take? And throughout the summer, and long before I left school, I was stuck answering this question. On the one hand I was obviously interested in both the sorts of research questions that had come up over the year, and the freedom I was given in graduate school to explore them. On the other hand, I felt an immense pressure to make good use of my time and I was uneasy with how indirect or ineffectual the various research projects we had come up with seemed to be.

Here's my thinking. There are big troubles in the world. Big, planet-ending sorts of troubles and big species-ending troubles and big people-ending troubles. News of these troubles rakes at my brain and makes me angry and sad and upset. Responding to these troubles seems urgent. Not responding to them seems terribly sad. And so, I try to steer my life so I can be of some benefit to the world and respond to the troubles in it.

Of course, I'm a complex person and I'm not only interested in responding to these sorts of things. I'm also interested in extensible programming systems, how to capture data lineage in software workflows, and how different each millimeter feels as I roll my shoulders in the setu bandha asana, for example.

And so, I've been tugged between following interests of mine which are indirectly related to responding to urgent world issues (sure, if I'm doing yoga all the time I'll probably be a friendlier person to be around, and therefore of more benefit to the world) and doing things that are more directly related (growing food, for instance). I was at an impasse at the end of the winter term last year, so I left.

Over the past week I've given all of this some thought again. I've decided that the distinction between direct and indirect action is blurry at best, but what's clear is my intent. I've decided that even if my master's work doesn't have direct impact it doesn't preclude me from doing other work that does whilst I work on it. I've also decided that since I'm interested I might as well follow my nose since, like all academic research, it may have some unforeseeable impact in the future, or it may turn me on to more relevant research or work, or it just might equip me to better engage with the world. (Whatever that means.)

I'll also get the chance to inspire the people I work with by telling them idyllic stories of farm life; I'll be able to bring back some of the academic world to my hillbilly farming friends; and I'll have three new classy letters to add to the end of my name.

And that's why I'm here.

Stay tuned for more details on where my interests are taking me.

A journal

Wednesday, July 30, 2008

Phew. It's been a while, huh? I haven't posted on my farm blog for ages either.

In any case, I've been doing lots of thinking whilst I'm working. And recently I've started looking for journals to read, and talking with folks about a few ideas I've had.

Hopefully I'll post more about all of this soon, but I wanted to share this link right now. It's to a journal that looks promising in that it mixes my interests in computer science and agriculture. See Computers and Electronics in Agriculture.