Visual Query Systems for Databases

Wednesday, December 31, 2008

I've been reviewing the work done on visual querying systems to see what the state of the art is. A visual system for querying climate model datasets might turn out to be a compelling way to enable citizens and scientists to dig through and understand the climate. This idea, minus the visual bit, came up during our brainstorm and looks like a promising space for a thesis.

First up in my review is the paper "Visual Query Systems for Databases: A Survey" by T. Catarci et al. This paper seems like a good start because it surveys and classifies, from a user's point of view, visual query systems (VQSs) used for "traditional databases". Whether or not this applies to climate datasets I don't know, but since I'm familiar with traditional databases it certainly gives me a known point of departure. Apart from classifying VQSs, this paper also attempts to define various kinds of users and then match them to the classes of VQSs identified earlier and, thankfully, attempts to back up its claims with empirical studies.

At the outset the authors break out visual query systems into those that visually represent the schema and query, and those that visually represent the query results. Many do both, but often using a different approach for each. A variety of VQSs are classified according to what visual representations they use, organised into four broad classes: form-based (tabular), diagram-based, icon-based, and hybrids using a combination. On its own this classification system is unmotivated and lacks utility; the authors give no reason for choosing it other than that it differentiates systems by "the most distinguishing aspect". The classification only becomes relevant later on in the paper, when the authors describe the kinds of users who would benefit from each class.

Catarci et al. also break out the goals of using a query system into: understanding the database schema and formulating a query. They then classify each of the VQSs according to the various approaches they use to accomplish each of these goals. This is where the paper gets more interesting.

The goal of understanding the database schema -- or "understanding the reality of interest" in their words -- is broken down into three classes of approaches: top-down (i.e. exploring from general to specific parts of the schema); browsing (i.e. exploring a schema, or data, by walking along the relations in the schema); and schema simplification (i.e. incrementally transforming the schema to be closer to the user's mental model).
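To make the browsing approach concrete for myself, here is a minimal sketch of it in Python. It is not from the paper: the schema, the table names, and the `browse` function are all hypothetical, and I'm assuming the schema can be treated as a plain graph of tables connected by relations.

```python
from collections import deque

# Hypothetical schema: each table maps to the tables it is related to.
SCHEMA = {
    "model": ["run"],
    "run": ["model", "variable"],
    "variable": ["run", "grid_cell"],
    "grid_cell": ["variable"],
}


def browse(schema, start, max_hops):
    """Walk outward from `start` along relations, at most `max_hops` away.

    Returns a dict mapping each reachable table to its hop distance.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if seen[table] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in schema.get(table, []):
            if neighbour not in seen:
                seen[neighbour] = seen[table] + 1
                queue.append(neighbour)
    return seen
```

Calling `browse(SCHEMA, "model", 2)` would show the user only the `model`, `run` and `variable` tables, which is roughly the "neighbourhood" view a browsing interface presents before the user steps further along a relation.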

Approaches to formulating a query are classified as: by schema navigation (wandering around a visualisation of the schema, picking out the bits you want to know about); by subqueries (composing partial results of existing queries); by matching (either supplying an example of the data you want to see, i.e. query-by-example, or supplying a pattern to match results to); and by range selection (adjusting the range of values for a search condition on a preset query -- think dynamic queries).
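Two of these approaches, matching and range selection, are easy to sketch without any visual layer. The Python below is my own illustration, not anything from the paper: the rows, field names, and values are made up, and the two functions are hypothetical stand-ins for what a form or a slider would drive behind the scenes.

```python
# Hypothetical toy table; names and values are invented for illustration.
STATIONS = [
    {"name": "station_a", "region": "arctic", "mean_temp_c": -17.5},
    {"name": "station_b", "region": "temperate", "mean_temp_c": 9.0},
    {"name": "station_c", "region": "tropical", "mean_temp_c": 27.0},
    {"name": "station_d", "region": "arctic", "mean_temp_c": -15.0},
]


def query_by_example(rows, example):
    """Matching: keep rows that agree with every field filled in on the example."""
    return [r for r in rows if all(r.get(k) == v for k, v in example.items())]


def range_select(rows, field, low, high):
    """Range selection: keep rows whose `field` lies within [low, high]."""
    return [r for r in rows if low <= r[field] <= high]


# A partially filled-in "form" acts as the example.
arctic = query_by_example(STATIONS, {"region": "arctic"})

# A slider adjusting the bounds of a preset condition acts as the range.
mild = range_select(STATIONS, "mean_temp_c", 0.0, 30.0)
```

In a real VQS the example would come from a form the user fills in and the range from a slider, with results redrawn as the slider moves; the filtering logic underneath need not be any more complicated than this.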

The second-to-last section (section 6) contains the most interesting part of this paper. In this section the authors discuss the usability of VQSs for different kinds of users. Right from the start they state that casual users will benefit most from visual query systems. Then they break down "casual" in various ways: by computer training (the focus is on non-professionals), frequency of interaction, variance of queries, complexity of queries, and familiarity with the domain. Each visual representation type (form-based, diagram-based, etc.) from earlier in the paper is discussed and classified in relation to these user aspects. For instance, from the analysis we can conclude such things as: users unfamiliar with the domain are best served by icon-based systems, since the visual metaphors of the icons are immediately understandable. Users familiar with the domain are best served by diagram-based systems, because these let them articulate more nuanced query concepts.

The analysis in this section is not incredibly thorough or principled. I suppose it does give us somewhat of a framework for understanding the usability of VQSs, though. We just don't know if it's valid, is all. The end of this section surveys the empirical usability studies done on VQSs, in which the authors find only partial support for their previous classifications.

So, overall, this paper does do what it sets out to do: it surveys many different visual query systems and chops them up in all sorts of classifications, from the type of visual representations they use to the sorts of users they best serve. It's just not clear those classifications are meaningful. Of course, they are intuitive, so the classifications ought to make this paper useful as a reference or as an introduction to the field. This paper suggests to me that VQS usability isn't well understood (or wasn't 12 years ago). We'll see how this observation holds up after I review a few other papers.
