The Clear Browser: Visually Positioning an Interface for Data Mining by Humanities Scholars

Stan Ruecker; Ximena Rossello; Greg Lord; Milena Radzikowska

Authorship

1. Stan Ruecker
Department of English and Film Studies - University of Alberta, Humanities Computing - University of Alberta

2. Ximena Rossello
Dept of Art and Design - University of Alberta

3. Greg Lord
Maryland Institute for Technology in the Humanities (MITH) - University of Maryland, College Park

4. Milena Radzikowska
Centre for Communication Studies - Mount Royal University (Mount Royal College)

Work text

In this paper we describe a strategy for interface design based on the concept of visual positioning. We apply this strategy to the design of an interface for the Nora project, which presents a unique opportunity to create tools that adapt a powerful technology (data mining) to a new group of users (humanities scholars).
The goal of the Nora project is to apply state-of-the-art data mining processes to a wide range of problems in the humanities (Unsworth 2005), not only in the service of hypothesis testing, but also as a means of contributing to hypothesis formulation (Shneiderman 2001; Ramsay 2003). In both of these cases, however, the question arises of how to make the power of data mining for text collections accessible to academics who are neither mathematicians nor computer programmers. Typical interfaces for data mining operations involve either command lines, such as those used when working in UNIX, or else GUIs whose visual positioning frequently places them in a technical domain, with many resembling the interfaces used in software compilers. For humanities scholars, it is necessary to consider alternative designs that adopt a visual position that is at once more congenial and more appropriate for humanists, while sacrificing as little as possible of the functional control of the underlying system.
The concept of visual positioning has become widespread in the visual communication design community. An early formulation of the principle was provided by Frascara (1997), who pointed out that since one of the primary goals of the graphic designer is to improve communication, it is necessary to consider the visual environment and visual preferences of the users in order to increase the success of the design in communicating with them. The application of this concept to interface design suggests that there are going to be designs that are more or less successful for a particular group of users, and that the same designs won’t necessarily be successful to the same degree with a different group that does not share the same visual position.
In connection with the Nora project, the necessary communication is between the technical mechanism of the data mining processes and the potential user, the humanities scholar. A typical data mining operation consists of the following stages:
1) the system provides the user (in this case, a scholar) with a sample of documents from the collection
2) the scholar chooses among the sample documents those which are of interest for a particular study. In the two Nora project examples, a sample of poems from an Emily Dickinson collection was rated in terms of erotic content, and a sample of novel chapters was rated according to their instantiation of the concept “sentimentalism.”
3) the system performs a set of “feature extraction” actions in order to determine shared characteristics of the selected documents
4) the scholar examines the shared characteristics and iteratively adjusts the result as necessary
5) the system applies the resolved characteristics to the larger collection in order to automatically identify similar documents
6) the scholar studies both the shared characteristics and the result set, often by using a visualization tool (in Nora, the InfoVis toolkit).
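The six stages above can be read as a simple train-and-apply loop. The following Python sketch illustrates that loop under stated assumptions: the function names (`tokenize`, `extract_features`, `apply_features`), the word-count notion of a "shared characteristic", and the toy documents are all hypothetical, chosen for illustration only, and do not describe the Nora project's actual algorithms or interfaces.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into words (a deliberately naive tokenizer)."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def extract_features(selected, unselected, top_n=3):
    """Stage 3 (assumed form): words found in more selected than unselected documents."""
    sel = Counter(w for doc in selected for w in set(tokenize(doc)))
    uns = Counter(w for doc in unselected for w in set(tokenize(doc)))
    scores = {w: sel[w] - uns[w] for w in sel}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked[:top_n] if scores[w] > 0]


def apply_features(features, collection, min_hits=1):
    """Stage 5 (assumed form): documents containing at least min_hits of the features."""
    wanted = set(features)
    return [doc for doc in collection if len(wanted & set(tokenize(doc))) >= min_hits]


# Stages 1-2: the scholar rates a small sample (toy documents, not real data).
selected = ["the sea and the storm", "a storm at sea"]
unselected = ["the garden in spring", "a quiet garden"]

# Stage 3: the system extracts shared characteristics.
features = extract_features(selected, unselected, top_n=2)  # ['sea', 'storm']

# Stage 4: the scholar inspects the features and adjusts them by hand.
features.append("tide")

# Stage 5: the adjusted features are applied to the larger collection.
collection = [
    "the storm broke at dawn",
    "spring rain in the garden",
    "ships lost at sea",
    "high tide",
]
similar = apply_features(features, collection)
```

Stage 6, the study of the result set with a visualization tool, is outside the scope of this sketch.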
We call the interface intended to facilitate this process the clear browser. It is based on the idea of rich-prospect browsing, where some meaningful representation of every item in the collection is combined with a set of tools for manipulating the display (Ruecker 2003). In this case, the primary tools are in the form of a set of “kernels” which encapsulate in visual form the results of the data training stage. The kernels allow a simple means of storing the results of feature extraction processes for further modification or use, and also give the user a simple mechanism for applying the process, by dragging and dropping the kernel within the representation of all the collection items (Figure 1). The effects of the kernel are to visually subset the collection items into two groups, selected and unselected, so that the user can subsequently access the items in the selected subset. The design also allows for combinations of kernels, and for a single kernel to provide multiple functions, including not only subsetting the items, but also adding further grouping or sorting functions, as well as changes to the form of representation.
Figure 1. The Clear Browser provides a number of blank kernels that can be configured by the user through a data mining “training” process. These kernels can then be applied to the larger collection by dragging and dropping them. This sketch shows a total collection of 5000 author names, with a subset selected by the kernel.
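One way to think about a kernel's subsetting behaviour is as a stored predicate that splits the collection into two groups without discarding anything. The sketch below is a hypothetical model, not the Clear Browser's implementation: the `Kernel` class, its `apply` and `combine` methods, and the choice to treat a combination of kernels as "both must match" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Kernel:
    """A named, reusable predicate standing in for a stored training result."""
    name: str
    matches: Callable[[str], bool]

    def apply(self, items: List[str]) -> Tuple[List[str], List[str]]:
        """Split items into (selected, unselected); no item is added or dropped."""
        selected = [i for i in items if self.matches(i)]
        unselected = [i for i in items if not self.matches(i)]
        return selected, unselected

    def combine(self, other: "Kernel") -> "Kernel":
        """Combine two kernels; combination is assumed here to mean 'both match'."""
        return Kernel(
            f"{self.name}+{other.name}",
            lambda i: self.matches(i) and other.matches(i),
        )


# Toy stand-ins for trained kernels, applied to a tiny list of author names.
authors = ["Austen, Jane", "Alcott, Louisa May", "Dickinson, Emily", "Eliot, George"]
starts_a = Kernel("A", lambda n: n.startswith("A"))
short = Kernel("short", lambda n: len(n) <= 13)

selected, unselected = starts_a.apply(authors)
both = starts_a.combine(short)
both_selected, both_unselected = both.apply(authors)
```

Because `apply` returns both groups, the selected and unselected items together always account for the whole collection, which mirrors the idea that a kernel rearranges the display rather than filtering items away.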
One of the important aspects of the visual positioning for humanities scholars is the proposed form of the meaningful representation of the individual items in the collection. These items are each a piece of text, and together they form a large body of text that is displayed on screen as the default interface. It perhaps goes without saying that humanities scholars are comfortable with text, whether in print or on screens, and the choice to represent collection items with text can therefore contribute to their ability to interpret quickly and intuitively what is happening with a system that might otherwise be unfamiliar or disorienting.
For purposes of illustration, it might be helpful at this point to introduce a scenario involving changes to the form of representation. Such a change might be introduced by the system in connection with a sorting action. For example, if the items in the collection are initially represented as the titles of poems, and the user elects to sort the selected poems by date of first publication, it would typically be useful at that point to add the date to the name of each poem. This addition would constitute a change to the individual representations of items. Alternatively, in cases where the user prefers to group the items rather than sort them, the additional information might be attached to the entire group in the form of a group label, in which case the representations of the individual items in the group would remain unchanged.
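The difference between the two cases in this scenario can be sketched in a few lines of Python. The poem titles and years below are invented placeholders, and the function names `sort_with_dates` and `group_by_date` are hypothetical; the sketch only illustrates where the extra information ends up in each case.

```python
from itertools import groupby

# Invented placeholder items: a title plus a year of first publication.
poems = [
    {"title": "Poem B", "year": 1891},
    {"title": "Poem A", "year": 1890},
    {"title": "Poem C", "year": 1891},
]


def sort_with_dates(items):
    """Sorting changes each item's representation: the date joins the title."""
    ordered = sorted(items, key=lambda p: (p["year"], p["title"]))
    return [f'{p["title"]} ({p["year"]})' for p in ordered]


def group_by_date(items):
    """Grouping keeps titles unchanged; the date appears once, as a group label."""
    ordered = sorted(items, key=lambda p: (p["year"], p["title"]))
    return {
        year: [p["title"] for p in group]
        for year, group in groupby(ordered, key=lambda p: p["year"])
    }


sorted_labels = sort_with_dates(poems)
# ['Poem A (1890)', 'Poem B (1891)', 'Poem C (1891)']

grouped_labels = group_by_date(poems)
# {1890: ['Poem A'], 1891: ['Poem B', 'Poem C']}
```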
Another aspect of the visual positioning is the animated actions of the kernels, which interact with the field of representations with an effect like oil and water. The animation of the movement of the text items, which move to the periphery of the display or the centre of the area associated with the kernel, provides two kinds of cognitive reassurance. First, the user has a sense of being able to follow the action of the data mining process as encapsulated in the kernel. Second, the animated transitions of the text items provide reassurance that the system is rearranging the collection without adding or subtracting any items. This second factor is particularly important in cases where one of the other functions of the kernel is to add or subtract components from the meaningful representation. By animating the movements and changes in discrete steps, the interface helps make the results of the process understandable. The animated actions of the items become part of the visual positioning, not because cognitive reassurance isn’t important for all users, but because some users can benefit more than others from having it provided in this form.
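The idea of moving items in discrete steps while keeping the collection intact can be illustrated with a small sketch. It assumes, purely for illustration, that each item has a two-dimensional screen position and that movement is linear; the `animate` function and the coordinates are hypothetical and are not taken from the Clear Browser design.

```python
def animate(start, target, steps):
    """Yield `steps` frames moving every item linearly from start to target."""
    if set(start) != set(target):
        raise ValueError("start and target must contain exactly the same items")
    for s in range(1, steps + 1):
        t = s / steps  # fraction of the journey completed in this frame
        yield {
            item: (
                start[item][0] + t * (target[item][0] - start[item][0]),
                start[item][1] + t * (target[item][1] - start[item][1]),
            )
            for item in start
        }


# Hypothetical positions: one item drifts toward the kernel, one barely moves.
start = {"Poem A": (0.0, 0.0), "Poem B": (10.0, 0.0)}
target = {"Poem A": (4.0, 8.0), "Poem B": (10.0, 2.0)}

frames = list(animate(start, target, steps=4))
```

Every frame contains exactly the same items as the starting layout, so a viewer following the animation never sees an item appear or disappear; only positions change from one step to the next.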
References
Frascara, Jorge (1997). User-centered Graphic Design. London: Taylor and Francis.
Ramsay, Stephen (2003). “Toward an Algorithmic Criticism.” Literary and Linguistic Computing. 18.2.
Ruecker, Stan (2003). Affordances of Prospect for Academic Users of Interpretively-tagged Text Collections. Unpublished Ph.D. Dissertation. Edmonton: University of Alberta.
Shneiderman, Ben (2001). “Inventing Discovery Tools: Combining Information Visualization with Data Mining.” Keynote for Discovery Science 2001 Conference, November 25-28, 2001, Washington, DC.
Unsworth, John (2004). “Forms of Attention: Digital Humanities Beyond Representation.” Paper delivered at CaSTA 2004: The Face of Text. 3rd conference of the Canadian Symposium on Text Analysis, McMaster University, Hamilton, Ontario. November 19-21 2004.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

Complete

ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002; a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC
