[Schools] can only teach those patterns which have proved
successful. If one is going to do new business the patterns
cannot help, though one does not deliberately go out to do
that. My Ántonia, for instance, is just the other side of the rug,
the pattern that is supposed not to count in a story.
(Willa Cather, 1925)
Tools for text analysis have long been a focus of many digital
humanities scholars, yet the results produced by those tools are
rarely utilized in typical literary and cultural criticism. Though
the reasons for this disconnect are varied, we believe two
primary hurdles are visibility and accessibility. Specifically,
most text analysis tools and research are not found on the
sites—like thematic digital text archives—that scholars use
most, and most text analysis tools are not designed for ordinary
scholars, but are meant for those with more technical expertise.
Our essay discusses a novel approach to filling this gap between
scholars and text analysis, an ongoing collaborative experiment
in humanities computing to bring a "new business" that willfully
counters the "patterns which have proved successful" in literary
criticism in order to add a new dimension to literary research.
In "The Other Side of the Rug: TokenX on the Willa Cather
Archive," we discuss the application of Brian Pytlik Zillig's text
analysis, visualization, and play tool, TokenX (<http://to>) to the Willa Cather Archive (<http:/
/>), a free, educational resource dedicated
to the study of Willa Cather's life and writings and edited by
Andrew Jewell. When this application of TokenX debuts in the
summer of 2007, scholars will be able to analyze the entire
corpus of Cather's fiction, from her first college publications
to her final short story. In many ways, our project is in the
tradition of research that seeks to use text analysis tools to arrive
at, in Ramsay and Steger's language, "suggestive patterns" to
"enable critical reflection in literary study." However, in
creating this tool for application on the Cather Archive, we
were faced with two distinct challenges: (1) how to develop
TokenX to make it capable of the envisioned cross-document
analysis that is sensitive to changes over time, and (2) how
to design an interface that would make sophisticated text
analysis a manageable, useful tool for the widest possible range
of Cather scholars, scholars unaccustomed to using such tools.
The project is an interdisciplinary collaboration between Pytlik
Zillig, a Digital Initiatives Librarian with specialization in XSLT
and text analysis, and Jewell, Assistant Professor of Digital
Projects with a Ph.D. in American literature, both of the
University of Nebraska-Lincoln's Center for Digital Research
in the Humanities. The paper, likewise, is a collaborative work
that represents two distinct but complementary perspectives on
the issue.
Pytlik Zillig asserts that, for digital humanities, the development
and ready availability of tools to assist in the noticing and
appreciation of texts is of increasing importance. Unsworth has
observed that "by paying attention to an object of interest, we
can explore it, find new dimensions within it, notice things
about it that have never been noticed before, and increase its
value." For the first iteration of the TokenX/Cather
collaboration, the TokenX tool has generated a word frequency
data set containing nearly half a million data cells. These data
reveal the frequency of usage of words in fifteen TEI-encoded
XML texts, representing Cather’s complete corpus of
book-length fiction. This data set will be available for dynamic,
user-centered queries to assist in formulating theories and
facilitating explorations of Cather’s changing diction over time.
(See Figure 1 for initial results on ten sample terms within
Cather's corpus, detailing her usage of certain "body" words
within each of her published books of fiction.) If a text’s
common words are, as John Burrows suggests, "a barely visible
web that gives shape to whatever is being said" (323), then it
must be the ambition of tools such as TokenX to expose the
dimensions of that web for further inquiry. (Plans are underway
to add TokenX to the Text Analysis Portal for Research
[TAPoR], which will enable TokenX users to visualize, analyze
and play with documents stored on the TAPoR portal.)
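As an illustration of the kind of word frequency data set described above, the following is a minimal sketch in Python. It is not TokenX's implementation, whose internals are not described here. The two sample documents are invented stand-ins for TEI-encoded texts, and the function names (body_words, frequency_table, term_counts, relative_frequency) are hypothetical. The sketch counts words only inside the TEI body element, so header metadata does not inflate the totals.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI P5 documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented miniature documents standing in for TEI-encoded texts
# (assumption: these are not drawn from Cather's fiction).
SAMPLE_TEXTS = {
    "text_a": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><fileDesc><titleStmt><title>Hand list</title>"
        "</titleStmt></fileDesc></teiHeader>"
        "<text><body><p>Her hand was cold. The hand and the eyes.</p>"
        "</body></text></TEI>"
    ),
    "text_b": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><fileDesc><titleStmt><title>Second sample</title>"
        "</titleStmt></fileDesc></teiHeader>"
        "<text><body><p>Eyes bright, she raised one hand.</p>"
        "</body></text></TEI>"
    ),
}


def body_words(tei_xml):
    """Return lowercased word tokens found inside the TEI <body> element."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    text = " ".join(body.itertext()) if body is not None else ""
    return re.findall(r"[a-z']+", text.lower())


def frequency_table(texts):
    """Map each document name to a Counter of its word frequencies."""
    return {name: Counter(body_words(xml)) for name, xml in texts.items()}


def term_counts(table, term):
    """Raw count of one term in every document (0 where it is absent)."""
    return {name: counts[term] for name, counts in table.items()}


def relative_frequency(table, term):
    """Share of each document's tokens accounted for by the term."""
    result = {}
    for name, counts in table.items():
        total = sum(counts.values())
        result[name] = counts[term] / total if total else 0.0
    return result


if __name__ == "__main__":
    table = frequency_table(SAMPLE_TEXTS)
    for term in ("hand", "eyes"):
        print(term, term_counts(table, term), relative_frequency(table, term))
```

Relative frequencies are included because texts of different lengths are hard to compare on raw counts alone; ordering the documents by publication date would then give one simple view of changing diction over time.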
Jewell argues that introduction of and experimentation with
new tools for engagement with literary texts are an important
way to make author-centric sites, like the Willa Cather Archive,
models of innovation. The audience of the Cather Archive is
not one inherently inclined to think about literary texts
numerically. The onus on the designer, then, is to consider the
sorts of research questions driving Cather and American
literature scholarship and to make this tool something that would
contribute to tackling such questions. For example, how might
text analysis contribute to a scholar exploring Cather's work
with a cultural studies or gender studies approach? What about
a scholar looking at contexts, themes, or references within a
single novel, or the fiction of a finite span in Cather's career?
How might this tool aid a researcher interested in tracking
evolutions in Cather's prose over a long period of time? (See
Lindemann and the program of the 2005 International Cather
Seminar for examples of these approaches.) By designing an
interface that is sensitive to a range of scholarly inquiry, one
that allows for a significant amount of flexibility and user input,
TokenX on the Cather Archive can represent an innovative use
of digital research that is brought into the mainstream of
scholarship. By attending to researcher interests and seeking
to design broadly useful tools, we demonstrate to colleagues
that such tools are not just for specialists, but can enrich any
of the diverse arguments that rest, as most literary scholarship
does, on the use of words.
