This panel brings together three papers showcasing
different facets of the nora Project, a multi-institutional,
multi-disciplinary Mellon-funded initiative
to apply text mining and visualization techniques to
digital humanities text collections.
We are currently one year into the initial two-year
phase of the project. Though most of our methods remain
tentative, most of our findings speculative, and our
technical environment experimental, we nonetheless have significant progress to report. In practical terms, work on the project has advanced considerably since the initial demos and research agendas that were presented at last year’s conference (2005). We have conducted four sustained text mining investigations (two of which are discussed in detail in the papers below), built a complete technical environment that allows a non-specialist user
to engage in the text mining process, and
begun to achieve some consistency in our understanding
of what data mining in the humanities, particularly
literary interpretation, might be good for. While our
findings in this last area remain contingent in the extreme,
they nonetheless tend to cluster around activities such as provocation, patterning, anomaly, and re-vision (in
the most literal sense). In both of the literary test cases documented in the papers in this session, text mining has produced compelling insights that already provide the
basis for more traditional scholarly interventions—
papers and articles—in their respective subject fields. The technical environments featured in the papers likewise
have promise in their own right and stand ready to
support text analysis (Tamarind), structured text
visualization (Maryland’s adaptation of the InfoVis
Toolkit), and the kind of complex, aggregative operations endemic to data mining (the Clear Browser, a newly designed visual environment).
In “Undiscovered Public Knowledge,” Kirschenbaum
et al. report on their experiments mining for patterns
of erotic language in the poetry and correspondence
of Emily Dickinson. This paper also describes
significant components of the complete nora architecture,
including the end-user visualization toolkit. In
“Distinguished Speakers,” Ramsay and Steger explore
keyword extraction methods as a way of prompting
critical insight. Using the particular case of Virginia Woolf’s novel The Waves, they explore the use of the tf-idf
formula and its variations for finding the “distinctive vocabulary” of individual characters in a novel. They also discuss their use of Tamarind (an XML preprocessor for scholarly text analysis used by the nora project) to make such
investigations faster and easier. In “The Clear Browser,”
Ruecker, Rossello and Lord describe their attempt
to design a visual interface that
appeals to humanists. The goal of this sub-project is to help make the system accessible and
interesting to scholars who care about the results of data mining but are not immersed in the
S. Downie, J. Unsworth, B. Yu, D. Tcheng, G. Rockwell and S. Ramsay (2005). “A revolutionary approach to humanities computing?: Tools development and the D2K data-mining framework.” ACH/ALLC 2005.

