THE NORA PROJECT: TEXT MINING AND LITERARY INTERPRETATION

Matthew Kirschenbaum

This panel brings together three papers showcasing different facets of the nora Project, a multi-institutional, multi-disciplinary Mellon-funded initiative to apply text mining and visualization techniques to digital humanities text collections.
We are currently one year into the initial two-year phase of the project. Though most of our methods remain tentative, most of our findings speculative, and our technical environment experimental, we nonetheless have significant progress to report. In practical terms, work on the project has advanced considerably since the initial demos and research agendas that were presented at last year’s conference (2005). We have conducted four sustained text mining investigations (two of which are discussed in detail in the papers below), built a complete technical environment that allows a non-specialist user to engage in the text mining process, and begun to achieve some consistency in our understanding of what data mining in the humanities, particularly literary interpretation, might be good for. While our findings in this last area remain contingent in the extreme, they nonetheless tend to cluster around activities such as provocation, patterning, anomaly, and re-vision (in the most literal sense). In both of the literary test cases documented in the papers in this session, text mining has produced compelling insights that already provide the basis for more traditional scholarly interventions (papers and articles) in their respective subject fields. The technical environments featured in the papers likewise have promise in their own right and stand ready to support text analysis (Tamarind), structured text visualization (Maryland’s adaptation of the InfoVis Toolkit), and, through a newly designed visual environment, the kind of complex, aggregative operations endemic to data mining (the Clear Browser).
In “Undiscovered Public Knowledge,” Kirschenbaum et al. report on their experiments mining for patterns of erotic language in the poetry and correspondence of Emily Dickinson. This paper also describes significant components of the complete nora architecture, including the end-user visualization toolkit. In “Distinguished Speakers,” Ramsay and Steger explore keyword extraction methods as a way of prompting critical insight. Using the particular case of Virginia Woolf’s novel The Waves, they examine the use of the tf-idf formula and its variations for finding the “distinctive vocabulary” of individual characters in a novel. They also discuss their use of Tamarind (an XML preprocessor for scholarly text analysis used by the nora project) to make such investigations faster and easier. In “The Clear Browser,” Ruecker, Rossello and Lord describe their attempt to create a visual interface designed to appeal to humanists. The goal of this sub-project is to help make the system accessible and interesting for scholars who might have an interest in the results of data mining but are not immersed in the technology.
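As a rough illustration of the tf-idf idea mentioned above, the following minimal sketch scores each term for each speaker as tf(t, s) * log(N / df(t)), where tf is the term's relative frequency in that speaker's text, N is the number of speakers, and df is the number of speakers who use the term. This is the common textbook form of the formula, not necessarily the variant used by Ramsay and Steger. The function names, speaker labels, and toy sentences are invented for illustration and are not drawn from The Waves or from the nora tools.

```python
import math
from collections import Counter


def tf_idf_by_speaker(speeches):
    """Score every term for every speaker with a textbook tf-idf.

    speeches: dict mapping a speaker label to a list of tokens.
    Returns a dict mapping speaker -> {term: score}.
    """
    n_speakers = len(speeches)

    # Document frequency: the number of speakers who use each term.
    doc_freq = Counter()
    for tokens in speeches.values():
        doc_freq.update(set(tokens))

    scores = {}
    for speaker, tokens in speeches.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[speaker] = {
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_speakers / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def distinctive_terms(scores, speaker, k=3):
    """Return the k highest-scoring terms for one speaker (ties broken alphabetically)."""
    ranked = sorted(scores[speaker].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data: placeholder speakers and invented word lists.
speeches = {
    "speaker_a": "the sea the waves the light".split(),
    "speaker_b": "the garden the door the light".split(),
    "speaker_c": "the sea the door the ring".split(),
}

scores = tf_idf_by_speaker(speeches)
for name in speeches:
    print(name, distinctive_terms(scores, name, k=2))
```

Under this weighting, a word used by every speaker scores zero no matter how often it occurs, while a word confined to one speaker rises to the top of that speaker's list. This is the sense in which such a score can surface a "distinctive vocabulary".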
All authors listed in the papers have communicated their willingness to participate.
References
S. Downie, J. Unsworth, B. Yu, D. Tcheng, G. Rockwell and S. Ramsay (2005). “A revolutionary approach to humanities computing?: Tools development and the D2K data-mining framework.” ACH/ALLC 2005.

Conference Info

ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

151 works by 245 authors indexed

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002. A Steering Committee was appointed at the ALLC/ACH meeting in 2004 in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC
