Modeling Melville's Reading: Editing Marginalia in TEI, Topic Modeling Reading and Influence

poster / demo / art installation
  1. Christopher Ohge

    University of California Berkeley

Work text

I propose to contribute a poster at the Digital Humanities 2014 Convention on my recent initiative with Melville’s Marginalia Online. In addition to creating an interactive, reader-friendly digital bibliography of the books Herman Melville was known to have owned, borrowed, and consulted, the editors of the project have been preparing digital editions of the surviving books that contain Melville’s marginalia. The next phase of the project is to mark up the marginalia files in TEI P5 and to devise new ways of representing both the individuality and the totality of Melville’s reading practices. I have taken responsibility for this phase, first by collecting the OCR’d files of Melville’s editions, then by finding a way to mark up the marginalia.
One challenge is that the project uses the “coordinate capture” tool (created by Matt Cohen at the University of Texas at Austin and the Walt Whitman Archive), which creates XML coordinates corresponding to the image file of every aspect of the book (from spine to covers to individual pages). These coordinate-captured XML files cannot be manipulated, however, which complicates the task of marking up the marginalia in TEI. Furthermore, the TEI Guidelines currently offer no standard for marking up marginalia, especially since such “notes” correspond to specific places in the text but in most cases are not written in a linear fashion.
Another question remains about Melville’s reading: how can we better understand not only his reading practices but also, quantitatively, how his reading affected his published work? Countless studies have elucidated Melville’s sources (through both solid research methods and conjecture), but I propose to add to Melville’s Marginalia Online a topic model (built with Mallet) that will “read” Melville’s reading in ways that will change how we think about the influence of the works he read.
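The marginalia-encoding problem described above might be approached by keeping image coordinates and transcription separate and linking the two by identifier. The following is a minimal sketch under that assumption, not the project's actual schema: it uses the standard TEI P5 `zone` element (with `ulx`, `uly`, `lrx`, `lry`) for a rectangle on a page image and a `note` with `place="margin"` that points to the zone through `facs` and to the annotated passage through `target`. The helper names, the `type` and `subtype` values, the identifiers, and the coordinates are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_zone(surface, zone_id, ulx, uly, lrx, lry):
    """Add a <zone> (a rectangle on the page image) to a <surface>."""
    zone = ET.SubElement(surface, tei("zone"))
    zone.set(f"{{{XML_NS}}}id", zone_id)
    for name, value in (("ulx", ulx), ("uly", uly), ("lrx", lrx), ("lry", lry)):
        zone.set(name, str(value))
    return zone


def build_marginal_note(parent, zone_id, target_id, kind, text=""):
    """Add a marginal <note> linked to an image zone and a text passage.

    `kind` is a hypothetical label (for example "score" or "annotation");
    `text` stays empty for non-verbal marks.
    """
    note = ET.SubElement(parent, tei("note"))
    note.set("type", "marginalia")       # hypothetical value
    note.set("subtype", kind)            # hypothetical value
    note.set("place", "margin")
    note.set("facs", f"#{zone_id}")      # where the mark sits on the image
    note.set("target", f"#{target_id}")  # which passage the mark concerns
    note.text = text
    return note


if __name__ == "__main__":
    # Hypothetical identifiers and pixel coordinates, for illustration only.
    surface = ET.Element(tei("surface"))
    build_zone(surface, "zone-p12-m1", 40, 310, 95, 420)
    body = ET.Element(tei("div"))
    build_marginal_note(body, "zone-p12-m1", "p12-l7", "score")
    print(ET.tostring(surface, encoding="unicode"))
    print(ET.tostring(body, encoding="unicode"))
```

Because the note only refers to the zone by identifier, the coordinate data can stay in its own file while the marginalia are encoded alongside the transcription.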
No longer must we guess how his reading influenced him; topic modeling will let us read the library of his entire life and apply that information to his published writings. Data gathered through topic modeling gives Melville scholars a new way of studying literary influence. In the poster session, I look forward to reporting on the TEI markup of the marginalia files in order to show other attendees working on authors’ libraries how best to accomplish this task, and to demonstrating a custom algorithm for topic modeling large literary corpora relating to authorial influence.
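As an illustration of how topic proportions for a book Melville read might be compared with those for one of his published works, the sketch below computes the Jensen-Shannon divergence between two topic distributions. This is a generic measure chosen here for illustration, not the custom algorithm mentioned above, and the proportions are invented placeholder values rather than Mallet output.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    p, q = normalize(p), normalize(q)
    # The midpoint is positive wherever p or q is, so no division by zero.
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


if __name__ == "__main__":
    # Hypothetical topic proportions over the same four topics.
    marked_source = [0.50, 0.30, 0.15, 0.05]
    published_work = [0.45, 0.25, 0.20, 0.10]
    print(round(jensen_shannon(marked_source, published_work), 4))
```

A lower value means the two texts draw on the shared topics in more similar proportions. On its own that indicates similarity, not influence.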

Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO