Collaborative Scholarship: Rethinking Text Editing on the Digital Platform

  1. Massimo Riva

    Italian Studies - Brown University

  2. Vika Zafrin

    Italian Studies - Brown University

Work text
Based at Brown University, the Virtual Humanities
Lab [1] is one of twenty-three “models of
excellence” in humanities education, supported by the National Endowment for the Humanities for 2004-06. [2] The project is being developed by the Department of Italian Studies in collaboration with Brown’s Scholarly Technology Group and with scholars in the U.S. and in Europe.
This paper will report on the achievements of VHL’s work during the first two years of its existence as a
platform for collaborative humanities research. We will discuss the editing process as we envision it: as a form of interdisciplinary and collaborative knowledge work. We will present issues arising from our experiment with subjective (or “idiosyncratic”) text encoding; challenges
we face in organizing the work of an international group of collaborators and the procedures for that work;
and the process of annotating and indexing large texts collaboratively. Finally, we will hint at VHL’s potential applications for pedagogical purposes.
We have made three Early Modern Italian
texts available online: Giovanni Boccaccio’s Esposizioni sopra la Comedia di Dante; portions of
Giovanni Villani’s Cronica Fiorentina; and Conclusiones
Nongentae Disputandae by Giovanni Pico della Mirandola. These three texts were selected as representative of different textual typologies (commentary, chronicle and treatise) that solicit different encoding and annotating strategies. The first two are large (around 700 and 200 modern print pages respectively) and heavily semantically encoded. The third is organized as a textual database meant to
provide a flexible platform for annotation. All texts share the technical infrastructure of the VHL.
The encoding was performed along interdisciplinary
lines by scholars of Italian literature and history.
The Cronica was encoded by two collaborators; the
Esposizioni had one principal encoder and several
researchers investigating specific issues. All three encoders
were asked to annotate without a DTD, using whatever elements they deemed appropriate based on two criteria:
– that the categories elucidated by the encoding are broad enough to produce interesting search results; and
– that, in their estimation, researchers interested in these texts would generally find their encoded
aspects interesting as well.
All three encoders received training; further guidance was available upon request. None of the three scholars had had previous semantic encoding experience.
Although the encoding proceeded separately for each text, similarities have emerged in what the encoders found most interesting. Both the Esposizioni and the Cronica contain encoding of proper names (people, places, and literary works mentioned) and of the themes most prevalent in the narratives. These similarities, and the exigencies of the Philologic [3] search engine being built, have prompted us to homogenize encoding across texts, and to make it TEI-compliant to the extent possible. It is important to note that this step was taken after the encoding was completed. This afforded our encoders freedom of analytical thought without burdening them with an unfamiliar and very complex set of encoding guidelines.
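The homogenization step described above can be pictured as a renaming pass over the encoded files. The following is only a minimal sketch under stated assumptions, not the procedure the project actually used: the ad hoc element names on the left of the mapping (person, place, work) are invented for illustration, while persName, placeName and title are existing TEI elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from encoder-chosen element names to TEI element names.
# The keys are invented for this example; the values are real TEI elements.
TAG_MAP = {
    "person": "persName",
    "place": "placeName",
    "work": "title",
}


def homogenize(xml_text: str) -> str:
    """Rename ad hoc elements to their TEI counterparts; leave all others as-is."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = TAG_MAP.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


sample = "<p><person>Dante</person> left <place>Firenze</place>.</p>"
print(homogenize(sample))
# prints: <p><persName>Dante</persName> left <placeName>Firenze</placeName>.</p>
```

Because the pass runs after encoding is finished, the encoders' own vocabulary survives in the mapping table, where it can be reviewed and adjusted text by text.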
The encoding process itself presented a challenge on several fronts. As often happens, it took the encoders a while to get used to doing work almost entirely at the computer. Because of the collaborative nature of the
project, and because it is good practice in general, we used a versioning system, and the encoders had to deal with the necessity of having an internet connection at least at the beginning and end of each working interval. The two editor-encoders of the Villani text faced particular
challenges, working as they were at different institutions, neither of which is Brown. So in addition to juggling the unintuitive (to them) practice of encoding with their other commitments, they faced the need to coordinate their schedules and responsibilities within the editing process. Combined with sometimes unreliable internet access, these circumstances channeled most communication into email and our work weblog.
Blogging, particularly posting incomplete reflections on a work in progress, was initially an obstacle. However, this work has brought two remarkable benefits. First, we have received feedback from people not directly involved in the project. Second, each participant was constantly updated on others’ progress. This gave all involved an idea of where VHL as a whole was going, and encouraged discussion at the grant-project level.
Built by the Scholarly Technology Group, the
annotation engine allows scholars with sufficient
access privileges to annotate texts. [Figure 1]
Annotations can be visualized, anchored to one or more passages of one or more texts. [Figure 2] A contributor’s own annotations can always be modified or deleted by that contributor. A feature of the annotation interface currently in development will allow registered scholars to reply to annotations made by others. We hope that this will foster collaborative thinking and maintain an informal, workshop-like environment for research.
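One way to picture the annotation model just described is as a small data structure: an annotation carries one or more anchors, each pointing at a passage in some text, and only its author may change or remove it. This is a hedged sketch, not the Scholarly Technology Group's implementation; the class names, the character-offset anchors, and the text identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Anchor:
    text_id: str  # hypothetical identifier for a text, e.g. "pico"
    start: int    # assumed character offsets delimiting the passage
    end: int


@dataclass
class Annotation:
    author: str
    body: str
    anchors: List[Anchor]  # one or more passages, possibly in several texts


class AnnotationStore:
    def __init__(self) -> None:
        self._items: Dict[int, Annotation] = {}
        self._next_id = 1

    def add(self, annotation: Annotation) -> int:
        if not annotation.anchors:
            raise ValueError("an annotation needs at least one anchor")
        ann_id = self._next_id
        self._items[ann_id] = annotation
        self._next_id += 1
        return ann_id

    def _owned(self, ann_id: int, user: str) -> Annotation:
        # Only the contributor who wrote an annotation may change it.
        annotation = self._items[ann_id]
        if annotation.author != user:
            raise PermissionError("only the author may change an annotation")
        return annotation

    def edit(self, ann_id: int, user: str, new_body: str) -> None:
        self._owned(ann_id, user).body = new_body

    def delete(self, ann_id: int, user: str) -> None:
        self._owned(ann_id, user)
        del self._items[ann_id]

    def for_text(self, text_id: str) -> List[Annotation]:
        """Return every annotation with at least one anchor in the given text."""
        return [a for a in self._items.values()
                if any(anchor.text_id == text_id for anchor in a.anchors)]


store = AnnotationStore()
note_id = store.add(Annotation(
    author="scholar_a",
    body="Compare the gloss.",
    anchors=[Anchor("pico", 0, 40), Anchor("esposizioni", 120, 180)],
))
store.edit(note_id, "scholar_a", "Compare the two glosses.")
print(len(store.for_text("pico")), len(store.for_text("cronica")))
# prints: 1 0
```

Keeping anchors as a list is what lets a single note link passages across texts, which is the cross-textual use the interface is meant to support.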
An international group of around thirtyforty-five scholars has agreed to begin annotation of the Pico text. Having completed first-pass encoding of the other two works, we are assembling similar groups for them as well. Invited
annotators serve as alpha testers of the search and
annotation engines. Depending on the results, we plan to open up the process to the scholarly web community at large. One issue we face is whether to leave annotation open-ended (according to individual scholars’ interests and will) or to provide stricter guidelines – a working plan to be followed by all annotators. For now we have opted for an open process: participants will be free to
annotate the portions of the text that they prefer. The VHL discussion forum provides a venue where issues arising from the annotating process may be critically addressed.
We have generated indexes of the encoded texts. Merely compiling them took weeks – automatically generated lists revealed encoding mistakes
to be corrected, and highlighted many entries to be
researched further. We are not yet confident in the indexes’
accuracy, but have neither time nor resources to properly address the issue by ourselves. Here, again, the feedback of the scholarly community will be essential.
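To illustrate how an automatically generated list can surface encoding mistakes, here is a minimal sketch of index generation. It assumes TEI-style persName and placeName elements and a simple normalization (case, accents, whitespace); neither the element choice nor the normalization rule is taken from the project's actual tooling. Entries whose normalized form covers several spellings are flagged for human review.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict


def normalize(name: str) -> str:
    """Fold case, strip accents and collapse whitespace (an assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def build_index(xml_text: str, tags=("persName", "placeName")):
    """Map each normalized name to the set of spellings found in the markup."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in tags:
            surface = "".join(elem.itertext()).strip()
            if surface:
                index[normalize(surface)].add(surface)
    return index


def suspicious_entries(index):
    """Entries with more than one spelling may hide an encoding mistake."""
    return {key: sorted(forms) for key, forms in index.items() if len(forms) > 1}


sample = (
    "<text><p><persName>Dante</persName> and <persName>dante</persName> "
    "in <placeName>Firenze</placeName>.</p></text>"
)
idx = build_index(sample)
print(sorted(idx))
# prints: ['dante', 'firenze']
print(suspicious_entries(idx))
# prints: {'dante': ['Dante', 'dante']}
```

A list like this only narrows the search; deciding whether two spellings name the same person or place remains a scholarly judgment, which is why outside verification is needed.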
We see this as an opportunity to test out the already
mentioned discussion forum that completes the VHL toolkit, and to gather potentially interested users for
alpha-testing and feedback on the collaboration process
itself. A call for participation was disseminated on
relevant mailing lists, and mailed directly to relevant academic departments at many North American and
European universities. We aim to gather a group of
qualified [post]graduate and undergraduate students to help us verify the sizable indexes. At the time of this writing (March 2006), several young scholars have
expressed interest in contributing.
As a final step during the present grant period, we are organizing the existing toolset (search and
annotation engines, indexes, weblog and discussion forum)
under an umbrella category of the Virtual Seminar Room, which will serve as the venue where editing practices will be consistently linked to pedagogical activities. This move is prompted in part by the success of the Decameron
Web’s Pedagogy section, which continues to receive
positive feedback from teachers of Boccaccio all over the world.
It is too early to state definitively how the VHL will be used by humanists. Based on our prior experience with the Decameron Web [4] and the Pico Project [5],
however, we are cautiously optimistic. It is true that work performed entirely online, and collaboration as a mode of research, have been slow to catch on in the humanities.
Recent publications and tool developments point to a
desire on the part of humanist academics to have spaces
akin to science labs, where they can mingle and talk
informally about their research. [6] Such labs are difficult
and impractical to set up in physical space. So we have created a place online where scholars may interactively
edit and annotate texts, and develop pedagogical modules for their individual purpose. With user feedback, we hope to make the VHL attractive enough to humanities scholars that they’ll be convinced to come play with us, even if the modes of interaction may be unusual or confusing at first.
The past two years have resulted in a long wishlist of features to implement in the future, given time and resources:
– addition of automatic lemmatizers and other
pedagogical parsing and mapping tools, aimed at the various textual typologies of VHL content;
– hosting and inclusion of texts uploaded by users;
– possibility of using the editing process as part of a seminar-like pedagogical experience;
– possibility of adding images as a consistent part of the editing/illustrating process;
– tools for transcription of manuscripts and incunabula; and others.
The future of the humanities is shaping up to be both
online and collaborative. The question is not whether
humanists will work together, but where they will do so, and what forms their knowledge work will take in the
public research arena provided by the web. The VHL is one practical step towards answering this question.
[2] National Endowment for the Humanities. (2004) “NEH Grants Support Models of Excellence in Humanities Education: $3.8 million awarded to 23 projects to create new humanities resources and develop new courses.”
[6] Gina Hiatt’s 2005 article in Inside Higher Ed is a good example of discussion on the topic. “What I am advocating,” she writes, “is injecting into the humanities department some of the freewheeling dialogue found in the halls outside the conference presentation or in some of the better scholarly blogs.” Tool makers have heeded the siren song of collaboration as well; resources such as TAPoR and the Virtual Lightbox attest to this.

