. . . but what kind of electronic editions should we be making?

  1. Peter Robinson

    International Institute for Electronic Library Research - De Montfort University

Work text

Over the past few years, the advent of the new technology has led to considerable scholarly discussion about the possibilities of electronic editions of literary and other humanities texts, and a few actual electronic editions. This paper will review some of the editions that have been made, and suggest a positive prescription as to exactly what kind of electronic editions we should be making.

The possibilities offered by new technology for large-scale critical editions are implicit in the very first statement of the revolutionary possibilities of computing, Vannevar Bush's account of the Memex machine. Later writers have elaborated this: editions may now present many texts, not just one; all versions, not just some; every relevant commentary or criticism, not just a selection.

For all this anticipation, electronic editions have been slow in appearing. There are many signs that this situation is changing radically, and that we are approaching a watershed after which electronic editions of major humanities texts, edited by the leading scholars in their fields, will no longer be a dream but the norm. First, such editions are now actually appearing. There has been the pioneering work of Chadwyck-Healey which, for all its faults, has made available many printed texts which were otherwise virtually inaccessible. In the first quarter of 1996 two rather different CD-ROM editions of major English works will appear: Anne McDermott's edition of the first and fourth editions of Johnson's "Dictionary of the English Language" and my own edition of Geoffrey Chaucer's "Wife of Bath's Prologue", both to be published by Cambridge University Press. Both these editions are based upon word-by-word scrutiny of original material, and are therefore nearer the paradigm of a traditional edition.

Second, and perhaps most significantly, I know of at least twelve other major editorial projects similar to the Chaucer and Johnson editions, predicated on electronic publication and either in advanced planning or in active preparation (for example: Jerome McGann's Rossetti archive; Richard Finneran's Yeats project; the American Model Editions Partnership led by David Chesnutt). All these projects share a fundamental defining characteristic: their proponents are in every case among the leading scholars in their fields, usually with a lifetime of traditional scholarship and several major traditional print editions behind them. This collective decision by so many leading scholars, to devote so much effort from now on to electronic editions rather than to print editions, is arguably the most important shift in the history of textual scholarship this century.

The momentum toward electronic editions and away from print critical editions appears irresistible. Further, the great potential of computers for scholarly textual work has awakened interest among people who would otherwise never think of manuscripts and editions. In the next few years, we have a chance to convert that interest into new scholarly work, work that could not otherwise be done. In the name of computers, we can make editions of major works of a scope and precision previously unimaginable; in the name of computers, we can get neglected texts transcribed, studied and published. Indeed, this is already happening: McDermott's Johnson's Dictionary and the Bergen Wittgenstein project are examples of the former; the work of the Women Writers Project and of Project Electra are examples of the latter.

We have an historic opportunity, then, in the next decades to do much scholarly work that simply would not otherwise be done. The question posed by the title of this paper, what kind of electronic editions should we be making, is therefore particularly compelling. Broadly, one may divide the electronic editions so far made, or in the process of being made, into two groups: the 'more is better' group and the 'less is better' group. The 'more is better' group is familiar from the Chadwyck-Healey model: this takes advantage of the vast storage of the electronic medium to find all the relevant texts available and heap them on the CD-ROM. The advent of multimedia has added a new element to the 'more is better' approach: as well as all the relevant books, one may include all the relevant sound and image material: thus McGann's Rossetti and Finneran's Yeats projects. The advent of the World Wide Web suggests that such editions need never be fixed, and do not need to restrict themselves to any one readership group: they can aim to be all things to all people.

This is a most seductive vision. Unsurprisingly, most of the speculative comment about electronic editions conceives editions of this kind. But there are formidable practical and intellectual difficulties with this model. Copyright legislation alone is likely to prevent inclusion of all that might be relevant. Above all, as editors our resources are limited. Do we put our effort into gathering more and more material? There is a real danger here of loss of coherence, of confusion for the reader, and of loss of quality as quantity is pursued for its own sake. In its short life, the World Wide Web has already provided a home for all too many examples of this. Support for the 'more is better' approach is likely to result in numerous projects whose ambition is far beyond their resources, and hence in many half-begun projects issuing incomplete and inaccurate materials, often indiscriminately chosen and indifferently presented.

The 'less is better' approach, on the other hand, is quite different. It is this approach which underlies McDermott's Johnson's Dictionary, the Bergen Wittgenstein project, and my own Canterbury Tales Project. Here, the editor's aim is to identify a particular textual domain and a particular audience, and to present that text for that audience as clearly, as richly, and as accurately as is possible with the resources available. Such projects are rigorously focussed: McDermott presents only the first and fourth editions of Johnson's Dictionary; the Canterbury Tales Project presents only the manuscripts and printed editions of the Tales produced before 1500. They are also rigorously exclusive: there is no discussion of the importance of Johnson's lexicographic work on the CD-ROM, and the Wife of Bath's Prologue CD-ROM contains no glossary and no study of the Wife of Bath herself.

It has been objected of such editions that they do not 'take advantage' of the digital medium; that because they are deliberately limited in the range of included matter, they have failed to 'do what books cannot do' and do not extend scholarship as an electronic book ought to do. I believe this objection is mistaken. It is narrow to assert that the merit of an electronic edition lies merely in the inclusion of non-text material. The aim should not be just to include as much widely varied material as we can: it should be to use the power of the computer to search, to sort, to navigate, and to explore what we do present.

Examination of the Johnson and Chaucer CD-ROMs shows the immense potential of this approach. These CD-ROMs do far more than just present transcriptions of the text and images of the witnesses, bound up with some search tools. In the Johnson, every one of the 20,000 entries in each of the first and fourth editions is matched to the corresponding entry in the other edition, so that you can move from one to another with a click of the mouse. Further, the transcripts are integrated with the images, so that at the head of every entry in the transcript you can open up the image of that page. Also, every time Johnson refers to another entry in the Dictionary this reference is realized as a hypertext link to that point of the Dictionary. This process of linking is taken far further in the Chaucer CD-ROM. Here, every word in every one of the 58 transcripts of the original witnesses is linked to the corresponding word in the other transcripts, so that with a few mouse-clicks you can see at any word the readings of all the other witnesses to that word. Further, every one of the three hundred thousand words in the 58 witnesses is sorted by lemma and grammatical function into a series of databases, permitting the reader to find every spelling of any one word in any one witness or in all the witnesses. All this is done with SGML-encoded hypertext links: overall, some one million hypertext links on this one CD-ROM.
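
The two kinds of access described for the Chaucer CD-ROM (the readings of all witnesses at one word, and every spelling of one word sorted by lemma and grammatical function) can be illustrated with a small sketch. This is not the method used on the CD-ROM, which relies on SGML-encoded links; it is a minimal Python illustration under stated assumptions. The witness sigla "A", "B" and "C", the spellings, the tags, and all function names are invented for the example, and the witnesses are assumed to be already aligned word by word.

```python
from collections import defaultdict

# Hypothetical toy data (invented, not taken from any real witness):
# each witness is a list of (spelling, lemma, part_of_speech) tokens,
# assumed to be aligned so that position i is "the same word" in each.
WITNESSES = {
    "A": [("Experience", "experience", "n"), ("though", "though", "conj"),
          ("noon", "none", "adj"), ("auctoritee", "authority", "n")],
    "B": [("Experiens", "experience", "n"), ("thogh", "though", "conj"),
          ("non", "none", "adj"), ("auctorite", "authority", "n")],
    "C": [("Experience", "experience", "n"), ("thogh", "though", "conj"),
          ("none", "none", "adj"), ("auctoritee", "authority", "n")],
}


def readings_at(witnesses, position):
    """Return the reading of every witness at one aligned word position."""
    return {siglum: tokens[position][0] for siglum, tokens in witnesses.items()}


def build_spelling_index(witnesses):
    """Map (lemma, part_of_speech) -> siglum -> sorted distinct spellings."""
    index = defaultdict(lambda: defaultdict(set))
    for siglum, tokens in witnesses.items():
        for spelling, lemma, pos in tokens:
            index[(lemma, pos)][siglum].add(spelling)
    return {
        key: {siglum: sorted(spellings) for siglum, spellings in by_witness.items()}
        for key, by_witness in index.items()
    }


def spellings_of(index, lemma, pos, siglum=None):
    """Spellings of one word in one witness, or in all witnesses if siglum is None."""
    by_witness = index.get((lemma, pos), {})
    if siglum is not None:
        return list(by_witness.get(siglum, []))
    return sorted({s for spellings in by_witness.values() for s in spellings})


INDEX = build_spelling_index(WITNESSES)
print(readings_at(WITNESSES, 2))
print(spellings_of(INDEX, "none", "adj"))
print(spellings_of(INDEX, "experience", "n", siglum="B"))
```

Keying the index on the pair of lemma and grammatical function, rather than on the spelling, is what lets a reader ask for "every spelling of this word" without knowing any of the spellings in advance.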

The 'less is better' approach is actually capable of extending our concept of what a text is, and how it might be read and studied, further than the 'more is better' approach might. The Chaucer CD-ROM provides the beginnings of an answer to one of the most difficult problems in scholarship: how does one read a text in 58 different versions? However, these CD-ROMs may do more than this. By presenting original-spelling transcripts and images of the primary documents in as attractive and convenient a form as possible, we hope to open a whole new audience. For the Chaucer, we hope it will not only be the dedicated few who use our edition: we believe that anyone interested in Chaucer, from high school up, should be interested in the text and the language of the manuscripts. Our work aims to make it possible for all these, as it never has been before, to look at the manuscripts, to compare their readings, to see the flux of spellings and readings: to see 'what kind of text a manuscript is.' The history of much modern literary thought is of a flight from the text to abstraction. Our aim is to reverse this: we are trying to use the electronic edition to bring people back to the text in its most original forms.

Accordingly, I offer this model for the electronic editions to be made in the next few years. The centre of these editions should be machine-readable transcriptions, usually in original-spelling form or a variant of it, of all significant witnesses, encoded in SGML according to the TEI norms. The editions should also include digital images of all transcribed material, linked to the transcriptions. This alone would constitute a significant scholarly resource. If we can add to this an attractive presentation, and tools for navigation and exploration, then our work might indeed be well done. We have the chance, as textual editors, to do work of real and lasting value in this medium. If we waste this chance and use it just to recycle into electronic form outdated existing editions, translations and commentaries, then we will not deserve many thanks from the next generation.
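
The core of this model, a transcription linked to images of the pages transcribed, can be sketched briefly. The following Python example is only an illustration under stated assumptions. It uses a small XML fragment rather than SGML, because Python's standard library can parse XML directly. The element names `pb` (page break) and `l` (line) follow common TEI usage, but the `facs` attribute as used here, the image file names, and the function name are assumptions for the sketch, and the text is in illustrative spelling, not a transcription of any witness.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a page-break marker names the image of the page,
# and the lines that follow it are taken to belong to that page.
SAMPLE = """
<text>
  <pb n="1r" facs="witnessA_001r.png"/>
  <l n="1">Experience though noon auctoritee</l>
  <l n="2">Were in this world is right ynogh for me</l>
  <pb n="1v" facs="witnessA_001v.png"/>
  <l n="3">To speke of wo that is in mariage</l>
</text>
"""


def link_lines_to_images(xml_text):
    """Return (line number, line text, page image) for every transcribed line."""
    root = ET.fromstring(xml_text)
    current_image = None
    links = []
    # iter() walks the elements in document order, so each line is paired
    # with the most recent page break that precedes it.
    for element in root.iter():
        if element.tag == "pb":
            current_image = element.get("facs")
        elif element.tag == "l":
            links.append((element.get("n"), element.text, current_image))
    return links


LINKS = link_lines_to_images(SAMPLE)
for number, text, image in LINKS:
    print(number, image, text)
```

With such a table, a presentation layer could open the right page image from any line of the transcript, which is the kind of integration of transcript and image described above for the Johnson CD-ROM.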


Conference Info



Hosted at University of Bergen

Bergen, Norway

June 25, 1996 - June 29, 1996

147 works by 190 authors indexed


Conference website: https://web.archive.org/web/19990224202037/www.hd.uib.no/allc-ach96.html

Series: ACH/ICCH (16), ALLC/EADH (23), ACH/ALLC (8)

Organizers: ACH, ALLC