University of Pittsburgh at Greensburg
Maryland Institute for Technology and Humanities (MITH) - University of Maryland, College Park
Science fiction has been theorized as a laboratory in which text serves as the medium for experimentation with perspective and epistemology.
Deconstructing Starships: Science, Fiction and Reality (Liverpool UP, 1999) 4.
Yet scientific methods are more practicably applicable to the systematic efforts of textual scholars. Computationally assisted collation demands continual refinements to verify the accuracy of textual data and metadata and challenges a singular view of any documentary edition. Moreover, collation can test hypotheses about change over time, and the output of machine collation can serve as an experiment to identify, quantify, survey, and analyze the data of textual change. Digital collation of science fiction seems to combine the practical with the theoretical in its lab space.
An early form of modern science fiction, the 19th-century novel
Frankenstein has itself been the subject of digital variorum experiment since the mid-1990s production of the Pennsylvania Electronic Edition (PAEE) by Stuart Curran and Jack Lynch, a daring effort to prioritize the critical apparatus, pulling it from the obscurity of small type at the bottom of printed pages to make it front and center on screen displays.
See a representative page at
. Curran, Stuart and Jack Lynch.
Frankenstein; or, the Modern Prometheus. The Pennsylvania Electronic Edition. Est. 1994.
The PAEE challenges us to find new ways to tell the variorum narrative of change over time. Much like Victor Frankenstein's composition of the Creature from multiple bodies, the effort to aggregate the distinct versions of this novel into a variorum might succeed in communicating a multi-dimensional narrative of its own composition and decomposition, inviting the reader to evaluate its successive stages just as the reader is invited to evaluate the three storytellers within the novel.
In the history of preparing digital texts with markup languages, whether in early HTML, SGML, or XML, markup standards tensed between two poles: a) the acknowledgement of a coexistence of multiple hierarchical structures and b) the need to prioritize a single document hierarchy in the interests of machine-readability, while permitting signposts of overlapping or conflicting hierarchies as of secondary importance.
See for example, P. M. W. Robinson, "
The Collation and Textual Criticism of Icelandic Manuscripts"
Literary and Linguistic Computing
, Volume 4, Issue 2, 1 January 1989, 99–105,
; Dekker, Ronald Haentjens, Dirk van Hulle, Gregor Middell, Vincent Neyt, and Joris van Zundert, "Computer-supported collation of modern manuscripts: CollateX and the Beckett Digital Manuscript Project"
Volume 30, Issue 3, 1 September 2015, 452–470,
Eggert, Paul, "
The reader-oriented scholarly edition"
Digital Scholarship in the Humanities
, Volume 31, Issue 4, 1 December 2016, 797–810,
and Holmes, Martin, "
Whatever happened to interchange?"
Digital Scholarship in the Humanities
, Volume 32, Issue suppl_1, 1 April 2017, 163–168,
In this paper we present a view of texts that emerges from the experience of comparing documents encoded in conflicting ways. Like the makers of
, we find that multiple encoding structures must co-exist and correlate to achieve a meaningful comparison of editions.
See Gerrit Brüning, Katrin Henzel, and Dietmar Pravida, "Multiple Encoding in Genetic Editions: The Case of 'Faust'", Journal of the Text Encoding Initiative (4: March 2013)
Further, hierarchies need to be reconceived in dynamic terms—where are their flex points for conversion from containment structures to
loci of intersection? In the process of collation, hierarchies must be dismantled and flattened in order for meaningful multiplicity to be represented, and in order for us to understand a dialogic relationship among textual variants. To study variation over time vexes the organizing principle of any singular hierarchy, but hierarchy in the context of collation may nevertheless build a robust architecture that
bridges distinct encodings rather than isolating them. In this architecture, arches and connecting spans are more viable than monoliths.
An inspiration for the bridging concept are the visualizations in
Haentjens Dekker, Ronald, and David J. Birnbaum, “It's more than just overlap: Text As Graph,” Balisage: The Markup Conference 2017, Washington, DC, August 1 - 4, 2017; in
Proceedings of Balisage: The Markup Conference 2017
. The authors conceptualize an ideal model of texts in a graph structure organized primarily by their semantic sequencing and in which structural features are a matter of descriptive annotation rather than elemental hierarchy.
This paper addresses the serious issues of collating digital editions made at different times by different editors, and it discusses the bicentennial
Frankenstein variorum project as a challenging, illustrative case in point. We are preparing a variorum edition of Frankenstein in TEI P5 based on the 1818 and 1831
Frankenstein digital texts due to be released in 2018 in celebration of the bicentennial of
Frankenstein's first publication. Our collation source documents are adapted from the 1990s encoding of the PAEE (for the 1818 and 1831 editions), and the Shelley-Godwin Archive's diplomatic edition of the manuscript notebooks.
The Shelley-Godwin Archive's edition of the manuscript notebooks of
Frankenstein builds on decades of intensive scholarly research to create TEI diplomatic encoding:
We are also newly incorporating a little-known edition of 1823 produced from corrected OCR. Our collation should yield a meta-narrative of how
Frankenstein changed over time in four versions that passed through multiple editorial hands. It is widely understood that the 1831 edition diverges sharply from the first print edition of 1818, adding new material and changing the relationships of characters. Less known is how the notebook material compares with the print editions, and how much we can identify of the
persistence of various hands involved in composing, amending, and substantially revising the novel over the three editions. For example, to build on Charlie Robinson's identification of Percy Bysshe Shelley's hand in the notebooks,
See Charlie Robinson's Introduction to the Frankenstein Notebooks (Garland 1996), reproduced here:
our collation can reveal how much of Percy's insertions and deletions survive in the later print editions. Our work should permit us to survey when and how the major changes of the 1831 text (for example, to Victor Frankenstein's family members and the compression and reduction of a chapter in part I) occurred. We preserve information about hands, insertions, and deletions in the output collation, to serve as the basis for better quantifying, characterizing, and surveying the contexts of collaboration and revision in textual scholarship.
The three print editions and extant material from three manuscripts are compared in parallel, to indicate the presence of variants in the other texts and to be able to highlight them based on intensity of variance, to be displayed like the highlighted passages in each visible edition of
The Origin of Species in Darwin Online.
Barbara Bordalejo, ed.
Darwin Online. See for example the illumination of variant passages in
The Origin of Species here:
Rather than any edition serving as the lemma or grounds for collation comparison, we hold the collation information in stand-off markup, in its own XML hierarchy. That XML "bridge" expresses relationships among the distinct encodings of diplomatic manuscript markup in which the highest level of hierarchy is a unit leaf of the notebook, with the structural encoding of print editions organized in chapters, letters, and volumes. While the apparently nested structure of these divisions might seem the most meaningful way to model
Frankenstein, these pose a challenge to textual scholarship in their own right. As Wendell Piez has discussed,
Frankenstein's overlapping hierarchies of framing letters and chapters have led to inconsistencies in the novel's print production. Piez deploys a non-hierarchical encoding of the novel on which he constructs an SVG modeling (in ordered XML syntax) of the overlap itself.
These hierarchical issues provided an application for Piez’s invented LMNL "sawtooth" syntax to highlight overlap as semantically important to the novel; see Wendell Piez, "Hierarchies within range space: From LMNL to OHCO"
Balisage: The Markup Conference Proceedings (2014):
Our work with collation depends on a similar interdependency of structurally inconsistent encoding.
Our method involves three stages of structural transformation, each of which disrupts the hierarchies of its source documents:
Preparing texts for collation with CollateX
CollateX software applies a graph-based model of text to locate variants in documents. See
Collating a new "braided" structure in CollateX XML output, which positions each variant in its own reading witness.
Transforming the collation output to survey the extents and kinds of variation, and to build a digital variorum edition.
In the first stage, we adapt the original code from the Shelley-Godwin Archive and from the PA-EE to create new forms of XML to carry predictable markers to assist in alignment. These new, pre-collation editions are resequenced (as when we move marginal annotations from the end of the XML document into their marked places
as they would be read in the manuscript notebook). They are also differently "chunked" than their source texts, resizing the unit file so that each represents an equivalent portion small enough to collate meaningfully and large enough that each document demonstrably aligns with the others at its start and end points.
Stage two weaves these surrogate editions together and transfers information from tags that we want to preserve for the variorum. Interacting with the angle brackets as patterned strings with Python, we mask several elements from the diplomatic code of the ms notebooks so that they are not processed in terms of comparison but are nevertheless output to preserve their distinct information. In CollateX's informationally-rich XML output, these tags render as flattened text with character entities replacing angle brackets so as not to introduce overlap problems with its critical apparatus. In Stage three, we work delicately with strings that represent flattened composite of preserved tag information and representations of the text, using XSLT string-manipulation functions to construct new files for analysis. We can then study, for example, where the strings associated with Percy Shelley are repeated in the later editions, and how many were preserved by 1831. We also build a scaffolding in stand-off markup for the digital variorum that bridges multiple editions, as modelled in Figure 1.
Fig. 1 An example variant with two different readings, showing Percy Bysshe Shelley's hand in the ms notebook. While the print editions of 1818, 1823, and the manuscript agree (yellow reading), the print edition of 1831 introduces new text (blue reading). The pointers are expressed according to the TEI XPointer Schemes defined in Chapter 16 of the TEI Guidelines and are subject to change.
This example shows how the stand-off collation identifies variant readings between texts by grouping pointers as opposed to grouping strings of text according to the parallel segmentation technique described in Chapter 12 of the TEI Guidelines.
See especially the TEI P5 Guidelines, 12.2.3 and 12.2.4:
The TEI offers a stand-off method for encoding variants, called “double-end-point-attachment”, in which variants can be encoded separately from the base text by specifying the start and end point of the lemma of which they are a variant. This allows encoders to refer to overlapping areas on the base text, but despite its flexibility, this method still requires choosing a base text to which anchor variant readings. While choosing a lemma for each variant may be necessary for a critical edition, it is not ideal for a variorum edition that, by design, does not choose a base text.
For a related example, see Viglianti, R. Music and Words: reconciling libretto and score editions in the digital medium.
“Ei, dem alten Herrn zoll’ ich Achtung gern’”, ed. Kristina Richts and Peter Stadler, 2016, 727-755.
Our approach, therefore, simply identifies variance and groups readings from multiple sources without conflating them into one document and with accommodation of multiple hierarchies.
Though we think of XML as a stable sustainable archiving medium, the repeated collapsing and expansion of hierarchies in our collation process makes us consider that for the viability of digital textual scholarship, ordered hierarchies of content objects might best be designed with leveling in mind, and that building with XML may be optimized when it is open to transformation. Preparing diversely encoded documents for collation challenges us to consider inconsistent and overlapping hierarchies as a tractable matter for computational alignment
—where alignment becomes an organizing principle that fractures hierarchies, chunking if not atomizing them at the level of the smallest meaningfully sharable semantic features.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Hosted at El Colegio de México, Universidad Nacional Autónoma de México (UNAM) (National Autonomous University of Mexico)
Mexico City, Mexico
June 26, 2018 - June 29, 2018
340 works by 859 authors indexed
Conference website: https://dh2018.adho.org/