Modelling digital editing: of texts, documents and works

paper, specified "long paper"
  1. Elena Pierazzo

    King's College London

  2. Geoffroy Noël

    Department of Digital Humanities - King's College London


Editing is one of the most important activities we perform as digital humanists: with the entire literary and historical production facing remediation, a theoretical as well as practical understanding of what is at stake, and of what it means to create a digital (scholarly) edition, is of crucial importance. Many contributions in the past have dealt with the issue of what a text is, what a document is, how they relate to each other and what the implications of their ontological status are with respect to the work they manifest (refs. 1-12, to name a few). The topic has been tackled from different points of view: sociological, cultural, psycholinguistic, philosophical, historical and computational. Why, then, is it necessary to return to the same topic once again? Because while some if not most of the previous contributions have touched upon it, none has tried to account for the whole concept of digital editing; and if a model is “a representation of something for purposes of study, or a design for realizing something new” (ref. 13, p. 21), a new purpose of study will require a new model. In order to perform an activity with the help of a computer, that is, in order to digitize a workflow, the activity in question (here, editing) needs to be modelled, given “the fundamental dependence of any computing system on an explicit, delimited conception of the world or ‘model’ of it” (ref. 14, p. 210).

The only existing model of text that has been developed explicitly for computational purposes is the so-called OHCO (Ordered Hierarchy of Content Objects) model (ref. 15), the limitations of which are widely known (refs. 16-20). However, because it reflects the fundamental workings of XML, the most important technology used so far for the production of digital editions, it remains, in spite of its inadequacy, a fundamental approach for editorial endeavours: if one edits using XML and TEI, then one will have to adopt some sort of model which strongly relates to the OHCO model. But besides all the issues already pointed out by earlier critics, there is a series of questions and entities which lie outside the scope of the OHCO model but which are nevertheless of fundamental importance in scholarly editing; namely, what is an edition? What is a work? What is the relationship between an edition and the text it edits, and the work the text represents? What is the function of the reader and of the editor in establishing the text and the edition?

The edition of a text, any text, embodies a model of the work which the text represents. In the same way that a map can be considered a model of the earth built for a specific purpose, an edition of a text can be considered a model of the text itself, because it represents a selection from the infinite features of a text according to a particular point of view, scholarly or not. Selecting also means simplifying: a model is necessarily a simplification of a real-life object, which makes it more amenable to analysis and manipulation. In a contribution from 2009, Michael Sperberg-McQueen declared that there are three things to consider when we edit a text (ref. 21, p. 31):

1. There is an infinite set of facts related to the work being edited.
2. Any edition records a selection from the observable and the recoverable portions of this infinite set of facts.
3. Each edition provides some specific presentation of its selection.

In saying this he declared an edition to be a model of a work, where selecting features from the uninterrupted continuity of reality is the defining act of modelling, the purpose of which in turn is to provide a discrete selection of facts to be interpreted. The present research stems from this consideration and proposes a new, comprehensive conceptual model of the editorial domain, which could be called the Digital Editing Model, or DEM.

The conceptual model deals with the following entities:

Documents, i.e. physical objects that contain some sort of information; a book is therefore a document, as is a leaf with some writing on it, a stone, and so on. More precisely, a document is a physical object that has some text on it, or more formally, a Verbal Text Bearing Object, or VTBO. This definition deliberately omits non-verbal documents, as the aim of the present research is to analyze and model written and verbal texts with the purpose of editing them.[1]
User-functions: any type of human interaction with the documents. The entity represents a set of functions rather than human beings: for instance reading, editing, collecting, preserving, transcribing, analyzing, etc.

As seen above, documents present an infinite set of facts. A user-function selects a subset of these facts and, according to an organizing principle, groups them into dimensions. As the dimensions that are potentially observable in a document are defined by the user-function’s purpose, it is impossible to draw up a stable and complete list of such dimensions; for the purpose of exemplification, however, such a list could include linguistic, semantic, literary, genetic, iconographic, codicological, and palaeographical dimensions.
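
The selection step described here can be illustrated with a minimal Python sketch. It is only one possible reading of the model: the names `Fact`, `UserFunction`, `organize` and `group_into_dimensions` are assumptions introduced for illustration, and the short list of facts merely stands in for the observable part of what the model treats as an infinite set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Fact:
    """One observable fact about a document (hypothetical representation)."""
    description: str


@dataclass
class UserFunction:
    """A kind of interaction with documents (reading, editing, ...), not a person."""
    name: str
    # Organizing principle (assumed form): maps a fact to a dimension name,
    # or to None when the fact is of no interest to this user-function.
    organize: Callable[[Fact], Optional[str]]

    def group_into_dimensions(self, facts):
        """Select the facts of interest and group them by dimension."""
        dimensions = defaultdict(list)
        for fact in facts:
            dimension = self.organize(fact)
            if dimension is not None:
                dimensions[dimension].append(fact)
        return dict(dimensions)


def manuscript_principle(fact):
    """Toy organizing principle: keyword matching on the description."""
    if "script" in fact.description or "ink" in fact.description:
        return "palaeographical"
    if "quire" in fact.description:
        return "codicological"
    return None  # fact not selected


facts = [
    Fact("script is a cursive hand"),
    Fact("brown ink"),
    Fact("quire of eight leaves"),
    Fact("modern shelfmark sticker"),
]
transcribing = UserFunction("transcribing", manuscript_principle)
dimensions = transcribing.group_into_dimensions(facts)
```

A different user-function, with a different organizing principle, would yield different dimensions from the same document, which is why no stable list of dimensions can be drawn up in advance.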

The interaction of user-functions with documents generates:

Document models: the meaning(s) that user-functions give to the subset of dimensions they derive from a document and consider of interest. If the subset of dimensions considered by the user-function includes the verbal content of the document, the document model is defined as a text.
Works: an editorial statement of the fact that a number of documents aim to contain more or less the same verbal content. The sum of all possible texts derived from such documents, in a one-to-one or many-to-one relation, constitutes the work.

The model does not necessarily need an author-function, but it may include one if the user-editor postulates it. If present, the author-function performs two main sub-functions, one in posse and one in esse. The latter represents the activities of producing some of the facts present in the documents, especially, but not exclusively, those concerning the verbal content of the documents. The function in posse concerns instead the authorial intention, namely what the author-function wanted to produce but did not or, if it did, for which the evidence is lost. It is in posse because it is unachieved (or was perhaps achieved, but with no way of knowing this).
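
The relations between documents, document models, texts and works can be sketched as simple data structures. This is a tentative illustration in Python rather than a specification of the DEM: the class and field names are assumptions, as is the convention of flagging verbal content with a dimension named `"verbal"`, and the optional author-function is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """A Verbal Text Bearing Object (VTBO)."""
    identifier: str


@dataclass(frozen=True)
class DocumentModel:
    """The meaning a user-function gives to the dimensions it selected."""
    sources: tuple          # one or more Documents (one-to-one or many-to-one)
    user_function: str      # e.g. "transcribing", "editing"
    dimensions: frozenset   # names of the dimensions considered

    @property
    def is_text(self) -> bool:
        # Assumption: verbal content is flagged by a dimension named "verbal".
        return "verbal" in self.dimensions


@dataclass
class Work:
    """Editorial statement that several documents carry roughly the same verbal content."""
    title: str
    documents: list = field(default_factory=list)
    models: list = field(default_factory=list)

    def texts(self):
        """Texts derived from this work's documents; the work is their sum."""
        return [
            model for model in self.models
            if model.is_text
            and all(doc in self.documents for doc in model.sources)
        ]


ms_a = Document("MS A")
ms_b = Document("MS B")
work = Work("Example work", documents=[ms_a, ms_b])
work.models.append(
    DocumentModel((ms_a,), "transcribing", frozenset({"verbal", "palaeographical"}))
)
work.models.append(DocumentModel((ms_a, ms_b), "editing", frozenset({"verbal"})))
work.models.append(DocumentModel((ms_b,), "preserving", frozenset({"codicological"})))
texts = work.texts()
```

In this sketch the third document model is not a text, because the user-function that produced it did not consider the verbal content; it therefore does not contribute to the work.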

The DEM sketched out here will be supplemented by a second model of text transmission (how text migrates from one support to the next), into which emerging theories of transcription will also fit very well (refs. 22-26). However, these latter theories can also be thought of as an instance of the user-function, encompassing the building of texts from the infinite set of facts available from the documents, and are therefore integrated into the DEM. In its full version, the DEM will also deal with concepts such as versions and derivative works.

In conclusion, the proposed DEM contributes to the elaboration of a set of models required by the renewed work and workflow of digital editing, in dialogue with previous scholarly elaborations, providing a holistic and agnostic basis for understanding the digital editorial endeavour.

[1] A similar but more generalized definition is given by Huitfeldt and Sperberg-McQueen, who prefer to speak of ‘marks’ on a document, rather than of ‘verbal text’: ‘By a document we understand an individual object containing marks. A mark is a perceptible feature of a document (normally something visible, e.g. a line in ink)’ (ref. 27).

1. DeRose, S. J., D. G. Durand, E. Mylonas, and A. H. Renear (1990). What is Text, Really? Journal of Computing in Higher Education, 2(1):3–26.

2. Shillingsburg, P. L. (1991). Text as matter, concept, and action. Studies in Bibliography, 44:31–83.

3. Shillingsburg, P. (2006). From Gutenberg to Google: Electronic Representations of Literary Texts. Cambridge: Cambridge University Press.

4. Caton, P. (2013). On the term text in digital humanities. Literary and Linguistic Computing, 28(2):209–220.

5. Gabler, H. W. (2012). Beyond author-centricity in scholarly editing. Journal of Early Modern Studies, 1:15–35.

6. Eggert, P. (2009). Securing the Past: Conservation in Art, Architecture and Literature. Cambridge: Cambridge University Press.

7. Tanselle, G. T. (1989). A Rationale of Textual Criticism. Philadelphia: University of Pennsylvania Press.

8. Sperberg-McQueen, C. M. (2009). How to teach your edition how to swim. Literary and Linguistic Computing, 24(1):27–52.

9. Robinson, P. M. (2009). What text really is not, and why editors have to learn to swim. Literary and Linguistic Computing, 24(1):41–52.

10. Robinson, P. M. W. (2013). Towards a theory of digital editions. Variants 10. 105-132.

11. Barthes, R. (1968). La Mort De l'Auteur. Manteia (4e trimestre).

12. Ong, W. J. (1975). The Writer's Audience is Always a Fiction. PMLA, 90(1):9–21.

13. McCarty, W. (2005). Humanities Computing. Palgrave Macmillan.

14. McCarty, W. (2004). Modeling: a Study in Words and Meanings. In Companion to the Digital Humanities. Blackwell.

15. DeRose, S. J., D. G. Durand, E. Mylonas, and A. H. Renear (1990). What is Text, Really? Journal of Computing in Higher Education, 2(1):3–26.

16. Huitfeldt, C. (1994). Multi-Dimensional Texts in a One-Dimensional Medium. Computers and the Humanities, 28(4-5). Humanities Computing in Norway. 235-241.

17. Pichler, A. (1995). Transcriptions, texts and interpretation. In Johannessen, K. and Nordenstam, T., editors, Culture and Value: Philosophy and the Cultural Sciences, pages 690–695. Austrian Ludwig Wittgenstein Society, Wien.

18. Renear, A. H., Mylonas, E., and Durand, D. (1996). Refining our notion of what text really is: The problem of overlapping hierarchies. In Ide, N. and Hockey, S., editors, Research in Humanities Computing. Oxford University Press.

19. Pierazzo, E. and Stokes, P. A. (2010). Putting the text back into context: a codicological approach to manuscript transcription. In Fischer, F., Fritze, C., and Vogeler, G., editors, Kodikologie und Paläographie im Digitalen Zeitalter 2 - Codicology and palaeography in the digital age 2, pages 397–430. Books on Demand, Norderstedt.

20. Deegan, M. and Sutherland, K. (2009). Transferred Illusions. Digital Technology and the Forms of Print. Ashgate, Farnham.

21. Sperberg-McQueen, C. M. (2009). How to teach your edition how to swim. Literary and Linguistic Computing, 24(1):27–52.

22. Huitfeldt, C., and C. M. Sperberg-McQueen (2008). What is transcription? Literary and Linguistic Computing. 23 (3). 295-310. doi:10.1093/llc/fqn013

23. Huitfeldt, Claus, Yves Marcoux and C. M. Sperberg-McQueen (2010). Extension of the type/token distinction to document structure. In Balisage: The Markup Conference 2010. held August 3-6, 2010 in Montréal, Canada. In Proceedings of Balisage: The Markup C

24. Sperberg-McQueen, C. M., Claus Huitfeldt, and Yves Marcoux (2009). What is transcription? Part 2. Talk given at Digital Humanities 2009, College Park, Maryland. Slides on the Web at

25. Caton, Paul (2013). Pure transcriptional encoding. Paper given at Digital Humanities 2013, Lincoln, Nebraska.

26. Caton, P. (2013). On the term text in digital humanities. Literary and Linguistic Computing, 28(2):209–220.

27. Huitfeldt, Claus, and C. M. Sperberg-McQueen. (2008) What is transcription? Literary & Linguistic Computing 23.3: 295-310.


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO