Six terms fundamental to modelling transcription

Long paper

Paul Caton
King's College London

Work begun by Huitfeldt and Sperberg-McQueen (2008) and continued jointly with Marcoux (2009, 2010) has given us a powerful and intuitive model of the abstract object T that counts as a successful transcription of an exemplar E.[1] (For convenience I will hereafter refer to these authors collectively as 'HSM' and their work as 'the HSM model'.)[2] Work on the HSM model is ongoing and a comprehensive formal account of the activity of transcription remains some way off, but extrapolating from what HSM have done so far we can begin to determine what is proper to a model of transcription per se, what is complementary to it, and what intra- and inter-model dependencies exist between entities. We can project, as it were, from the existing HSM model a bigger picture; but we stay always within the terms of the HSM model because it is the terms themselves we use for our projection. We thereby also clarify the terms themselves as they are used in the context of the model.

Here I focus on six terms fundamental to the 'bigger picture' and by elucidating these terms in relation to the model I sketch out the scope and composition of that picture: a necessary preliminary to any more detailed modelling to follow. The terms are: SURFACE, MARK, READING, TOKEN-SEQUENCE, EXEMPLAR, and DOCUMENT.

Consider the following scenario. A cave explorer discovers a new chamber and finds inside it three rock faces. One has complex patterns of scratches on it; each of the other two has painted lines on it. In all three cases what the explorer sees looks like writing, but she recognizes none of it. She takes a photograph of the rock face with the scratches and of one face with painted lines, then her camera fails. She pulls out a sketch pad and makes a faithful drawing of the other rock face with paint on it. Later she takes the photographs and drawing to an archaeologist, who recognizes that the painted lines are indeed writing, in a language he knows, but that the forms of the characters are older and in many cases different from those in the current orthography of that language. He then makes written copies of the characters he sees in the photograph and drawing, this time using the current character forms to make them easier for other scholars to read. Doubtful about the scratched lines, he informs a naturalist friend, who visits the cave and confirms that all the scratches have been made by animals sharpening their claws on the rock face.

Now we retrace the conceptual movement of that narrative, introducing the fundamental terms in the appropriate places and using the context to clarify their relation to the HSM model and define them within that scope.

Each rock face is a SURFACE. A SURFACE is a thing: it is perceptible and measurable, and in its normal manner of existence can be returned to. The normal manner of existence of an electronic display, for example, is for a machine that generates it to be switched on and working properly: and while this is the case, we can return to the display.[3] A SURFACE is necessary for transcription; a SURFACE itself depends on nothing within the scope of this discussion. Of all things I describe here, a SURFACE is the closest to being a primitive entity.

Each scratch and painted line on the rock faces is a MARK. A MARK is a thing made upon a SURFACE by some agent. It is perceptible and measurable by contrast to the SURFACE, and therefore dependent upon the prior existence of the SURFACE.
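The dependence of MARK on SURFACE can be pictured as a small data structure. The following Python sketch is purely illustrative and not part of the HSM model: the class names, the fields `label`, `description` and `agent`, and the method `add_mark` are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # SURFACES are particular things: compare by identity
class Surface:
    """A perceptible, measurable thing that can be returned to."""
    label: str
    marks: list = field(default_factory=list)

    def add_mark(self, description: str, agent: str) -> "Mark":
        # A MARK is only ever created through a SURFACE, which mirrors
        # its dependence on the prior existence of that SURFACE.
        mark = Mark(surface=self, description=description, agent=agent)
        self.marks.append(mark)
        return mark


@dataclass(eq=False)  # MARKS are likewise particular things
class Mark:
    """A thing made upon a SURFACE by some agent."""
    surface: Surface = field(repr=False)
    description: str = ""
    agent: str = ""


# The scenario: one scratched rock face, made by an animal.
scratched_face = Surface("scratched rock face")
scratch = scratched_face.add_mark("claw scratch", agent="animal")
```

Nothing in the sketch says whether a MARK is a TOKEN; on the account given here that question belongs to READING, not to the MARK itself.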

The most complex of all the model-related entities is a READING. I will say it is normative - though not necessary - that READING is motivated. We are a communicative species and we actively look for instances of communication, willing to give the benefit of the doubt in many cases. MARKS are necessary for orthography, therefore the presence of MARKS implies the possible presence of writing. Consequently the presence of MARKS also implies the possible presence of text (here and throughout intended in the sense described in Caton 2013a), and text is written communication. Hence, in the normative case, an understanding of the possible presence of text motivates READING by an agent. In terms specific to the HSM model, READING is the process by which an agent attempts to discover and establish at least one TYPE-SEQUENCE in MARKS on a SURFACE by recognising certain MARKS to be certain TOKENS.

Because (in the normative case) READING is motivated, we must also grant that it may be entirely speculative. That is, it is acceptable that READING commences solely on the basis that MARKS are present on a SURFACE: there does not have to be certainty that at least one of those MARKS has token status.

Recall that our cave explorer could not assign token status to any of the marks she saw. Aware of her ignorance, she took steps (taking photographs and making a drawing) that have an interesting status in the overall picture because they seem to perform transcription without READING and therefore to deny that READING is necessary to transcription. But this is illusory. The goal of READING is to establish a TOKEN-SEQUENCE/TYPE-SEQUENCE, and the act of READING attempts this, assigning token status to MARKS where possible. There is no criterion of success for the activity: simply performing it is enough.[4] There doesn't have to be a specific TOKEN-SEQUENCE/TYPE-SEQUENCE at the end of it.

Instead of 'success', we distinguish three result-states of READING: positive, negative, and zero. A positive result-state means the agent assigns, with a greater-than-zero degree of certainty, token status to at least one MARK. A negative result-state means that the agent, with a greater-than-zero degree of certainty, decides no MARK has token status. A zero result-state means that for every MARK present the agent has zero certainty that it is or is not a token. We shall return to this important point shortly.
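The three result-states can be restated as a small classification function. This is a minimal sketch under stated assumptions: the names `Judgement`, `ResultState` and `result_state` are hypothetical, each MARK is assumed to receive exactly one judgement, and the mixed case (some MARKS denied token status, the rest left undecided) is not addressed in the text, so the sketch returns `None` for it rather than guess.

```python
from enum import Enum
from typing import Optional, Sequence


class Judgement(Enum):
    TOKEN = "token"          # token status assigned with certainty > 0
    NOT_TOKEN = "not token"  # token status denied with certainty > 0
    UNKNOWN = "unknown"      # zero certainty either way


class ResultState(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    ZERO = "zero"


def result_state(judgements: Sequence[Judgement]) -> Optional[ResultState]:
    """Classify a READING from the agent's per-MARK judgements."""
    if not judgements:
        raise ValueError("READING presupposes at least one MARK")
    # Positive: at least one MARK is assigned token status.
    if any(j is Judgement.TOKEN for j in judgements):
        return ResultState.POSITIVE
    # Negative: the agent decides that no MARK has token status.
    if all(j is Judgement.NOT_TOKEN for j in judgements):
        return ResultState.NEGATIVE
    # Zero: for every MARK the agent has no certainty either way.
    if all(j is Judgement.UNKNOWN for j in judgements):
        return ResultState.ZERO
    return None  # mixed denial/uncertainty: outside what the text defines
```

On this reading the explorer's photographs correspond to a list consisting entirely of `Judgement.UNKNOWN`, and the naturalist's verdict on the scratches to a list consisting entirely of `Judgement.NOT_TOKEN`.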

An agent READS at least MARK by MARK (though more usually by groups of MARKS at a time), assigns token status where possible, and thereby constructs a TOKEN-SEQUENCE. A TOKEN-SEQUENCE cannot be empty: it must contain at least one TOKEN. Under any one READING R a TOKEN-SEQUENCE is neither 'right' nor 'wrong': it simply is the sequence under that READING, irrespective of the degree to which it corresponds to any text present on the SURFACE.

The dependence relation here is strictly one way and is of transcription upon the TOKEN-SEQUENCE produced by the READING. If no READING (minimal or informed) takes place, no transcription can take place. If there is a TOKEN-SEQUENCE, and if an agent desires to preserve its corresponding TYPE-SEQUENCE in another place by the activity of transcription, then that TOKEN-SEQUENCE assumes the role of EXEMPLAR with respect to the transcription, and in that respect we refer to it as the E-TOKEN-SEQUENCE. The manifestation in another place of the preserved TYPE-SEQUENCE as a TOKEN-SEQUENCE produces a TRANSCRIPTION (as result) and we refer to that sequence as the T-TOKEN-SEQUENCE.[5] Should an agent wish to transcribe this T-TOKEN-SEQUENCE, the sequence would then assume the role of EXEMPLAR with respect to this second transcription.

The essential insight of the HSM model is that in transcription the T-TOKEN-SEQUENCE represents and preserves the E-TYPE-SEQUENCE. It should be clear then that with respect to the painted rock faces the cave explorer does perform transcription, despite her inability to consciously assign token status to any of the MARKS. By means of the photograph and the drawing, each E-TYPE-SEQUENCE of the painted MARKS is preserved and manifested in another place as a T-TOKEN-SEQUENCE, perceptible as MARKS on a SURFACE. Transcription is always possible when the READING result-state is either positive or zero. It does not have to happen deliberately or consciously: a transcription can be produced quite by accident. The archaeologist's transcription, unlike the explorer's, comes from positive result-state READING and he produces different T-TOKEN-SEQUENCES from the explorer, but they all represent the same E-TYPE-SEQUENCE.
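The relation between differing TOKEN-SEQUENCES and a shared TYPE-SEQUENCE can be illustrated with a short Python sketch. It is an assumption-laden simplification, not the formal HSM account: the `Token` class, its `form` and `type` fields, and the functions `type_sequence` and `preserves` are hypothetical names, and types are reduced to plain strings. The long s ("ſ") stands in for an older character form.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Token:
    form: str  # hypothetical: how the MARK looks on its SURFACE
    type: str  # the type the READING takes the MARK to instantiate


def type_sequence(tokens: Sequence[Token]) -> Tuple[str, ...]:
    """Return the TYPE-SEQUENCE instantiated by a TOKEN-SEQUENCE."""
    if not tokens:
        # A TOKEN-SEQUENCE cannot be empty.
        raise ValueError("a TOKEN-SEQUENCE must contain at least one TOKEN")
    return tuple(token.type for token in tokens)


def preserves(e_tokens: Sequence[Token], t_tokens: Sequence[Token]) -> bool:
    """True if the T-TOKEN-SEQUENCE has the same TYPE-SEQUENCE as the exemplar."""
    return type_sequence(e_tokens) == type_sequence(t_tokens)


# Older character forms on the rock face, current forms in the written copy.
e_token_sequence = [Token("ſ", "s"), Token("o", "o")]
t_token_sequence = [Token("s", "s"), Token("o", "o")]

same_types = preserves(e_token_sequence, t_token_sequence)
```

The two sequences differ token by token in `form` yet `same_types` is true, which is the sense in which the archaeologist's copy and the explorer's photograph can both represent one E-TYPE-SEQUENCE.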

Transcription is possible from a zero result-state READING, but only possible. The cave explorer's photograph of the scratches, for example, is not a transcription because no TOKEN-SEQUENCE is present on the scratched rock SURFACE and thus there is no E-TYPE-SEQUENCE to preserve.[6]

An act of transcription necessarily involves an EXEMPLAR, but does not necessarily involve a DOCUMENT. In relation to a model of transcription, and the model of READING which is necessary for it, we say that when there is at least one TOKEN-SEQUENCE / TYPE-SEQUENCE on a SURFACE, and the same READING that assigned token status to the MARKS also assigns TEXT status to the TOKEN-SEQUENCE / TYPE-SEQUENCE, then the SURFACE + TEXT combination acquires DOCUMENT status.[7] In a majority of cases an agent READS a SURFACE either certain it is a DOCUMENT or at least believing that highly likely. Hence DOCUMENT is a term frequently used in discussions of transcription, but transcription can take place without any DOCUMENT being present.
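The difference between what transcription needs and what DOCUMENT status needs can be put as two predicates over a READING. The sketch below is only one possible rendering: the `Reading` record, its fields, and the two function names are hypothetical, and TEXT status is reduced to a single boolean set by the READING.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Reading:
    surface: str
    # TOKEN-SEQUENCES established by this READING, simplified to tuples of types.
    token_sequences: List[Tuple[str, ...]] = field(default_factory=list)
    # Whether this same READING also assigns TEXT status to what it found.
    assigns_text_status: bool = False


def offers_exemplar(reading: Reading) -> bool:
    """Transcription needs an EXEMPLAR, i.e. at least one TOKEN-SEQUENCE."""
    return len(reading.token_sequences) > 0


def has_document_status(reading: Reading) -> bool:
    """SURFACE + TEXT: a TOKEN-SEQUENCE exists and the READING deems it TEXT."""
    return offers_exemplar(reading) and reading.assigns_text_status


# A READING that finds a sequence but makes no judgement about TEXT status.
painted = Reading("painted rock face", token_sequences=[("s", "o")])
# A READING that finds a sequence and also assigns it TEXT status.
charter = Reading("parchment", token_sequences=[("s", "o")], assigns_text_status=True)
```

Here `painted` can supply an EXEMPLAR without the SURFACE acquiring DOCUMENT status, whereas `charter` satisfies both predicates.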

[1] For work responding to and building on the HSM model, see Caton 2009, 2013b.

[2] I must assume the reader's familiarity with the basics of the HSM model, in particular with their use of Peirce's concepts of token and type. Huitfeldt and Sperberg-McQueen 2008 gives the initial exposition; Caton 2013 provides a summary.

[3] I am avoiding the words 'material' and 'persistent' because (for the purposes of this discussion) those adjectives are not yet well enough defined with respect to the digital domain. Despite my expressive clumsiness I hope the reader understands that I am opposing the nature of SURFACE and MARK to the essentially transitory, of-the-moment nature of a phenomenon such as speech.

[4] Obviously this differs from the normal usage, where we expect someone performing the activity of reading to recognize a specific token sequence and consider their reading incorrect if they don't.

[5] Strictly speaking there is no T-TYPE-SEQUENCE prior to the existence of the T-TOKEN-SEQUENCE, only the E-TYPE-SEQUENCE. The T-TYPE-SEQUENCE is a product of the T-TOKEN-SEQUENCE.

[6] Because a negative result-state involves conscious judgement, it is entirely possible for one agent to perform two different READINGS with different result-states: one negative (by looking at MARKS on a SURFACE and deciding that none has token status) and one zero (by also taking a photograph of the MARKS).

[7] Because this ties DOCUMENT to a particular SURFACE it means every DOCUMENT is a unique object and not a 'repeatable symbolic expression' as discussed in Renear and Wickett 2009. I consider this uncontroversial as a constraint for the purposes of modelling, but I believe it also reflects a core sense of the common usage. Certainly 'document' is often used in an abstract sense, as in 'Magna Carta is an important document', but the signification is strongly tied to the idea of a particular piece of paper (or stone tablet, parchment scroll, email, etc.).

Caton, Paul (2009). "Lost in Transcription: Types, Tokens, and Modality in Document Representation." Presented at Digital Humanities 2009, University of Maryland, College Park, June 2009.

Caton, Paul (2013a). "On the term 'text' in digital humanities." Literary and Linguistic Computing 28 (2): 209-220. doi:10.1093/llc/fqt001

Caton, Paul (2013b). "Pure Transcriptional Markup." Presented at Digital Humanities 2013, University of Nebraska, Lincoln, July 2013.

Huitfeldt, Claus and C. M. Sperberg-McQueen (2008). "What is transcription?" Literary and Linguistic Computing 23 (3): 295-310. doi:10.1093/llc/fqn013

Huitfeldt, Claus, Yves Marcoux and C. M. Sperberg-McQueen (2009). "What is transcription? (Part 2)." Presented at Digital Humanities 2009, University of Maryland, College Park, June 2009.

Huitfeldt, Claus, Yves Marcoux and C. M. Sperberg-McQueen (2010). "Extension of the type/token distinction to document structure." Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Huitfeldt01.

Renear, Allen H., and Karen M. Wickett (2009). "Documents Cannot Be Edited." Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Renear01.

Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO