Transcriptional implicature: a contribution to markup semantics

Long paper
Authorship
  1. C. Michael Sperberg-McQueen
     Black Mesa Technologies LLC
  2. Yves Marcoux
     Université de Montréal
  3. Claus Huitfeldt
     Department of Philosophy, University of Bergen

Work text

We may see, in a TEI transcription of an old book, the lines:

<pb n="[iii]"/>
<p>Quaestiones, quae ad mathematicae fundamenta pertinent, etsi hisce temporibus a multis tractatae, satisfacienti solutione et adhuc carent.

What do they mean? How do we know what they mean? Can we model their meaning formally?
Formalizing the meaning of arbitrary natural-language utterances remains intractable today. But markup languages, being formally defined artificial languages, appear more approachable. So we may be able to explain what the <pb> and <p> elements mean, even if the sense of the Latin eludes formalization. Some propose to explicate the meaning of markup by specifying, for each construct in a markup vocabulary, a sentence schema in a natural language, with blanks to be filled in with data from the document [1]; others make a similar proposal but allow sentence schemata in formal languages like first-order predicate logic as well [2]. This appears straightforward, although far from trivial, for metadata [3] and perhaps even for born-digital texts, but how shall the meaning of <p> be formalized in a markup language which defines it as containing a transcription of a text block in a manuscript? What does it mean for a document to be a transcription of another document? Can we formalize that?
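As a rough illustration of the sentence-schema proposal (the schemata below are our own invention for this sketch, not drawn from [1] or [2]), the blanks of a schema can be filled in mechanically with data from the document:

    # A minimal sketch of the sentence-schema idea: each markup construct
    # is paired with a natural-language schema whose blanks are filled in
    # with data from the document. The schemata are illustrative only.

    SCHEMATA = {
        "pb": "A page numbered {n} begins at this point in the exemplar.",
        "p":  "The exemplar contains a paragraph reading: {content}",
    }

    def instantiate(element, **blanks):
        """Fill in the blanks of the schema for the given element type."""
        return SCHEMATA[element].format(**blanks)

    print(instantiate("pb", n="[iii]"))
    print(instantiate("p", content="Quaestiones, quae ad mathematicae ..."))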
Earlier work has explored the nature of the similarity between transcripts and their exemplars. Perhaps it consists simply in their containing the same sequence of characters? This can be formalized but proves disappointing, partly because it omits text structures like division into paragraphs and partly because it offers no way of describing disagreements among transcribers about how to read the exemplar, or which character distinctions (e.g. i/j, u/v, s/ſ) to retain and which to level. It is also wrong: few transcripts have exactly the same character sequence as their exemplar [4]. Later work extends the analysis from characters to higher-level textual structures and models transcriber agreement and disagreement explicitly [5, 6]. (Paul Caton has built on this idea to propose pure transcriptional markup as an approach to the problems of transcription semantics [7].) By treating higher-level constructs as tokens of higher-level types and using the sentence schemata mentioned earlier, we believe one can formalize the meaning of tokens (characters, words, and XML elements like <p> and <pb> alike) in transcriptions.
Clearly, however, the meaning of tokens in a transcription depends on the transcription conventions adopted, which vary. There is hardly any universal transcription practice: for every generalization we find exceptions [8, 9, 10, 11]. Is everything in the exemplar transcribed? Not when deletions and irrelevant material are excluded. Does everything in the transcription reproduce some word or character in the exemplar? Not when line breaks are marked explicitly with vertical bars, or notes are added. Many scholarly editions account for variations like these in an explicit statement of transcription practice. Such statements typically describe the ways in which a project's practice varies from the usual practice, but rarely the ways in which it exemplifies normal practice. In any community of scholarship, some common practice is typically felt to be so obvious that it needs no mention or explanation.
One job of formalization is to make explicit practices and assumptions otherwise passed over in silence.
We propose a notion we shall call transcriptional implicature, denoting a set of rules which apply by default but which may be overridden in particular cases, analogous to the rules of conversational implicature proposed by H. P. Grice as a way of explicating the logic of everyday conversation [12]. The operational definition of transcriptional implicature for a given community is “the set of rules no one in the community bothers to mention explicitly”.
Different communities of transcription practice have different sets of tacit assumptions and thus different rules of transcriptional implicature. Is there a common core of transcriptional practice shared by all communities? Maybe; it's an empirical question. A serious answer would require detailed study of a wide variety of communities of practice. We postulate, however, that the transcriptional implicature of any community of practice can be described with reference to some default set of rules for transcriptional implicature.
The transcriptional practice of any given project is commonly documented by listing its deviations from the transcriptional implicature of the relevant community. If that transcriptional implicature can (as postulated) be described as a set of deviations from the default transcriptional implicature, then it follows that any project's transcription practice can be described with reference to the default transcriptional implicature, by merging the two lists of differences.
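The merging of the two lists can be pictured concretely. In the sketch below, a practice is described as a set of overrides keyed by the rule each one modifies; the rule labels and deviations are invented for illustration:

    # A sketch of the "merging difference lists" idea: the default
    # implicature has no overrides; each community or project adds its
    # own deviations, and merging the lists describes a project's
    # practice directly with reference to the default implicature.

    default_implicature = {}   # rules apply without exception

    # The community's deviations from the default implicature:
    community = {"rule (4)": "vertical bars in T record line breaks in E"}

    # The project's further deviations from the community's practice:
    project = {"rule (3)": "deletions in E are not transcribed"}

    practice = {**default_implicature, **community, **project}
    for rule, deviation in practice.items():
        print(rule, "-", deviation)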
We propose to identify this hypothetical default transcriptional implicature with the rules outlined below. In the formalizations, “T” denotes a transcript, “E” its exemplar.
Adopting the extended use of the type/token distinction mentioned above, the default transcriptional implicature can be summed up in a single rule:
1. A transcript and its exemplar have the same type. Formally:
type(T) = type(E)
In interesting cases, E will have a complex type consisting of some structure of smaller types (which in turn consist of smaller ones still), instantiated by a complex token which similarly consists of smaller tokens. It is a consequence of (1) that:
2. There is a one-to-one correspondence between the tokens of a transcript and the tokens of its exemplar, such that every pair of corresponding tokens have the same type.
Formally, this is a second-order statement, but we can approximate it using the following first-order sentence, which assumes a function tokens, mapping from a document to the set of tokens contained in that document, and the functions RET (mapping from tokens(E) to tokens(T)) and RTE (mapping the other way round).
(∀ t1 : tokens(E)) (∃1 t2 : tokens(T)) (t2 = RET(t1))
∧ (∀ t1 : tokens(T)) (∃1 t2 : tokens(E)) (t2 = RTE(t1))
∧ (∀ t1 : tokens(E), t2 : tokens(T)) ((t2 = RET(t1) ⇔ t1 = RTE(t2)) ∧ type(t1) = type(RET(t1)))
It is easier to relate variations in transcription practices to the default transcriptional implicature if we paraphrase (2) as a conjunction of simpler rules (3)–(6):
3. For every token in the exemplar there is exactly one corresponding token in the transcript.
(∀ t1 : tokens(E)) (∃1 t2 : tokens(T)) (t2 = RET(t1))
Applied to the example with which this document begins, this means: each token in the exemplar maps to a token in the transcript. Since the word non appears nowhere in the transcript, we can infer that the transcribed passage in the exemplar does not contain it; if it did, non would appear in the transcript.
4. For every token in the transcript there is exactly one corresponding token in the exemplar.
(∀ t1 : tokens(T)) (∃1 t2 : tokens(E)) (t2 = RTE(t1))
Applied to the example: each token in the transcript maps to some token in the exemplar. We can infer that the exemplar contains some token corresponding to the word Quaestiones in the transcript.
5. The relations identified in rules (3) and (4) are inverses: that is, for every pair of tokens t1 in the exemplar and t2 in the transcript, t2 corresponds to t1 as described in rule (3) if and only if t1 corresponds to t2 as described in rule (4).
(∀ t1 : tokens(E), t2 : tokens(T)) (t2 = RET(t1) ⇔ t1 = RTE(t2))
6. In every pair of corresponding tokens, the two tokens are tokens of the same type.
(∀ t1 : tokens(E)) (type(t1) = type(RET(t1)))
Applied to the example: we can infer that the token in the exemplar which corresponds to the word Quaestiones in the transcript is itself a token of the same word type.
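Rules (3)–(6) can be checked mechanically on a toy model. The sketch below is our own illustration, with token inventories keyed to the opening words of the example; it is not the formal apparatus of the full paper:

    # A toy model of rules (3)-(6): tokens as identifier -> type
    # dictionaries; the correspondences R_ET and R_TE as mappings.

    tokens_E = {"e1": "Quaestiones", "e2": "quae", "e3": "ad"}   # exemplar
    tokens_T = {"t1": "Quaestiones", "t2": "quae", "t3": "ad"}   # transcript

    R_ET = {"e1": "t1", "e2": "t2", "e3": "t3"}   # exemplar  -> transcript
    R_TE = {v: k for k, v in R_ET.items()}        # transcript -> exemplar

    # Rule (3): every exemplar token has exactly one transcript counterpart.
    rule3 = all(R_ET.get(e) in tokens_T for e in tokens_E)
    # Rule (4): every transcript token has exactly one exemplar counterpart.
    rule4 = all(R_TE.get(t) in tokens_E for t in tokens_T)
    # Rule (5): the two mappings are inverses of one another.
    rule5 = all(R_TE[R_ET[e]] == e for e in tokens_E)
    # Rule (6): corresponding tokens are tokens of the same type.
    rule6 = all(tokens_E[e] == tokens_T[R_ET[e]] for e in tokens_E)

    print(rule3, rule4, rule5, rule6)   # True True True True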
We observe that many, perhaps all, variations in transcription practice can be classified by which rule they override.
Silent expansion of abbreviations and normalization of spelling exclude some individual characters in E and T from the scope of rules (3) and (4); those rules typically still apply to words and higher-level tokens.
Expansion of abbreviations in brackets can preserve the character-level mapping, but introduces characters in T which are exceptions to rule (4), since they lack corresponding characters in E. The use of vertical bars (|) in T to record line breaks in E is also an exception to rule (4).
Omission of selected material (deleted words, additions from a second hand, ...) modifies rule (3) by identifying tokens in E which are not represented in T.
Both transcriptions which distinguish archaic allographs (i/j, u/v, s/ſ) and those which level non-graphemic distinctions obey rule (6), but they read the document with different type systems.
The principle of charity (“in cases of doubt assume E is correct”) can also be interpreted as a further elaboration of rule (6).
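To make the classification concrete, the toy model above can be extended (again with invented data): bracketed expansions add transcript tokens outside the domain of RTE, an explicit exception to rule (4), and levelling allographs amounts to comparing types under a coarser type function in rule (6):

    # Deviations in the toy model. The abbreviation example ("Dns"
    # expanded to "D[omi]n[u]s") is invented for illustration.

    tokens_E = {"e1": "D", "e2": "n", "e3": "s"}          # exemplar: "Dns"
    tokens_T = {"t1": "D", "t2": "omi", "t3": "n",        # transcript:
                "t4": "u", "t5": "s"}                     # "D[omi]n[u]s"

    # Only some transcript tokens correspond to exemplar tokens; the
    # bracketed expansions are signalled exceptions to rule (4).
    R_TE = {"t1": "e1", "t3": "e2", "t5": "e3"}
    print([t for t in tokens_T if t not in R_TE])   # ['t2', 't4']

    # Levelling allographs: rule (6) holds under a coarser type
    # function that identifies u with v and j with i.
    def levelled(ch):
        return {"v": "u", "j": "i"}.get(ch, ch)

    print(levelled("v") == levelled("u"))   # True: same type after levelling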
The full paper will explore these and further ways in which the practice of transcription can deviate from the default rules of transcriptional implicature as we have proposed to define them, and show how the variations in practice can be described formally.
References

1. Marcoux, Yves (2006). A natural-language approach to modeling: Why is some XML so difficult to write? Paper given at Extreme Markup Languages, Montréal. Proceedings of Extreme Markup Languages 2006. On the Web at conferences.idealliance.org/extreme/html/2006/Marcoux01/EML2006Marcoux01.html.
2. Sperberg-McQueen, C. M., Claus Huitfeldt, and Allen Renear (2001). Meaning and interpretation of markup. Markup Languages: Theory & Practice 2.3: 215–234. On the Web at cmsmcq.com/2000/mim.html.
3. Wickett, Karen M., and Allen Renear (2009). A first order theory of bibliographic objects. Proceedings of the American Society for Information Science and Technology 46.1: 1–8. On the Web at onlinelibrary.wiley.com/doi/10.1002/meet.2009.1450460378/full (subscription required).
4. Huitfeldt, Claus, and C. M. Sperberg-McQueen (2008). What is transcription? Literary & Linguistic Computing 23.3: 295-310.
5. Sperberg-McQueen, C. M., Claus Huitfeldt, and Yves Marcoux (2009). What is transcription? Part 2. Talk given at Digital Humanities, College Park, Maryland. Slides on the Web at blackmesatech.com/2009/06/dh2009.
6. Huitfeldt, Claus, Yves Marcoux, and C. M. Sperberg-McQueen (2010). Extension of the type/token distinction to document structure. Paper presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5. doi:10.4242/BalisageVol5.Huitfeldt01. On the Web at www.balisage.net/Proceedings/vol5/html/Huitfeldt01/BalisageVol5-Huitfeldt01.html.
7. Caton, Paul (2013). Pure transcriptional encoding. Paper given at Digital Humanities 2013, Lincoln, Nebraska.
8. Carter, Clarence E. (1952). Historical editing. Bulletins of the National Archives, Number 7. [Washington, DC]: National Archives and Records Service, August 1952. National Archives publication number 53-4.
9. Tanselle, G. Thomas (1989). A Rationale of Textual Criticism. Philadelphia: University of Pennsylvania Press. 104 pp.
10. Vander Meulen, David, and G. Thomas Tanselle (1999). A system of manuscript transcription. Studies in Bibliography 52: 201–212.
11. Robinson, Peter, and Elizabeth Solopova (2006), Guidelines for Transcription of the Manuscripts of The Wife of Bath's Prologue. 18 March 2006. On the Web at www.canterburytalesproject.org/pubs/transguide-MI.pdf.
12. Grice, H. P. (1975). Logic and conversation. In Syntax and Semantics, vol. 3, ed. P. Cole and J. Morgan. Academic Press. Reprinted as ch. 2 of his Studies in the Way of Words (Cambridge, Mass.: Harvard University Press, 1989), pp. 22–40.

Conference Info

ADHO Digital Humanities Conference 2014: "Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland, July 7-12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/