Critical editing with TXSTEP

poster / demo / art installation
  1. Wilhelm Ott

    Universität Tübingen (University of Tubingen / Tuebingen)

  2. Tobias Ott

    Stuttgart Media University

Work text

In the "Afterword" to his 1984 edition of James Joyce's "Ulysses", Hans Walter Gabler gives a short outline of how he collected the variant readings contained in the different sources, how he used them for establishing, in two steps, the critical text, leaving most of the mechanical work (even the automatic insertion of the diacritic marks for the genetic variants) to the computer, and checking, by subsequent machine collation, the manual work which was carried out interactively at the computer console. According to Gabler, "the systematic and comprehensive reliance on computer aid ... has drastically reformed the editing process... Without it, this edition would neither be as accurate as we hope it is ... nor so rich in recorded facts" (p. 1909).
The TUSTEP tools that Gabler used more than 30 years ago have been adapted continuously, in close collaboration with many editorial projects, to the requirements of those projects and to changing technologies such as PostScript and PDF for output and encoding standards such as SGML, XML, TEI and Unicode. They have been used successfully for the preparation of many other critical editions: more than 750 volumes of printed editions published between 1972 and 2013 are listed as prepared and/or typeset with TUSTEP. They include works written in languages using non-Latin alphabets, such as Greek (e.g. the 28th edition of Nestle-Aland, Novum Testamentum Graece, published in 2012), Hebrew (e.g. the Mishna edition published by Michael Krupp) and Arabic (Kitab al-Adad al-musamma..., ed. 2012 by Gunhild Graf). Current editorial projects relying on these tools include the works of Marx and Engels, the letters of Philipp Melanchthon, the works of Christoph Martin Wieland and of Albertus Magnus, the philosophical works of Gottfried Wilhelm Leibniz, and many others.
The TEI wiki judges the use of TUSTEP for the preparation of critical editions as follows:
Advantage: does the job
Drawback: very difficult to learn.
According to Willard McCarty (Humanities Computing 2005, p. 217), the main reasons for these difficulties are the language of the documentation and the complexity of the interface.
This drawback has since lost much of its force:
At DH 2012, we presented the prototype of a modern XML-based interface to these tools, called TXSTEP. It both removes the language barrier and provides a user interface with an up-to-date, established syntax. It allows the user to take advantage of the typical benefits of working with an XML editor, such as content completion, highlighting, display of annotations, and verification of the code. The underlying XML schema contains extensive annotations and documentation on the purpose and syntax of the individual functional elements available for building a TXSTEP script. In a modern XML editor like oXygen, these annotations are shown automatically in a popup window while a TXSTEP script is being developed, which makes the environment to a considerable degree self-teaching.
The poster session demonstrates how to use these tools for supporting the single steps required for the preparation of a critical edition:
Collating witnesses / collecting variant readings
Evaluating the collation results
Constitution of edition text
Compilation of apparatuses
Preparation of indexes
Preparation of printer's copy
Publishing the text with apparatus(es) in print and/or for the web
As the text basis for this demonstration, we chose a freely invented scenario. In order to have available a short example showing the whole spectrum of different types of variant readings, we copied a passage from vol. 4 of the edition of the works of Friedrich Schelling, a German philosopher, which had been typeset with TUSTEP in 1988, and labelled it "version A". We then invented two further witnesses, B and C, for the same passage by copying it to separate files and modifying the copies: we carried out systematic replacements of single characters (the initial upper-case umlauts, written as Ae, Oe, Ue in the 1988 edition, were converted to Ä, Ö, Ü), made other orthographic changes (e.g. replacing th by t, y by i, c by k), changed the punctuation (inserting or deleting a comma etc.) and, in addition, inserted, omitted, replaced or transposed whole words or sentences. Starting from this material, the demo will show how the preparation and publication of a critical edition, be it in print or on the web, can profit from TXSTEP, which provides powerful tools not only for editing but also for text analysis, for indexing, for lexicographic or bibliographic work and for publishing the results.
As a first step of the editorial work, we have to collect the variant readings by comparing the text of the sources, using the TXSTEP module COMPARE and specifying the details needed for our purpose (word-by-word collation; full references for the lemma location; abbreviate lemmata comprising more than 5 words; add 1 word of context for insertions; include a version-id for version B), as shown in fig. 1.

Fig. 1: COMPARE script
COMPARE compares two files only and can produce two different kinds of output. The first is a "comparison protocol", a listing which may be displayed on screen or printed on paper, showing the text of version A and, below it, the text of version B, with the differences marked between the two lines. This listing (see fig. 2) is useful for checking the results of a correction step or of an automatic transformation of a text.

Fig. 2: COMPARE protocol
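The idea of a word-by-word comparison of two witnesses can be illustrated outside TXSTEP with a minimal Python sketch. It uses the standard library's difflib and is not the algorithm COMPARE uses; the function name `collate` and the two sample sentences are invented for illustration and are not taken from the Schelling passage.

```python
from difflib import SequenceMatcher

def collate(text_a, text_b):
    """Return the word-level differences between two witnesses.

    Each difference is a tuple (index of the word in A, kind, words in A,
    words in B), where kind is 'replace', 'delete' or 'insert'.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for kind, a1, a2, b1, b2 in matcher.get_opcodes():
        if kind != "equal":
            differences.append((a1, kind, words_a[a1:a2], words_b[b1:b2]))
    return differences

# Invented sample witnesses (hypothetical, for illustration only).
version_a = "die Aelteren sagen, es sey wahr"
version_b = "die Älteren sagen es sei sehr wahr"
for difference in collate(version_a, version_b):
    print(difference)
```

Because punctuation stays attached to the words, a missing comma shows up as a replaced word. A real collation would normally tokenize more carefully and carry page and line references along with each word.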
For the preparation of critical editions, the second kind of output is more useful (see fig. 3): the differences found by the comparison are written to a text file in a syntax which contains all the information necessary for further processing, and into which the differences found by comparing the same version A with the text of other witnesses can be merged.

Fig. 3: differences file with TEI compatible tags
A listing generated from the difference files produced by comparing the text of more than two witnesses is shown in fig. 4.

Fig. 4: Listing showing differences of more than two witnesses
As the next step, shown in fig. 5, we try to classify the variants found, distinguishing the types of variants mentioned above in the description of our text basis. Variants concerning initial umlauts only are written to a separate file; we decide that its content will not be part of the apparatus but will be dealt with in the preface. The merely orthographic variants and the variants concerning punctuation only each receive an attribute which makes it possible either to omit them from the apparatus as well or to list them in a separate apparatus level. The remaining variants are those which should be listed in the main apparatus.

Fig. 5: classifying variant readings
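The classification logic can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not the TXSTEP script of fig. 5: the category names, the function `classify` and the sample words in the usage lines are hypothetical, and only the substitutions named in the description of the text basis (Ae/Oe/Ue, th, y, c) are covered.

```python
import string

# Substitutions named in the description of the text basis.
UMLAUTS = {"Ae": "Ä", "Oe": "Ö", "Ue": "Ü"}
SPELLING = [("th", "t"), ("Th", "T"), ("y", "i"), ("c", "k"), ("C", "K")]

def normalize_umlaut(word):
    """Rewrite an initial Ae/Oe/Ue as the corresponding umlaut."""
    for digraph, umlaut in UMLAUTS.items():
        if word.startswith(digraph):
            return umlaut + word[len(digraph):]
    return word

def normalize_spelling(word):
    """Apply the orthographic substitutions to a word."""
    for old, new in SPELLING:
        word = word.replace(old, new)
    return word

def classify(lemma, reading):
    """Assign a lemma/reading pair to one of the variant types."""
    if lemma == reading:
        return "identical"
    if lemma.strip(string.punctuation) == reading.strip(string.punctuation):
        return "punctuation"
    lemma_u, reading_u = normalize_umlaut(lemma), normalize_umlaut(reading)
    if lemma_u == reading_u:
        return "umlaut"
    if normalize_spelling(lemma_u) == normalize_spelling(reading_u):
        return "orthographic"
    return "substantive"

# Invented sample pairs (hypothetical, for illustration only).
for pair in [("Aeltern", "Ältern"), ("sagen,", "sagen"),
             ("Theil", "Teil"), ("kleine", "grosse")]:
    print(pair, classify(*pair))
```

A pair that differs in both punctuation and spelling falls through to "substantive" in this sketch; a real project would decide explicitly how such mixed cases are to be treated.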
This step produces two files: one containing the variants concerning (in our case) the initial umlauts only, and a second file containing the raw material to be used for building the apparatus entries. Fig. 5 shows the procedure for the differences between versions A and B only (lines 17-122). The same <transform> follows for the differences between versions A and C, and then a further module which sorts and merges the variants found in version B and version C in ascending order of lemma location, of type of variant, and of witness ID.
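The sorting and merging step amounts to ordering all variant records by a three-part key. A minimal Python sketch, in which the record layout (`Variant`), the ranking of variant types (`TYPE_ORDER`) and the sample records are all assumptions made for illustration:

```python
from dataclasses import dataclass
from itertools import chain

# Assumed ranking of variant types within one lemma location.
TYPE_ORDER = {"substantive": 0, "orthographic": 1, "punctuation": 2}

@dataclass(frozen=True)
class Variant:
    line: int      # line number of the lemma in version A
    word: int      # word number within that line
    kind: str      # one of the keys of TYPE_ORDER
    witness: str   # witness ID, e.g. "B" or "C"
    lemma: str
    reading: str

def merge_variants(*per_witness):
    """Merge variant lists and sort by location, type, then witness ID."""
    return sorted(
        chain.from_iterable(per_witness),
        key=lambda v: (v.line, v.word, TYPE_ORDER[v.kind], v.witness),
    )

# Invented records (hypothetical, for illustration only).
from_b = [Variant(14, 2, "orthographic", "B", "sey", "sei"),
          Variant(13, 5, "punctuation", "B", "sagen,", "sagen")]
from_c = [Variant(14, 2, "orthographic", "C", "sey", "sei"),
          Variant(13, 5, "substantive", "C", "sagen,", "fragen,")]
for v in merge_variants(from_b, from_c):
    print(v.line, v.word, v.kind, v.witness, v.lemma, v.reading)
```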
When we transform these sorted and merged records of variant readings automatically, without philological inspection and revision, into apparatus entries for a printed critical edition and insert them into our edition text (for which, in this demo, the unaltered text of version A will serve), we get a file of which a detail is shown in fig. 6.

Fig. 6: edition text (version A) with inserted apparatus entries
With a further script (not shown here), making use of the powerful typesetting engine of TUSTEP, this file is transformed into a PostScript or PDF file (see fig. 7), showing at the bottom of the page the apparatus entries, linked to the text by means of line numbers printed in the margin of the edition text. As stated above, in this example we omitted only the variants concerning the spelling of the initial upper-case umlauts. For the other variants, three apparatus levels have been provided: the first shows the variant readings which may affect the meaning or interpretation of the text, the second lists merely orthographic variants, and the third contains variants regarding punctuation only.

Fig. 7: typeset edition with 3 apparatus levels at page bottom
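How classified variants might be distributed over three apparatus levels can be sketched in Python. The entry notation "line lemma] reading sigla" is a common convention that we assume here for illustration; it is not necessarily the layout of fig. 7, and the dictionary keys and the word "sey" are hypothetical.

```python
LEVELS = ("substantive", "orthographic", "punctuation")

def format_entry(line, lemma, reading, witnesses):
    """Format one entry as 'line lemma] reading sigla' ('om.' if omitted)."""
    sigla = " ".join(sorted(witnesses))
    return f"{line} {lemma}] {reading or 'om.'} {sigla}"

def build_apparatus(entries):
    """Distribute entries over the apparatus levels, ordered by line."""
    apparatus = {level: [] for level in LEVELS}
    for entry in sorted(entries, key=lambda e: e["line"]):
        apparatus[entry["kind"]].append(
            format_entry(entry["line"], entry["lemma"],
                         entry["reading"], entry["witnesses"])
        )
    return apparatus

# The omission of "kleine" in line 13 follows the example in the text;
# the orthographic entry is invented.
entries = [
    {"line": 14, "kind": "orthographic", "lemma": "sey",
     "reading": "sei", "witnesses": ["B", "C"]},
    {"line": 13, "kind": "substantive", "lemma": "kleine",
     "reading": "", "witnesses": ["C", "B"]},
]
for level, lines in build_apparatus(entries).items():
    print(level, lines)
```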
Of course, without investing further philological effort, the results are less than satisfying, as shown in fig. 7 for lines 13-14: the text has "kleine engherzige", while versions B and C each have "engherzige kleine". The apparatus says that, in line 13, versions B and C omit "kleine" and insert, after "engherzige" ending in line 14, the word "kleine". The reason is that the word-by-word comparison records the transposition as an omission followed by an insertion. In the apparatus, however, the two entries should be transformed into a single one, showing an inversion of the two words "kleine engherzige" to "engherzige kleine".
This means: the procedures shown in the demo can only make available a reliable material basis for the philological work indispensable for responsible critical work on an edition. At the same time, the scope of work of these procedures can in every detail be specified according to the needs of an actual editorial project, thus saving time for manual work and at the same time providing a reliable material basis for controlled work.
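The revision needed in the "kleine engherzige" case, fusing an omission and a nearby insertion of the same word into one transposition entry, can be prepared mechanically. A minimal Python sketch under our own assumptions: the record layout, the function name `merge_transpositions` and the `max_gap` limit are hypothetical, and only the pattern "omission first, insertion later" is handled.

```python
def merge_transpositions(entries, base_words, max_gap=3):
    """Fuse an omission and a later insertion of the same words.

    entries: list of dicts with keys 'pos' (index into base_words; for an
    insertion, the index of the word after which it occurs), 'kind',
    'words' and 'witnesses'.
    """
    result = []
    used = set()
    for i, first in enumerate(entries):
        if i in used:
            continue
        partner = None
        if first["kind"] == "omission":
            end = first["pos"] + len(first["words"])  # first word after the omission
            for j in range(i + 1, len(entries)):
                second = entries[j]
                if (j not in used
                        and second["kind"] == "insertion"
                        and second["words"] == first["words"]
                        and second["witnesses"] == first["witnesses"]
                        and end <= second["pos"] < end + max_gap):
                    partner = j
                    break
        if partner is None:
            result.append(first)
            continue
        used.add(partner)
        between = base_words[end:entries[partner]["pos"] + 1]
        result.append({
            "pos": first["pos"],
            "kind": "transposition",
            "words": first["words"] + between,    # order in the edition text
            "reading": between + first["words"],  # order in the witnesses
            "witnesses": first["witnesses"],
        })
    return result

base = ["um", "kleine", "engherzige"]
raw = [
    {"pos": 1, "kind": "omission", "words": ["kleine"], "witnesses": ("B", "C")},
    {"pos": 2, "kind": "insertion", "words": ["kleine"], "witnesses": ("B", "C")},
]
print(merge_transpositions(raw, base))
```

Such a step only proposes candidates: whether two entries really constitute a transposition remains an editorial decision.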
Fig. 8 shows an example of the same text presented as an online edition in HTML form (also generated automatically, without critical intervention). In the left column, we have the edition text (version A in our case), with highlighting on the words where one of the other versions (B and C in our case) shows differences. A click on one of the highlighted words makes the respective location visible in the apparatus frame below the text; there, a click on the witness code opens the text of the selected version in the right-hand frame and shows it from the location containing the respective variant reading.

Fig. 8: edition in HTML frames
The right-hand frame, in this case presenting version B, also has an apparatus frame, which however contains only the differences from version A and not those from the other witnesses. Here too, the above-mentioned inversion "kleine engherzige" vs. "engherzige kleine" is not marked as an inversion: "um", though identical to the text in version A, is highlighted because version A has inserted "kleine" after "um", and "kleine" is highlighted because it is missing after "engherzige" in version A.
Of course, editorial work cannot rely only on the steps shown in this demo. Alphabetic lists of the word forms occurring in the individual witnesses, for example, or an alphabetical sort of the lemma-variant pairs could help to reveal spelling variants typical of the geographical region where a witness was transcribed, and could help to build groups of witnesses for texts with a great number of witnesses. For these and similar tasks, too, TXSTEP provides not ready-made solutions but a set of tools for relatively elementary steps of text data processing: tools whose scope of work can be specified in detail according to the needs of each step, and which can be combined in (almost) arbitrary ways to provide solutions for complex tasks as well. A list of the basic modules is shown in fig. 9, which shows the popup window that appears when the cursor is placed in the root tag of a TXSTEP script (in this case, we used the script shown in fig. 1).

Fig. 9: showing, as popup, the modules provided by TXSTEP
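The alphabetical list of lemma-variant pairs mentioned above, as an aid for spotting the spelling habits of individual witnesses, can be sketched in Python. The function name `pair_frequencies` and the sample pairs are invented for illustration; in TXSTEP such a list would be built from the basic modules.

```python
from collections import Counter

def pair_frequencies(variants):
    """Count (lemma, reading, witness) triples and list them alphabetically."""
    counts = Counter(variants)
    return sorted(
        counts.items(),
        key=lambda item: (item[0][0].casefold(), item[0][1].casefold(), item[0][2]),
    )

# Invented lemma-variant pairs with witness IDs (hypothetical).
variants = [
    ("sey", "sei", "B"),
    ("Theil", "Teil", "B"),
    ("sey", "sei", "C"),
    ("sey", "sei", "B"),
]
for (lemma, reading, witness), count in pair_frequencies(variants):
    print(f"{lemma} -> {reading}  {witness}  {count}x")
```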
Both TXSTEP and TUSTEP, the TUebingen System of TExt Processing tools, are open source under the Revised BSD License and can be downloaded from the TUSTEP homepage[6]. The TXSTEP installation package contains in addition a set of 80 exercises, covering tasks like file transformation and extraction of information, collation of different versions of the same text, evaluation of collation results, index generation and sorting.


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO