Authoring Scholarly Articles: TEI or Not TEI

  1. Wendell Piez

    Mulberry Technologies, Inc.


The TEI has grown and matured greatly in recent years, both in the number and breadth of its applications and in their sophistication. Persistent efforts to push its boundaries can be taken as a sign of the TEI's success and good health.

One area repeatedly cited as a place where the TEI "should" offer a competitive alternative, but apparently does not, is authoring: original composition by scholars and writers.1 A closely related area is interchange between authors and editors, the familiar "submission format" issue. Happily, the ACH has already stated its support, in principle, for accepting conference papers in TEI format, and the supporting pieces (initial stylesheets for particular TEI subsets) are quickly being put together.

The work of a few pioneers amply demonstrates that, in the hands of an expert in XML systems and processing, TEI can indeed serve as an authoring format:2 it can clearly be made to fit the task. Indeed, the widely used (and sometimes controversial) TEI Lite DTD is in many respects a selection of tags apparently chosen for their usefulness in authoring. The question is whether a set of declarations such as any P3- or P4-conformant DTD provides is a particularly strong approach to the need, as opposed to other conceivable alternatives. And at this point there are several: HTML, or (for those who can maintain the discipline) XHTML; proprietary word-processor formats, which for all their shortcomings remain ubiquitous and well understood; and PDF, PostScript, or TeX (still favored in some scientific communities). And now there are alternative XML-based formats such as DocBook [DocBook 2003], the NCBI Journal Publishing DTD [NCBI 2003], and other "standard" (openly specified) XML technologies, designed either for authoring or for some closely related problem such as web-site design and linking. Any technology, to gain acceptance, must be demonstrably better than all of these in at least some critical respects.

The Next Question Down: Are We on the Same Layer?

Assuming that a solution, given present-day technologies, should be XML (and so able to take full advantage of the wide range of available XML tools), we should recognize at the outset that what ultimately makes the difference for processing is the element content models and the element and attribute semantics in practice (at least insofar as these are captured by containment or referential relations), not the particular names of elements or attributes. The names certainly matter, but they are not what matters most, since an "alpha-renaming" transform is a trivial operation. We run into problems where element-to-element mappings fail because the semantics, whether operational or implicit, are incommensurable.
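To make the contrast concrete, here is a minimal sketch of an alpha-renaming transform in Python, using the standard library's xml.etree.ElementTree. The names in RENAME are hypothetical stand-ins for an authoring vocabulary, not any published tag set. The point is how little the transform does: it swaps names and leaves every containment relation alone, which is exactly why it cannot help where the semantics differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical authoring-format names mapped to TEI names (illustration only).
RENAME = {"article": "TEI", "section": "div", "title": "head", "para": "p"}


def alpha_rename(elem: ET.Element, mapping: dict[str, str]) -> ET.Element:
    """Return a deep copy of elem with element names substituted via mapping.

    Attributes, text, and nesting are copied unchanged: only names differ.
    """
    copy = ET.Element(mapping.get(elem.tag, elem.tag), dict(elem.attrib))
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(alpha_rename(child, mapping))
    return copy


src = ET.fromstring("<section><title>Intro</title><para>Text.</para></section>")
print(ET.tostring(alpha_rename(src, RENAME), encoding="unicode"))
# <div><head>Intro</head><p>Text.</p></div>
```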

A simple example is the way div elements work in HTML. Unlike TEI div elements, which enforce a deliberately strict containment hierarchy, HTML div elements can be used for arbitrary formatting of what TEI would treat either as structural divisions (true TEI divs) or as block-level elements such as paragraphs, quotations, line groups, epigraphs, closers, and so on. As an authoring format, therefore, HTML is both simpler and more flexible than the TEI equivalent, and also relatively impoverished (which is to say, perhaps harder to handle in some later processing step). The flexibility here is a false gift: unless one imposes some superordinate layer of modeling and validation on top of it, it makes HTML useless for anything but display (which is, after all, what these div elements are for).
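What that impoverishment costs downstream can be sketched as follows. A process that needs to recover TEI-style divisions from HTML has to guess which divs are structural. The heuristic below (a div opening with a heading counts as a division) is purely an assumption for illustration; HTML defines no such rule.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def guess_div_role(div: ET.Element) -> str:
    """Guess whether an HTML div is a structural division or a block wrapper.

    Assumed heuristic (HTML itself defines no such rule): a div whose first
    child element is a heading is treated as a structural division.
    """
    children = list(div)
    if children and children[0].tag in HEADINGS:
        return "division"
    return "block"


page = ET.fromstring(
    "<body>"
    "<div><h2>Method</h2><p>Details.</p></div>"
    "<div class='epigraph'><p>A quoted line.</p></div>"
    "<div><p>An untitled section.</p></div>"
    "</body>"
)
print([guess_div_role(d) for d in page.findall("div")])
# ['division', 'block', 'block']
```

The third div shows the weakness: if the author meant it as an untitled section, the guess is wrong, and nothing in the markup can say otherwise.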

A strong authoring format, therefore, would distinguish structural divisions from ad hoc block elements. Even if that line is admittedly blurry at times, for the most part it reflects a distinction of enough practical use and importance that the modeling should be stricter than the bland HTML-style div.
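A stricter model can be checked mechanically. The Python sketch below validates a deliberately simplified division model, loosely inspired by TEI's (optional heading, then block elements, then nested divisions, with no block element after a nested division). The real TEI content model is richer, allowing for example closing material after nested divisions; the BLOCKS set and the element names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Block-level element names assumed for this sketch (TEI-like, not exhaustive).
BLOCKS = {"p", "quote", "lg", "epigraph", "list"}


def check_div(div: ET.Element, path: str = "div") -> list[str]:
    """Report violations of a simplified model: head?, block*, div*.

    Once a nested div appears, no further block element may follow it.
    """
    errors: list[str] = []
    seen_div = False
    n_divs = 0
    for i, child in enumerate(div):
        if child.tag == "head":
            if i != 0:
                errors.append(f"{path}: <head> must come first")
        elif child.tag == "div":
            seen_div = True
            n_divs += 1
            errors.extend(check_div(child, f"{path}/div[{n_divs}]"))
        elif child.tag in BLOCKS:
            if seen_div:
                errors.append(f"{path}: <{child.tag}> after a nested div")
        else:
            errors.append(f"{path}: unexpected <{child.tag}>")
    return errors


good = ET.fromstring("<div><head>A</head><p>x</p><div><p>y</p></div></div>")
bad = ET.fromstring("<div><head>A</head><div><p>y</p></div><p>stray</p></div>")
print(check_div(good))  # []
print(check_div(bad))   # ['div: <p> after a nested div']
```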

We run into this problem because in TEI, div does not "mean" what it means in HTML; and on examination, the tighter TEI meaning proves in this case to be the more useful one for an authoring schema.

The fact that these elements share the name "div", in other words, does not make them alike in any important respect. And even though, on close examination, it is easy enough to decide which div an authoring format should prefer, that does not solve the general problem of "TEI or not TEI", even given our conclusion in this case that the preference should be for TEI. Yet seen from another angle, our answer is in fact already here. Simply by stipulating that our putative tag set would be TEI where possible, but might in some cases need to bend to local exigencies, we have in effect already sequestered it: we have assigned it a special position in our processing architecture, distinct from TEI proper.

In other words, once we recognize that in most practical cases an authoring markup language will work at a different layer from display formats, we should acknowledge that it might also stand apart from other instances of TEI, where the tag set is "tuned" to an editorial or archival application. That is, we should take authoring in stride as another layer. This requires us to support transformations between this format and any others we might want (some formats, such as HTML or print-oriented formats such as RTF or Quark, will inevitably sit on separate layers in any case), but it rewards that extra investment with a complementary flexibility: not only can elements be renamed, but element relations can be mapped as well.
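Mapping relations, not just names, might look like the following sketch. It assumes a hypothetical authoring format in which a section carries its title as an attribute; converting to a TEI-style div turns that attribute into a head child, which an alpha-renaming transform could never do. The element and attribute names on the authoring side are invented for this example.

```python
import xml.etree.ElementTree as ET


def section_to_div(section: ET.Element) -> ET.Element:
    """Map a hypothetical authoring <section title="..."> to a TEI-style <div>.

    This is more than renaming: the title attribute becomes a <head> child,
    so the relation between the title and its section changes form.
    (Any text directly inside <section> before its first child is ignored.)
    """
    div = ET.Element("div")
    title = section.get("title")
    if title is not None:
        ET.SubElement(div, "head").text = title
    for child in section:
        if child.tag == "section":
            new = section_to_div(child)
        elif child.tag == "para":
            new = ET.Element("p", dict(child.attrib))
            new.text = child.text
            new.extend(list(child))  # keep inline markup as-is
        else:
            new = child  # pass anything unmapped through unchanged
        new.tail = child.tail
        div.append(new)
    return div


src = ET.fromstring(
    '<section title="Intro"><para>One.</para>'
    '<section title="Detail"><para>Two.</para></section></section>'
)
print(ET.tostring(section_to_div(src), encoding="unicode"))
# <div><head>Intro</head><p>One.</p><div><head>Detail</head><p>Two.</p></div></div>
```

The reverse mapping (TEI head back to a title attribute) would be equally simple here, but only because this toy head holds plain text; a head with inline markup would not fit in an attribute, which is the kind of asymmetry the layered approach has to absorb.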

[Figure 1: A processing architecture in which an XML document type for authoring is put to work in a system that may also use TEI for more orthodox applications. Not only will TEI have its own editing interfaces and its own range of conversion and display formats, but there will also be times when converting between the authoring format and straight TEI is desirable, even while we also want to go straight from authoring to output formats as a regular matter of course.]

Designing to the TEI Architecture

In fact, implementing an authoring tag set "at a remove" in this way has already been endorsed, in both theory and practice, by early implementors, such as OUCS (Oxford University Computing Services) or anyone who has written a stylesheet for TEI Lite. As early as 1999, Architectural Forms were being proposed as a more consistent and more stable way of maintaining TEI elements and their relations than literal declarations [Simons 1999]; likewise, Lou Burnard's examinations of the authoring question have identified TEI's usefulness not as an "out of the box" solution but as an architecture [Burnard 2001, slide 4]. By this mechanism, specific allowance can be made for altering tag structures to suit the needs of authors or web-site maintainers (for example), as opposed to editors, while still bringing from TEI what is best about it: its "bones", one might say. It also carries the notion that if we can define, document, and implement a consistent set of element structures within this framework, they can be kept close enough to TEI in spirit to satisfy our community's wish for a TEI orientation.
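The core idea of architectural forms can be illustrated, though not the actual architecture-processing rules of HyTime (ISO/IEC 10744), which are considerably richer. In this Python sketch each element in a local tag set declares its TEI counterpart through an attribute, and a processor derives the TEI "architectural" view from it. The attribute name `tei`, the local element names, and the choice to unwrap elements that declare no form are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ARC_ATTR = "tei"  # hypothetical architectural-form attribute name


def _add_text(dst: ET.Element, text: Optional[str]) -> None:
    """Append text at the current end of dst's content."""
    if not text:
        return
    if len(dst):
        last = dst[-1]
        last.tail = (last.tail or "") + text
    else:
        dst.text = (dst.text or "") + text


def _map_content(src: ET.Element, dst: ET.Element) -> None:
    """Copy src's content into dst, renaming architectural elements and
    unwrapping (hoisting the content of) elements with no declared form."""
    _add_text(dst, src.text)
    for child in src:
        form = child.get(ARC_ATTR)
        if form is None:
            _map_content(child, dst)  # unwrap: keep the content, drop the tag
        else:
            _map_content(child, ET.SubElement(dst, form))
        _add_text(dst, child.tail)


def architectural_view(root: ET.Element) -> ET.Element:
    """Derive the TEI view of a locally tagged document.

    Local attributes (including the form attribute) are not carried over.
    """
    form = root.get(ARC_ATTR)
    if form is None:
        raise ValueError("root element has no architectural form")
    out = ET.Element(form)
    _map_content(root, out)
    return out


doc = ET.fromstring(
    '<essay tei="div"><heading tei="head">Intro</heading>'
    '<box><para tei="p">Hi <em>there</em>.</para></box></essay>'
)
print(ET.tostring(architectural_view(doc), encoding="unicode"))
# <div><head>Intro</head><p>Hi there.</p></div>
```

The local tag set stays free to add elements like box or em for authors' convenience, while the declared forms keep a TEI view recoverable; whether dropping such elements is acceptable is exactly the kind of design decision the architecture makes explicit.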

One way to approach the design problem is simply to formulate content models, including useful element types, straight from the TEI architecture. This is the approach taken by strictly TEI efforts such as the OUCS web-site authoring format [Rahtz 2001].

On the other hand, it is exactly here that it helps to have at hand a prototype authoring format that is not TEI. A tag set designed specifically with scholarly authoring requirements in mind, but not itself drawn from the TEI design, can be a very useful foil, helping us abstract solutions to design problems away even from TEI itself. Fortunately, this approach, radical as it may seem at first blush, is quite in keeping with the application of TEI as an architecture. Moreover, the stresses can be identified easily enough by assessing the simplicity and accuracy of whatever transformation logic is needed to get from one model to the other.
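One way to make that assessment systematic is to sort each element of the local tag set by the kind of mapping it needs: a plain rename, a structural rule, or no acceptable TEI counterpart. The Python sketch below assumes such a mapping table exists; the element names and the placeholder rule are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Union

# A rule is a TEI name (plain rename), a function (structural transform),
# or None (no acceptable TEI counterpart found yet).
Rule = Union[str, Callable[[ET.Element], ET.Element], None]


def audit(mapping: dict[str, Rule]) -> dict[str, list[str]]:
    """Sort local element names by how hard they are to carry into TEI."""
    report: dict[str, list[str]] = {"rename": [], "restructure": [], "unmapped": []}
    for name, rule in sorted(mapping.items()):
        if rule is None:
            report["unmapped"].append(name)
        elif isinstance(rule, str):
            report["rename"].append(name)
        elif callable(rule):
            report["restructure"].append(name)
        else:
            raise TypeError(f"invalid rule for {name!r}: {rule!r}")
    return report


def section_rule(elem: ET.Element) -> ET.Element:
    """Placeholder for a structural transform (e.g. attribute -> child)."""
    return elem


# Hypothetical local tag set, for illustration only.
MAPPING: dict[str, Rule] = {
    "para": "p",
    "title": "head",
    "section": section_rule,
    "sidebar": None,
}
print(audit(MAPPING))
# {'rename': ['para', 'title'], 'restructure': ['section'], 'unmapped': ['sidebar']}
```

On this view, the "rename" bucket is uninteresting; the "restructure" and especially the "unmapped" buckets are where a from-scratch tag set would show what TEI does not readily accommodate.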

Since I have already developed such an authoring tag set for my own use, this is the direction I am currently taking toward full TEI authoring. I have not yet built this particular transformation, and I expect it will not be difficult; but the more important findings of this five-year experiment in making a tag set from scratch, to order, may lie not where its design can readily be recast as TEI, but where it cannot. My findings on this issue will be part of the paper I propose for ALLC/ACH 2004.


1. Cf. TEI-L, Martin Mueller, October 13, 2003, and the ensuing thread.
2. See the work of Sperberg-McQueen, Rahtz, and others.

1. [Burnard 2001] Burnard, Lou. "TEI and XML: a marriage made in heaven?" Online at
2. [DocBook 2003] Walsh, Norman, ed. DocBook home page. Online at
3. [NCBI 2003] National Center for Biotechnology Information (NCBI). Journal Publishing Document Type Definition. Home page at
4. [Rahtz 2001] Rahtz, Sebastian. "Web Sites from TEI". Address to the TEI Members' Meeting, Pisa, 2001. Online at
5. [Simons 1999] Simons, Gary, C. M. Sperberg-McQueen and David G. Durand. "Rethinking TEI markup in the light of SGML architectures". ACH/ALLC 1999 (Charlottesville VA). Online at


Conference Info



Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004


Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC

  • Language: English