Graduate Institute of Applied Linguistics
Computer Center - University of Illinois, Urbana-Champaign
As the TEI Guidelines explain, the target uses of the DTD demanded that it be possible to extend or otherwise modify the DTD: "The document type declaration provided by the TEI is intended to cover as wide a variety of document types and processing needs as proved feasible. It is impossible, however, for any finite list of text elements to cover every need of textual research and processing. As a result, extension of the TEI DTD has no effect on strict TEI conformance, as long as certain restrictions are observed." [SMB94, Section 28.5.3] Consequently, the guidelines devote one chapter to the issue of TEI conformance and another to mechanisms for modifying the DTD in a conforming manner. This paper first reviews the TEI approach to DTD modification and conformance, and then proposes an alternative approach based on architectures.
The original TEI approach
A modified TEI DTD is TEI conformant if it meets two basic requirements: (1) all modifications are documented in a prescribed way, and (2) all modifications are made in the DTD subset of the document (that is, the actual TEI DTD files may not be modified). To support DTD modification via the DTD subset, the TEI DTD was implemented using an ingenious system of parameter entities. Overriding the definition of these parameter entities in the DTD subset serves to modify the DTD. In short, virtually any change (including wholesale redefinition) is conformant, as long as it is done using the prescribed mechanisms. So liberal a view of conformance will probably trouble most readers. The guidelines partially address this in section 29.1 by defining two classes of modifications: "A modification is clean if the set of documents parsed by the original DTD may be properly contained in the set of documents parsed by a modified DTD, or vice versa." On the other hand, "A modification is unclean if the set of documents parsed by the original DTD overlaps the set of documents parsed by the modified DTD with neither being properly contained in the other."
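The two quoted definitions are statements about set containment, so they can be illustrated with a minimal Python sketch. It assumes that the (in general infinite) sets of documents parsed by each DTD are replaced by small finite samples, and the function name classify_modification is hypothetical. Identical and disjoint sets are reported separately, which is an assumption of the sketch, since the quoted definitions do not address them.

```python
def classify_modification(original_docs, modified_docs):
    """Classify a DTD modification from finite samples of parsed documents."""
    original, modified = set(original_docs), set(modified_docs)
    # Clean: one set is properly contained in the other, in either direction.
    if original < modified or modified < original:
        return "clean"
    # Unclean: the sets overlap, but neither is properly contained in the other.
    if original & modified and original != modified:
        return "unclean"
    # Assumption: identical or disjoint sets fall outside the two definitions.
    return "not covered"


# Each string stands in for one document.
print(classify_modification({"a", "b"}, {"a", "b", "c"}))  # extension only
print(classify_modification({"a", "b"}, {"b", "c"}))       # partial overlap
```

The first call models a modification that only adds documents, and the second models one that both adds and removes them.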
Using architectures to derive new DTDs
SGML architectures provide another strategy for creating modified DTDs. Instead of changing a DTD, one builds a new DTD that is formally derived from the original. As the preceding paper in this session demonstrates, the TEI DTD can be successfully used in this way. In the terminology of architectures, the base DTD is called the architectural DTD and the derived DTD is called the client DTD. Each element in an architectural DTD is called an architectural form. The client DTD is derived from the architecture by mapping each of its elements onto an architectural form; this is done by means of the architectural form attribute.
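The mapping from client elements to architectural forms can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of real architectural processing: XML parsed with the standard library stands in for SGML, the attribute name "tei" and the client element names (entry, headword, sense) are hypothetical, every element is assumed to carry the attribute, and all other attributes are simply dropped.

```python
import xml.etree.ElementTree as ET

FORM_ATTRIBUTE = "tei"  # hypothetical name of the architectural form attribute


def to_architectural(element):
    """Return a copy of the tree with each element renamed to its form."""
    form = element.get(FORM_ATTRIBUTE)
    if form is None:
        # Assumption: every client element must name an architectural form.
        raise ValueError(f"element <{element.tag}> has no architectural form")
    derived = ET.Element(form)  # other attributes are not carried over
    derived.text = element.text
    derived.tail = element.tail
    for child in element:
        derived.append(to_architectural(child))
    return derived


client = ET.fromstring(
    '<entry tei="div"><headword tei="head">cat</headword>'
    '<sense tei="p">A small feline.</sense></entry>'
)
architectural = to_architectural(client)
print(ET.tostring(architectural, encoding="unicode"))
```

The result is the architectural document that corresponds to the client document: the same content, with each client element replaced by the architectural form it names.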
An architectural approach to conformance
The TEI DTD was developed before the notion of SGML architectures was generalized. Had architectures existed, the TEI could have avoided devising its elaborate system of extension by adopting an architectural approach to conformance. The TEI notion of original DTD would correspond to the architectural DTD and the TEI notion of modified DTD would correspond to the derived client DTD. A client DTD would be TEI conformant if it declared the TEI DTD to be its base architecture. Clean and unclean conformance would then be defined as follows:
A document conforms cleanly to its base architecture if its corresponding architectural document is valid with respect to the architectural DTD. A derived DTD conforms cleanly to its base architecture if every document that is valid for that DTD also conforms cleanly to the base architecture.
By contrast, a document conforms uncleanly to its base architecture if its corresponding architectural document is not valid with respect to the architectural DTD. A derived DTD conforms uncleanly to its base architecture if there is at least one document that is valid for that DTD but which does not conform cleanly to the base architecture.
It turns out that every case of conformance that is clean by the architectural definition is also clean by the original TEI definition, but the reverse is not true: there are cases considered clean by the TEI approach that are not clean by the architectural approach. The net result is a "cleaner clean" in which the set of possible client documents always maps (through architectural processing) onto a subset of all possible architectural documents.
Automatically validating conformance
This architectural approach to defining clean conformance has a major advantage over the TEI approach: an SGML parser can formally test clean conformance for any user document. The parser validates the document simultaneously against its own DTD and its architectural DTD. If no errors are reported for either DTD, the document conforms cleanly. If the document is valid against its own DTD but generates errors with respect to the architectural DTD, its conformance is unclean.
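The double validation can be sketched in Python with a toy model of a DTD. Everything here is an assumption made for illustration: a DTD is reduced to a dictionary from element names to regular expressions over child names, character data is ignored, XML stands in for SGML, and the attribute name "tei", the element names, and the function names are hypothetical. A real SGML parser with architecture support does far more than this.

```python
import re
import xml.etree.ElementTree as ET

FORM_ATTRIBUTE = "tei"  # hypothetical architectural form attribute name

# Toy content models: element name -> regex over child names,
# where each child name is followed by a single space.
CLIENT_DTD = {"entry": r"headword (sense )+", "headword": r"", "sense": r""}
ARCH_DTD = {"div": r"(head )?(p )+", "head": r"", "p": r""}


def client_name(element):
    return element.tag


def architectural_name(element):
    return element.get(FORM_ATTRIBUTE, "")


def is_valid(element, dtd, name_of):
    """Check each element's sequence of child names against its model."""
    model = dtd.get(name_of(element))
    if model is None:
        return False  # element is not declared in this DTD
    children = "".join(name_of(child) + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child, dtd, name_of) for child in element)


def conformance(document):
    """Return 'clean', 'unclean', or 'invalid' for one document."""
    root = ET.fromstring(document)
    if not is_valid(root, CLIENT_DTD, client_name):
        return "invalid"  # not even valid against its own DTD
    if is_valid(root, ARCH_DTD, architectural_name):
        return "clean"    # no errors against either DTD
    return "unclean"      # valid client document, invalid architecturally


print(conformance(
    '<entry tei="div"><headword tei="head">cat</headword>'
    '<sense tei="p">A small feline.</sense></entry>'
))
```

A document whose elements name forms in an order the architectural content model does not allow would be reported as unclean even though it is valid against the client DTD.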
This approach does have a major weakness, however. The SGML parser can only verify that a particular document instance conforms to the architecture; it cannot verify that the derived DTD conforms to the architectural DTD. For a case in which there is a closed set of data files all of which can readily be validated against both DTDs, this limitation does not pose a problem. However, in an open-ended case where a run-time validation error could bring production to a halt, this limitation could be a serious one.
To solve this problem, we need a new tool that compares a derived DTD to its architectural DTD to determine whether it conforms cleanly; if it does not, the tool should report why not. The full paper discusses the formal language theory that lies behind such a tool, presents an algorithm for making the comparison, and describes our results to date in implementing such a tool.
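The algorithm itself is not given in this abstract, so the following Python sketch is not the authors' method. It only illustrates what a DTD-level comparison has to decide, by brute force and for a single content model: it enumerates child sequences up to a fixed length, keeps those the client model accepts, and reports the first one whose architectural image the architectural model rejects. The function name, the regex encoding of content models, and the element names are all hypothetical.

```python
import re
from itertools import product


def find_counterexample(client_model, arch_model, form_of, max_children=4):
    """Search for a child sequence the client accepts but the architecture rejects.

    client_model: regex over client child names, each followed by a space
    arch_model:   regex over architectural form names, each followed by a space
    form_of:      dict mapping each client element name to its form
    """
    names = sorted(form_of)
    for length in range(max_children + 1):
        for sequence in product(names, repeat=length):
            client_children = "".join(name + " " for name in sequence)
            if re.fullmatch(client_model, client_children) is None:
                continue  # not a valid client sequence, so irrelevant
            arch_children = "".join(form_of[name] + " " for name in sequence)
            if re.fullmatch(arch_model, arch_children) is None:
                return list(sequence)  # explains why conformance is unclean
    # No counterexample up to the bound; this is not a proof of cleanness.
    return None


FORM_OF = {"headword": "head", "sense": "p"}
print(find_counterexample(r"headword (sense )+", r"(head )?(p )+", FORM_OF))
print(find_counterexample(r"(headword |sense )+", r"(head )?(p )+", FORM_OF))
```

A bounded search can only find violations, never rule them out, which is why a real tool needs a comparison grounded in formal language theory rather than enumeration.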
Hosted at University of Virginia
Charlottesville, Virginia, United States
June 9, 1999 - June 13, 1999
Conference website: http://www2.iath.virginia.edu/ach-allc.99/schedule.html