Computing Service - Oxford University
The Text Encoding Initiative DTD is usually classified as descriptive,
suitable for encoding digital versions of existing books, or
for creating specialized publications such as dictionaries. Less
commonly, it is used for academic books and papers, and original
documents such as the TEI Guidelines themselves. In this paper, we
address the problems of using TEI markup in an even less obvious domain,
that of `normal' Web sites, using
the Oxford University Computing Services http://www.oucs.ox.ac.uk
site as an example. Some examples are also given from the TEI web site
itself. We cover:
- the conversion of existing HTML documents (c.6000) to TEI XML
- the (small) extensions to the TEI DTD which are needed, and usage notes
(e.g. uses of `rend' and `type' attributes)
- the development of a comprehensive, flexible set of XSLT specifications to
convert the TEI XML documents into a linked web tree
- consideration of management issues
Why choose the TEI DTD for a web site? The most important reason is
that we can harness experience with TEI paragraph-level markup, but
another argument in favour of TEI is the mature metadata support in
<teiHeader>. Using this allows us the option of converting to RDF
later on (since teiHeader should be as rich), without the
unfamiliarity of new elements.
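As a rough illustration of the metadata involved, a minimal header for one page might look like the sketch below. The element names are standard TEI; the title, author, and descriptive text are invented placeholders, not taken from the OUCS site.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- placeholder values, for illustration only -->
      <title>Example page title</title>
      <author>A. N. Author</author>
    </titleStmt>
    <publicationStmt>
      <p>Published on the web by a (hypothetical) maintaining unit.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Converted from an existing HTML page.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

The title, author, and publication details recorded here are the kind of information a later RDF conversion could draw on.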
The conversion process is typical of an exercise that will have to be
carried out by millions of people over the next few years; cleaning up
bad HTML is a well-understood process, but interpreting the result as
TEI is not always easy (structured divisions present special
problems). Less obviously, the conversion process often involves
manual stitching together of a set of HTML files into a single TEI
document for much easier editing and maintenance.
The TEI DTD proves (perhaps surprisingly) perfectly suitable for
general web pages. Some of the extensions needed are:
- the standard TEI Lite extensions
- addition of short-cut attributes to <xptr> and <xref> to allow
URLs directly (rather than via entities)
- similarly, provision of `file', `scale', `width' and `height'
attributes to <figure>, for practical authoring
- addition of an element <email> to the allowed contents for <address>
- addition of MathML as the content for <formula>
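A fragment using these extensions might look like the following sketch. The attribute name `url' on <xref> and <xptr> is an assumption, since the list above does not name the short-cut attribute; `file', `width' and `height' are the names given above. The file name and e-mail address are hypothetical.

```xml
<div>
  <head>Contact</head>
  <!-- "url" is an assumed name for the short-cut attribute -->
  <p>See the <xref url="http://www.oucs.ox.ac.uk">OUCS home page</xref>,
     or follow this pointer: <xptr url="http://www.oucs.ox.ac.uk"/>.</p>
  <!-- map.png is a hypothetical file name -->
  <figure file="map.png" width="400" height="300">
    <head>Placeholder figure</head>
  </figure>
  <p>
    <address>
      <addrLine>Oxford University Computing Services</addrLine>
      <!-- hypothetical address -->
      <email>webmaster@example.org</email>
    </address>
  </p>
  <p>An inline formula:
    <formula>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
      </math>
    </formula>
  </p>
</div>
```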
As with any TEI project, we need to build up a repertoire of `rend'
and `type' attribute values for various elements. These include
- `fancy' and `doublespace' types for lists
- `new' and `noframe' rend attributes for <xref> and <xptr>, to
specify links which must start a new window, and escape from a frameset
- `code' rend attribute for <hi>, to mimic HTML's <b><code>
- `interpret' type attribute for <xptr> to support transclusion
but of course most visible web effects are confined to the
stylesheets. XML processing instructions are used to generate some of
the HTML <meta> elements. It is likely that the tables will need more and more
`rend' support in future, and this is the most likely area where we
would drop TEI in favour of another table schema.
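By way of example, some of the `rend' and `type' values listed above could appear in a document as in this sketch. The `url' attribute name is again an assumption, and the list content is invented.

```xml
<list type="fancy">
  <item>A link that opens in a new window:
    <xref rend="new" url="http://www.oucs.ox.ac.uk">OUCS</xref></item>
  <item>Type <hi rend="code">ls -l</hi> at the prompt.</item>
</list>
```

The markup records only the intent (`fancy', `new', `code'); how each value actually looks on the page is decided in the stylesheets.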
The greatest amount of work is in writing XSLT stylesheets to render TEI
to HTML (either dynamically on the web browser client, or on the web
server). The results are coupled with CSS stylesheets, but in this
system at present CSS is relegated to a fairly minor role, since we
need considerable amounts of the transformation for which XSLT is well
suited. Most obviously, we often want to convert a single TEI document
into a set of HTML web pages, but there are many other examples of
generated or rearranged text. The resulting set of XSLT specifications
(over 3000 lines of code) is notable in three ways:
1. It makes heavy use of the `import' feature of XSLT, allowing for
modular and cascading stylesheets; a group of pages can easily have
their own wrapper around the main stylesheets, and a particular page
can have its own wrapper around the group one.
2. There are over 60 points identified where the result is
parameterized, allowing for simple overrides in a wrapper
file. These cover everything from the words used for `Next page',
through the depth at which <div> elements produce new pages, to a
switch which generates an HTML frameset presentation.
3. There is a web form which allows a new user to derive an XSLT
wrapper around the stylesheets, in a manner analogous to the TEI
Pizza Chef, without knowing very much about XSLT. More experienced
programmers can override any aspects of the stylesheets, obviously.
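A wrapper of the kind described in points 1 and 2 might be sketched as follows. The file name teihtml.xsl and the parameter names nextWord, splitLevel and useFrames are hypothetical stand-ins, not the names used in the actual stylesheets; only the XSLT mechanisms (import, top-level parameters, template override) are standard.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must precede all other top-level elements.
       teihtml.xsl is a hypothetical name for the main stylesheets. -->
  <xsl:import href="teihtml.xsl"/>

  <!-- Hypothetical parameter names. If the imported stylesheet declares
       parameters with the same names, these definitions take precedence. -->
  <xsl:param name="nextWord">Next page</xsl:param>
  <xsl:param name="splitLevel">1</xsl:param>
  <xsl:param name="useFrames">false</xsl:param>

  <!-- Override a single template: render <hi rend="code"> as bold code. -->
  <xsl:template match="hi[@rend='code']">
    <b><code><xsl:apply-templates/></code></b>
  </xsl:template>

</xsl:stylesheet>
```

Because definitions in the importing stylesheet have higher import precedence than those in the imported one, a page-level wrapper can import a group-level wrapper in the same way, giving the cascade described in point 1.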
The paper shows examples of how the same document can be presented
on the web in a variety of ways by minor changes to the XSLT.
The management of web pages is always an issue, whatever
authoring system is used. We prefer to use a conventional change
management system which interacts with the <teiHeader>, and provides
plenty of flexibility for controlling a multi-author environment.
In conclusion, this paper demonstrates that authoring static web pages
in the TEI is reasonable, and that XSLT stylesheets provide a powerful
tool for manipulating them.
Hosted at New York University
New York, NY, United States
July 13, 2001 - July 16, 2001