Beyond the Web: TEI and the Ebook Revolution

  1. Matthew Gibson

    Libraries - University of Virginia

  2. Christine Ruotolo

    Libraries - University of Virginia



From August through November 2000, the Electronic Text Center at the
University of Virginia delivered over a million freely available
electronic books to patrons in over 100 different countries. Distributed
in a variety of formats, including .lit, .pdb, and .pdf, these ebooks have
provided proof-of-concept for the adaptive uses of TEI standards beyond
the World Wide Web -- standards that the Electronic Text Center has
employed since its inception in 1992. In this presentation, we will
discuss the mechanics of our ebook production and the conversion workflow
we hope to implement in the near future. We'll also talk about the user
response to our ebook collection, and the advantages and disadvantages
that different formats offer to scholars and instructors in the

For the purposes of this paper, an ebook is defined as an electronic
full-text resource designed to be read on a screen, in something other
than a web browser. Thus an ebook can be read on a PC, a laptop, a PDA,
or a dedicated reading device, in one or more of the growing number of
available formats and software applications.

*Methods of Ebook Production*

In its first phase of ebook production, the Etext Center repackaged a
portion of its TEI-encoded collection as .lit files for use with the
Microsoft Reader. Although the Reader is a proprietary piece of software,
it is compliant with the Open E-Book (OEB) format, an XML-based standard
to which TEI data can easily be adapted. Using simple Perl scripts, we
automated the conversion of over 1,500 existing TEIXLITE files into
extended OEB, which allows most of the original tagging to be preserved
and accommodated with stylesheet instructions. This conversion gave us a
body of core ebook documents that we could repackage, through the use of a
piece of commercial software, into the Reader format. Later, with some
simple adjustments to the conversion scripts and the stylesheets, we were
able to output our OEB files to the .pdb format for the Palm system and
the .pdf format for the Adobe Reader.
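
As an illustration only, the following Python sketch shows the general shape of such a conversion. It is not the Center's Perl code: the tag mapping in TAG_MAP and the function names tei_to_oeb and convert are hypothetical. It also takes a simpler route than extended OEB, mapping each TEI element to a basic XHTML-style tag and recording the original TEI name (and any rend value) in a class attribute so that a stylesheet can still target it.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI Lite element names to basic OEB
# (XHTML-style) tags. Anything not listed falls back to <span>.
TAG_MAP = {
    "text": "body",
    "body": "div",
    "div1": "div",
    "head": "h2",
    "p": "p",
    "lg": "div",
    "l": "p",
    "hi": "span",
}


def tei_to_oeb(elem):
    """Recursively copy a TEI element into an OEB-style element."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    # Preserve the original TEI element name (plus any rend value) as a
    # class so that stylesheet rules can be written against it.
    rend = elem.get("rend")
    out.set("class", f"{elem.tag} {rend}" if rend else elem.tag)
    out.text = elem.text
    for child in elem:
        converted = tei_to_oeb(child)
        converted.tail = child.tail  # keep text that follows the child
        out.append(converted)
    return out


def convert(tei_xml):
    """Convert a TEI document string to a minimal OEB-style string."""
    root = ET.fromstring(tei_xml)
    text = root.find("text")  # skip the teiHeader in this sketch
    html = ET.Element("html")
    html.append(tei_to_oeb(text if text is not None else root))
    return ET.tostring(html, encoding="unicode")
```

Calling convert on a small TEI fragment yields a single html element whose body mirrors the TEI hierarchy. A production conversion would also need to carry header metadata into the OEB package file, which this sketch leaves out.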

At the moment all of our ebook files are static objects on the Etext
server. However, with an eye to Xhub, the conversion application described
and proposed by the Scholarly Technology Group at Brown*, we are working
both to expand the number of formats we can process from SGML/XML content
and to create those formats on the fly. Because the automation for the
.lit and Palm formats is already in place and we are about to begin
dynamically transforming SGML/XML to PDF, a public auto-conversion
interface for generating ebooks from TEI data is imminent. Ultimately, we
envision a delivery system where visitors to our website can choose to
view and search our texts through the traditional web interface, or
download them instantly in an ever-growing number of ebook formats.
Patrons will have more control than ever before over the way they access
and use our materials.
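
One way to picture on-the-fly delivery is a registry that maps each requested format to a converter, so that adding a format means registering one more function. The Python sketch below is an assumption-laden illustration, not a description of Xhub or of the Center's system: the names CONVERTERS, register, and deliver are hypothetical, and the converter bodies are placeholders that only tag the data rather than produce real .lit, .pdb, or .pdf files.

```python
from typing import Callable, Dict

# Registry of converters keyed by ebook file extension (hypothetical).
CONVERTERS: Dict[str, Callable[[str], bytes]] = {}


def register(extension: str):
    """Decorator that adds a converter function to the registry."""
    def decorator(func: Callable[[str], bytes]):
        CONVERTERS[extension] = func
        return func
    return decorator


@register("lit")
def to_lit(oeb: str) -> bytes:
    # Placeholder: real .lit packaging would call external software.
    return b"LIT:" + oeb.encode("utf-8")


@register("pdb")
def to_pdb(oeb: str) -> bytes:
    # Placeholder for Palm database output.
    return b"PDB:" + oeb.encode("utf-8")


@register("pdf")
def to_pdf(oeb: str) -> bytes:
    # Placeholder for PDF output.
    return b"PDF:" + oeb.encode("utf-8")


def deliver(oeb: str, extension: str) -> bytes:
    """Convert OEB content to the requested format at request time."""
    try:
        converter = CONVERTERS[extension]
    except KeyError:
        raise ValueError(f"unsupported format: {extension!r}") from None
    return converter(oeb)
```

With this shape, a web request for a given text and format reduces to a single deliver call, and an unsupported format fails with a clear error instead of a missing file.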

*User Response, Statistics, and Feedback*

Like similar TEI-based text repositories, Etext has prided itself on the
usefulness of its encoded data for sophisticated searching and text
analysis. Traditionally, though, we've considered issues of aesthetics,
design, and interface to be of secondary importance. Our work with ebooks
represents a new focus on the technologies of reading and how they affect
our patrons. Early analysis of user statistics for our ebooks indicates
that, when users are given the choice between a downloadable MS Reader
version of a text and a web-delivered XML/HTML version, they choose the
former by a margin of about 2 to 1. As we make additional ebook formats
available from our website, we will conduct careful analysis of usage
patterns, with a particular eye to how format preference varies among
individual titles or content categories. This analysis should prove useful
to academic institutions and commercial publishers alike, as we are
unaware of any substantial analysis of ebook usage patterns that has yet
been published.

*Pedagogical Challenges*

In converting richly encoded TEI documents to ebook format for classroom
use, we provide students with the advantages of portability and a
user-friendly interface. However, the current ebook platforms limit the
utility of these texts because they are not SGML/XML-aware and do not
support, for example, the kinds of hierarchical searching and analysis
that TEI markup allows. In our classroom pilot projects, we are therefore
searching for implementation solutions that combine the functionality of
encoded text with the ebook's ease of use. For example, we are using the
original TEI texts to create stand-alone indices of all materials related
to a particular course. Students can then perform web-based searches that
take advantage of the markup and metadata, but will have the option of
retrieving their results in the ebook format of their choice.
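
To make the idea of markup-aware searching concrete, the Python sketch below restricts a search to one kind of TEI element (for example, verse lines only) and then lists ebook files a student could retrieve. It is a simplified illustration under stated assumptions: the function names search_in_element and download_links, the file-naming scheme, and the text identifier are all hypothetical, and a real course index would be built ahead of time rather than parsed on each query.

```python
import xml.etree.ElementTree as ET


def search_in_element(tei_xml, element_name, term):
    """Return the text of every element_name element that contains term.

    Matching is case-insensitive, and whitespace in each hit is
    normalized to single spaces.
    """
    root = ET.fromstring(tei_xml)
    hits = []
    for elem in root.iter(element_name):
        content = "".join(elem.itertext())
        if term.lower() in content.lower():
            hits.append(" ".join(content.split()))
    return hits


def download_links(text_id, formats=("lit", "pdb", "pdf")):
    """Build hypothetical ebook file names for a matching text."""
    return [f"{text_id}.{ext}" for ext in formats]


SAMPLE = """
<text><body>
  <div1 type="poem">
    <head>The Raven</head>
    <lg><l>Once upon a midnight dreary</l><l>Quoth the Raven</l></lg>
  </div1>
  <div1 type="essay"><p>The raven is a bird.</p></div1>
</body></text>
"""
```

Searching SAMPLE for "raven" in l elements returns only the verse line, while the same term in p elements returns only the essay sentence. A plain full-text search of an ebook file cannot draw that distinction.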

As ebooks provide instructors with more control over the presentation of
classroom materials, we are recognizing the importance of working closely
with them to determine the optimal format for their purposes, as different
formats facilitate different types of scholarship. For example, raw page
images loaded into a PDF-based ebook reader would have little utility for
a scholar doing linguistic analysis. For an instructor interested in the
visual impact of book layout and typography, however, this presentation is
preferable to full-text transcription and encoding. Thus we find that we
can't allow our own standard practices or assumptions about humanities
computing to limit the range of presentation options we offer to our
patrons. Even within the scope of a single course, a variety of textual
formats may be needed to meet the instructor's pedagogical goals.


Since it was established, the Electronic Text Center has maintained its
two-fold mission of building SGML-based content while simultaneously
educating and serving the community that will use this content. We see
ebook production as an important part of both the research and public
service aspects of our mission. As methods of delivering content change,
and user expectations change with them, we must adapt to these changes and
incorporate them into our existing workflows. Furthermore, we hope that
our presence in the ebook world will, in some small way, help to foster a
commitment to structured data and open standards in an industry that is
increasingly dominated by big corporations and proprietary interests.

* See
for a discussion on Xhub that Elli Mylonas and Carole Mah from STG gave at
the Extreme Markup Languages 2000 Conference, Montréal, Canada, Aug.
15-18, 2000.


Conference Info

In review


Hosted at New York University

New York, NY, United States

July 13, 2001 - July 16, 2001

94 works by 167 authors indexed

Series: ACH/ICCH (21), ALLC/EADH (28), ACH/ALLC (13)

Organizers: ACH, ALLC