Delivering the Depths: Representing the Orlando Project's Interpretive Markup

paper
Authorship
  1. Susan Brown

    University of Alberta, English and Humanities Computing - University of Guelph

Work text

We begin with a brief review of the Orlando Project, a highly experimental effort to devise new ways of
writing literary history by use of encoding (Brown 1998). We argue that this encoding is “deep” in the sense
that we use SGML to tag not only structural but also intellectual properties of the text. In this paper we draw
out the implications of deep encoding so defined for the scholarly writer.
The Orlando team devised four DTDs to reflect the complexity of the material to be written. These
are used to compose “originally digital” text (Unsworth 2001) in the following forms: 1) discrete
chronological items; 2) longer topic items; 3) biographical documents; 4) documents about the writing careers
and publications of authors. The fourth and most complex of these, the “writing” DTD, is the focus of this
paper.
The writing DTD has 144 tags, which, in our more than 1200 documents of this type, are used with
frequencies ranging from 30,567 times (the <BIBCIT> or bibliographic citation tag) to only once (the
<DAY> tag). As these examples suggest, some tags resemble those of conventional markup, since we
conform to the TEI where possible. However, the desire of the team to encode the intellectual priorities of the
project and permit new ways of ordering and accessing literary historical documents resulted in numerous
unique tags, including the following, listed here with their frequency of use to date:
PLASTLITERARYACTIVITY 141
PLITERARYSCHOOLS 121
PMANUSCRIPTHISTORY 363
PMATERIALCONDITIONS 654
PMODEOFPUBLICATION 877
PMOTIVES 594
These tag names, and the various attribute names and values associated with some of them, are far
from transparent to the average person or even to someone schooled in the TEI. Nor are they transparent to
the average literary historian or critic. “Mode of publication”, for instance, might be understood by some
scholars to refer to aspects of the publication format (like folio or octavo) rather than or in addition to the
aspects of publication (like private printing or subscription) that the Orlando Project has singled out for
demarcation with this tag. This is a single example of how English studies has resisted attempts to standardize
vocabulary and method; identical terms operate differently in different contexts. Indeed, precisely because the
priorities and organizing principles of literary historical studies have often been left implicit, the Orlando
team believes that markup offers an innovative method of making crucial aspects of a study evident for debate
(Brown 2001b).
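Tag frequencies like those listed above can be tallied mechanically from the encoded documents. The following is a minimal sketch rather than the project's own tooling: it assumes documents that are well-formed enough to be read by an XML parser (Orlando's SGML may not be), and the <WRITING> root element, the sample prose, and the function name are all invented for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a writing document.
# The WRITING root element and all prose here are invented.
SAMPLE = """<WRITING>
<PRODUCTION><P>She published by
<PMODEOFPUBLICATION>subscription</PMODEOFPUBLICATION>
<BIBCIT>Smith 12</BIBCIT>.</P></PRODUCTION>
<RECEPTION><P>A reviewer said she wrote
<PMOTIVES MOTIVETYPE="Attributed">for money</PMOTIVES>
<BIBCIT>Jones 40</BIBCIT>.</P></RECEPTION>
</WRITING>"""


def tag_frequencies(documents):
    """Count how often each element name occurs across the documents."""
    counts = Counter()
    for text in documents:
        root = ET.fromstring(text)
        # root.iter() walks the root and every descendant element.
        counts.update(element.tag for element in root.iter())
    return counts


frequencies = tag_frequencies([SAMPLE])
for tag, count in frequencies.most_common():
    print(f"{tag:22} {count}")
```

Run over a full collection, a tally of this kind yields the sort of per-tag counts reported in the list above.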
Each of the 65 unique writing DTD tags has been designated by project members as of crucial
importance in the production of our particular view of women's literary history. The Orlando Project tagset
thus interleaves familiar structural tags, such as <P>, with others, which we will call here “content tags”. The
project's markup is deep in the sense that it is designed to structure the history intellectually, to be a
“diagnostic and investigatory instrument” (Piez 2001, 197).
This tagging enterprise is experimental in a range of ways. The DTDs were designed to try to reflect
the conceptions—quite abstract, although derived from specific examples and much familiarity with women's
literary history—of scholars experienced in literary history but new to text markup. The DTDs were designed
not only without the traditional process of document analysis—since there were no pre-existing texts, nor did
the team wish to model the DTDs on print textuality—but largely without specific notions of how the content
markup would translate for delivery. In other words, the DTDs embody the abstract principles informing the
project as a whole, and the documents flowed from that. The DTDs were designed to permit complex
scholarly prose, which meant not only devising specific content tags, but allowing for flexibility within the
DTD, both in placement of tags and in their combination with each other. As a result, the writing DTD is
significantly flatter than most other DTDs, even including our own biography DTD, when one compares their
content tag hierarchies. This allows for considerably greater freedom in the structuring of documents, since
exclusive content hierarchies are only minimally enforced. Authors of texts must situate their writing within
one of four basic content tags:
1. <SUMMARY>;
2. <PRODUCTION>;
3. <TEXTUALFEATURES>;
4. <RECEPTION>.
Most of the other content tags are grouped according to one of the latter three categories. Thus, all the
tags with the “p” prefix discussed above belong conceptually to the “production” group. However, the DTD
does not make the use of those tags exclusive: each set of subtags is an inclusion of the <RECEPTION> and
<TEXTUALFEATURES> tags as well as the <PRODUCTION> one, so that, for instance, the <PMOTIVES> tag
(with its MOTIVETYPE attribute, whose values are “Attributed” or “Self-identified”) is valid in the
midst of a discussion of a particular text's <RECEPTION>. This addresses the problem of the messiness of
actual critical prose, wherein one might be discussing the negative response of a critic who attributed certain
motivation to the author, or where her own statement of motive might be embedded in an assessment of her
own work. In effect, we incorporated overlapping hierarchies, because “the disorderly world of real data” is
not well represented by exclusive hierarchies, and making data fit a conventional model can lead to “an
impoverished view of the data” (Durusau and O'Donnell).
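The difference between an exclusive content hierarchy and the inclusion-based one described above can be sketched as two small validity checks. This is an illustrative model only, not the DTD itself: the production group contains just the six tags listed earlier, the members of the other two groups are left empty because the text does not enumerate them, and both function names are hypothetical.

```python
# Conceptual grouping of content subtags under three of the top-level tags.
# Only the production tags named in the text are listed; the other groups
# are left empty because the text does not enumerate their members.
GROUPS = {
    "PRODUCTION": {
        "PLASTLITERARYACTIVITY",
        "PLITERARYSCHOOLS",
        "PMANUSCRIPTHISTORY",
        "PMATERIALCONDITIONS",
        "PMODEOFPUBLICATION",
        "PMOTIVES",
    },
    "TEXTUALFEATURES": set(),
    "RECEPTION": set(),
}


def allowed_exclusive(parent, tag):
    """Exclusive hierarchy: a subtag is valid only inside its own group."""
    return tag in GROUPS.get(parent, set())


def allowed_inclusive(parent, tag):
    """Inclusion model: every group's subtags are valid inside any of the
    three top-level tags that carry the inclusions."""
    if parent not in GROUPS:
        return False
    return any(tag in members for members in GROUPS.values())


# <PMOTIVES> inside <RECEPTION>: rejected by an exclusive hierarchy,
# accepted under the inclusion model described in the text.
print(allowed_exclusive("RECEPTION", "PMOTIVES"))
print(allowed_inclusive("RECEPTION", "PMOTIVES"))
```

Whether <SUMMARY> carries the same inclusions is not stated in the text, so the sketch leaves it out of the mapping and treats it as not accepting the subtags.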
The tagset is implemented in documents shaped as much by the markup language with which they are
written as by the literary historical research that informs them. This has resulted in a rich representation of our
data, notwithstanding differences in the application of some tags that arise because they are
“interpretive” (Butler 2000; Brown 2001a). Indeed, the line between data and markup dissolves in the case of
Orlando texts, since the argument of the text can be as much in the markup as in the critical prose—hence,
again, it is deep. However, given the lack of consensus on method and vocabulary mentioned above, the
delivery of this material poses considerable challenges if the markup is to allow users to 1) exploit it for their
own purposes; and 2) examine the project's principles and priorities.
The danger of producing a massive alienation effect in a group of users often wary of new
technologies cannot be overemphasized. Even with our extensive in-house training and documentation, it
takes new student assistants at least a semester of part-time work (about 120 hours) to develop minimal
proficiency with the tagset, and often twice that time to develop the confidence to write and tag original
contributions. We cannot expect a similar time investment from our users.
The paper will demonstrate aspects of the Orlando Project delivery system that represent and educate
users in its deep markup. These include:
• Chronology searches that make the presence of a limited number of central content tags
known
• Organization of hyperlinking using content tags, to familiarize users with the structure and
conceptual organization of the DTDs
• “Thematic” pages introducing the power and specificity of searches on the tagset
• A full text search engine that offers a panorama of the tags and their relationships, and allows
searches to be constructed without knowledge of search syntax
• A full text search results screen that offers a user-friendly view of the content markup
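A search on the tagset, as opposed to a search on words alone, might look something like the sketch below. It is not the Orlando delivery system: it assumes a well-formed, XML-parseable stand-in for a writing document, and the <WRITING> root element, the sample prose, and the function name are invented for illustration. It finds each content tag with a given attribute value and reports which top-level tag encloses it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a writing document.
# The WRITING root element and all prose here are invented.
SAMPLE = """<WRITING>
<PRODUCTION><P>She said she wrote
<PMOTIVES MOTIVETYPE="Self-identified">to support her family</PMOTIVES>.</P>
</PRODUCTION>
<RECEPTION><P>One critic claimed she wrote
<PMOTIVES MOTIVETYPE="Attributed">for fame</PMOTIVES>.</P>
</RECEPTION>
</WRITING>"""


def search_content_tag(document, tag, **attributes):
    """Return (section, text) pairs for each matching content tag.

    `section` is the name of the top-level element enclosing the match;
    keyword arguments are attribute names and the values they must have.
    """
    root = ET.fromstring(document)
    hits = []
    for section in root:  # direct children of the root element
        for element in section.iter(tag):
            if all(element.get(name) == value
                   for name, value in attributes.items()):
                text = "".join(element.itertext()).strip()
                hits.append((section.tag, text))
    return hits


attributed = search_content_tag(SAMPLE, "PMOTIVES", MOTIVETYPE="Attributed")
print(attributed)
```

Because the enclosing section is returned with each hit, a results screen built on such a query could show the user that an attributed motive occurs within a discussion of reception, without requiring any knowledge of search syntax.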
REFERENCES
Brown, Susan, Sue Fisher, Patricia Clements, Katherine Binhammer, Terry Butler, Kathryn Carter, Isobel
Grundy, and Susan Hockey. 1998. “SGML and the Orlando Project: Descriptive Markup for an
Electronic History of Women's Writing.” Computers and the Humanities 31: 271–85.
Brown, Susan, and Isobel Grundy; with Renee Elio, Patricia Clements, Sharon Balazs, Rebecca Cameron,
Dave Gomboc, Allen Renear, Jeanne Wood. 2001a. “Intertextual Encoding in the Writing of
Women's Literary History.” ALLC/ACH 2001.
Brown, Susan, Isobel Grundy, et al. 2001b. Session of three Orlando Project papers: “The Hard and the
Soft: Encoding Literary History,” and “Risking E-Race-Sure/Erasure: Encoding Cultural
Formations.” Annual Digital Research in the Humanities Conference, School of African and Oriental
Studies, London University, UK, 9 July 2001.
Butler, Terry, Sue Fisher, Susan Hockey, Greg Coulombe, Patricia Clements, Susan Brown, Isobel Grundy,
Kate Carter, Kathryn Harvey, and Jeanne Wood. 2000. “Can a Team Tag Consistently? Experiences on
the Orlando Project.” Markup Languages: Theory and Practice 2: 111–125.
Durusau, Patrick, and Matthew Brook O'Donnell. “Overlapping Hierarchies/Concurrent Markup.”
http://www.sbl-site2.org/Overlap/.
Piez, Wendell. “Beyond the 'descriptive vs. procedural' distinction.” Extreme Markup Languages 2001.
Online at: http://www.piez.org/wendell/papers/beyonddistinction.pdf.
Unsworth, John. “Publishing Originally Digital Scholarship at the University of Virginia.” ACH/ALLC 2001.

Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2003
"Web X: A Decade of the World Wide Web"

Hosted at University of Georgia

Athens, Georgia, United States

May 29, 2003 - June 2, 2003

Conference website: http://web.archive.org/web/20071113184133/http://www.english.uga.edu/webx/

Series: ACH/ICCH (23), ALLC/EADH (30), ACH/ALLC (15)

Organizers: ACH, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None