Nouns Proper and Improper: Using the TEI for primary sources

Julia Flanders; Sydney D. Bauman; Paul Caton

Authorship

1. Julia Flanders

Brown University, Women Writers Project - Brown University

2. Sydney D. Bauman

No affiliation given

3. Paul Caton

Brown University, Scholarly Technology Group - Brown University, Women Writers Project - Brown University, Centre for Computing in the Humanities - INKE Project, King's College London

Original URL

https://web.archive.org/web/20020713215142/http://www.cs.queensu.ca/achallc97/papers/s007.html

Work text


Introduction
The TEI approaches the encoding of names as a problem having largely to do with the need to give labels to existing phenomena: Chapter 20, "Names and Dates", begins by saying that the elements provided therein offer the encoder "a detailed substructure" and the ability "to distinguish explicitly between names of persons, places or organizations" [P3, p. 583]. The elements offered in this section and elsewhere in the TEI are indeed sufficient to encode most if not all of the name-related phenomena found in the texts with which the Women Writers Project is concerned. However, this sufficiency on the SGML side of the equation does not assist with the other side: the fact that the encoder must in fact "distinguish explicitly" between the names of persons, places, organizations, mythical creatures, objects, and the like. That is, we must decide what the thing is before we can encode it, and this is not always easy.

The Women Writers Project
The WWP has an additional challenge, however, which is that in working with older texts we are confronted with a set of phenomena which the text itself identifies--by typographical emphasis of some sort--as being of linguistic or rhetorical importance. In texts printed in the 17th and 18th centuries, this set includes the elements discussed in Chapter 20, but also some related textual features such as abstract nouns and adjectives derived from proper nouns. Thus texts from this period themselves identify a set of features in which a scholar might well be interested, but which shade into one another and may be difficult to identify and classify with any certainty. For example, if one wants to distinguish names of persons from abstract nouns, one runs into challenges in the case of allegory or moral poetry, where virtues may be apostrophized as if they were human, or may be identified with a human agent, or may be in fact the name of that agent. Similarly, if one wishes to distinguish names of persons from the names of other kinds of things (such as non-human creatures or objects), one needs not only the encoding equipment to label these but also a clear definition of what it means to be human. Test cases here might be the Medusa, the Minotaur, mermaids, centaurs, Niobe after her transformation into a stone, and the vexed question of the human status of various deities. An additional challenge arises in the case of adjectives which are derived from proper names, like Caesarian or Plutonic; for these there is not even a clearly appropriate TEI element, since <rs> is technically reserved for nouns.

Problems of Classification
Discussion of these issues often verges on the whimsical, for instance when one is forced to articulate what a "person" is (human from the neck up? able to speak or write? able to mate with humans?), but also engages with more serious issues concerning the nature of naming. If one does not wish to make naming and reference the centerpiece of one's encoding system (as would be appropriate for a text like the Metamorphoses, but not for an eclectic collection like the WWP's), one needs to draw a line between the category of names and the other things which shade into them: for instance epithets, vocatives like "Milady", or terms like "the Cockatrice" whose unique reference is vitiated by the presence of an article, and the whole range of apostrophes to abstract qualities like "Fair Virtue". Without such a line, it is hard to know where to stop, and the result is a huge set of features from which it is impossible to retrieve the information one wants. The natural response to this problem is to attempt to classify these borderline features in turn, for instance with type attributes, an approach which other projects (CURIA for instance) have taken with success.
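As a rough illustration of the type-attribute approach mentioned above, the following sketch groups <rs> elements by their type value. The sample sentence, the type values ("vocative", "abstraction", "creature"), and the function name group_by_type are all assumptions invented for this example; they are not drawn from CURIA, from the WWP, or from the TEI Guidelines. The fragment is written as well-formed XML so that Python's standard library can parse it, although the Guidelines cited here were SGML-based.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment; the type values are illustrative assumptions only.
SAMPLE = (
    '<p>Hear me, <rs type="vocative">Milady</rs>: '
    '<rs type="abstraction">Fair Virtue</rs> shall tame '
    '<rs type="creature">the Cockatrice</rs>.</p>'
)

def group_by_type(xml_text):
    """Return a dict mapping each type value to the texts of its <rs> elements."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for rs in root.iter("rs"):
        # An <rs> with no type attribute is filed under "untyped".
        groups[rs.get("type", "untyped")].append("".join(rs.itertext()))
    return dict(groups)

print(group_by_type(SAMPLE))
```

Retrieval of this kind is only as reliable as the consistency with which the type values were assigned, which is where the practical difficulty lies.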

The WWP has found, however, that for our texts it is very difficult to create a sufficiently comprehensive and unambiguous set of values to categorize these features in a way that would allow researchers actually to do systematic work on them. The WWP's path to this conclusion involved several attempts to create a system which could do justice to this complexity. We tried dividing our field of features into names of persons (using <persName> and its various components), names of non-persons (using <name>), and non-name references to both persons and non-persons (using <rs>). This last category was especially baroque, since it included the most heterogeneous group (abstractions, epithets, personifications of inanimate things, symbols, apostrophes, and references to mythical or imaginary creatures), and in fact each iteration of the classification process proved again that a substantial challenge lay in what to do with the residuum, the things which are only alike in being unlike some other, more clearly delimited category. The conclusion we found ourselves drawing was that although the concepts we were dealing with were fairly distinct, their application to specific textual phenomena was not by any means straightforward. Furthermore, although the components of the "residuum" were easily identifiable as categories which did not fit into the other two (personal and non-personal names), we were not confident that they represented categories which would be useful for scholarly study, though we were confident that trying to use them would prove to be extremely time-consuming and hence expensive. As a result, we eventually decided to use a simplified system which made no attempt to classify things beyond the element level; we now distinguish between personal names and the names of non-persons, and any other kind of reference which the text identifies as a proper noun is encoded using <rs> without a type attribute.
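The simplified scheme described above can be sketched as follows: <persName> for personal names, <name> for names of non-persons, and <rs> with no type attribute for everything else the text marks as a proper noun. The sample sentence, the wrapping <p> element, and the function name tally_references are assumptions made for illustration and are not WWP data or tooling. The fragment is written as well-formed XML so that Python's standard library can parse it, whereas the Guidelines cited here were SGML-based.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample sentence, not taken from a WWP text.
SAMPLE = (
    "<p><persName>Lady Mary</persName> sailed from <name>Dover</name>, "
    "invoking <rs>Fair Virtue</rs> against <rs>the Cockatrice</rs>.</p>"
)

# The three element names used in the simplified scheme.
REFERENCE_ELEMENTS = ("persName", "name", "rs")

def tally_references(xml_text):
    """Count persName, name, and rs elements in a fragment."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        if elem.tag in REFERENCE_ELEMENTS:
            counts[elem.tag] += 1
    return counts

print(dict(tally_references(SAMPLE)))
```

Because nothing is classified below the element level, a query can separate personal names from other names, but it cannot tell an epithet from a personified abstraction; that is the trade-off the simplified system accepts.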

Conclusion
The conclusion which emerges from this attempt seems to be that despite the various provisions of the TEI for encoding these complex textual phenomena, the limiting factor really is human use and the ability to define and enforce categorization. The question which leads from this is one of how to regard and apply the TEI: if it is imagined as a system for accounting to one's own satisfaction for what one finds in the text, then the complexity available is essential. However, if the TEI is regarded as a method of communicating textual information to others, then, as long as the text itself is allowed to determine the encoding solution, we will find this communication extremely difficult. Put another way, if an encoding project develops a TEI-based encoding system based on the assumption that its own data has unique requirements, that very assumption drastically limits the possibility of integrating that data with that of other projects to build larger resources, or the possibility of users being able to make common assumptions about how data will be treated. As a strategic matter, these possibilities are best kept open by the counterassumption that data can be treated similarly (even if that counterassumption is to some degree false). At this stage in the TEI's development, projects working on similar undertakings (similar materials, similar methodologies) have had the opportunity to discover the uniqueness of their own data and to revel in it, and they now need to turn their attention to finding ways to share it. The ultimate goal of this session is therefore to discuss the degree to which this is possible, and the costs of doing so.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info


ACH/ALLC / ACH/ICCH / ALLC/EADH - 1997

Hosted at Queen's University

Kingston, Ontario, Canada

June 3, 1997 - June 7, 1997

76 works by 119 authors indexed

Conference website: https://web.archive.org/web/20010105065100/http://www.cs.queensu.ca/achallc97/

Series: ACH/ALLC (9), ACH/ICCH (17), ALLC/EADH (24)

Organizers: ACH, ALLC
