TEI and Topic Maps

  1. Christian Wittern

    Chung-Hwa Institute of Buddhist Studies, Institute for Research in Humanities - Kyoto University



Over the last few years, a large number of texts and other resources have been encoded using markup according to the TEI guidelines [1] and other document type definitions. While the TEI provides both a comprehensive theoretical framework for text encoding and a practical guide to its application to individual texts or even groups of texts, little guidance is available that goes beyond the encoding of individual documents and advises on how to proceed with an analytic framework that might help encode not just concrete features of a text, but also aspects of the world as presented through the encoded texts.

With the growing amount of encoded material, the absence of more abstract ways of encoding what a text talks about is felt all the more keenly. There are so many interesting uses these texts could be put to, but so little to work with.

Topic Maps

One attempt to provide a path to a solution to this problem, or perhaps more accurately, a way to approach it, is the work that culminated in the ISO standard 13250, SGML Topic Maps [2]. This standard provides a model and architecture for the semantic structuring of information networks. It has the potential to provide a bridge between the information contained in texts encoded with schemes like the TEI, other information resources, and information about the world in general.

Good general introductions to topic maps are available in: Steve Pepper, Euler, Topic Maps, and Revolution, in: Proceedings of XML Europe 99 Conference, GCA, Alexandria, VA, 1999; Steve Pepper, Navigating Haystacks, Discovering Needles, in: Markup Languages, 1999; and Hans Holger Rath and Steve Pepper, Topic maps at work, in: Charles F. Goldfarb and Paul Prescod (eds.): XML Handbook, 2nd edition, Prentice Hall, 2000.

The topic maps standard describes a model and interchange format for topic maps. A topic map is an SGML/XML document [3] in which different element types are used to represent topics, occurrences of topics, and relationships (or "associations") between topics.
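The three building blocks named above can be illustrated with a small in-memory model. This is only a sketch and not the interchange syntax defined by ISO 13250: the class names, the field names, and the sample identifiers (`master-a`, `temple-x`, `chronicle.xml#p12`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    """A pointer from a topic to a place where it is discussed."""
    resource: str          # hypothetical: a TEI file plus an element id
    role: str = "mention"  # kind of occurrence, e.g. "mention" or "biography"


@dataclass
class Topic:
    """A subject the texts talk about: a person, a place, an event."""
    topic_id: str
    name: str
    types: set[str] = field(default_factory=set)
    occurrences: list[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    """A typed relationship; members maps a role name to a topic id."""
    assoc_type: str
    members: dict[str, str]


@dataclass
class TopicMap:
    topics: dict[str, Topic] = field(default_factory=dict)
    associations: list[Association] = field(default_factory=list)

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.topic_id] = topic

    def associate(self, assoc_type: str, **members: str) -> None:
        # Refuse associations that point at topics the map does not contain.
        for topic_id in members.values():
            if topic_id not in self.topics:
                raise KeyError(f"unknown topic: {topic_id}")
        self.associations.append(Association(assoc_type, dict(members)))


# Hypothetical sample content.
tm = TopicMap()
tm.add_topic(Topic("master-a", "Master A", {"person"},
                   [Occurrence("chronicle.xml#p12")]))
tm.add_topic(Topic("temple-x", "Temple X", {"place"}))
tm.associate("resided-at", resident="master-a", place="temple-x")
```

Keeping occurrences as pointers into the encoded files, rather than copying text into the map, reflects the idea that the map is a layer over the TEI documents and leaves them unchanged.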

Application of topic maps to texts of the Chinese Chan-School

In this project, I am trying to use a topic maps mechanism to encode some information in a number of 10th to 13th century Chan-chronicles. The texts have already been marked up according to the TEI guidelines. With the TEI markup, the basic structure of the documents and features like names of persons and places, datable events and the like have been marked. Topic maps are now used to encode the following features in the text:

Topics and occurrences of Chan-masters and other persons, places, datable events
Quotations and allusions occurring in the text
Interjections and comments of later Chan-masters relating to a given anecdote
Information about the lineage of Chan-masters
Other external information related to these topics
Links to other resources relevant to topics occurring in the texts

In addition, a typology of anecdotes and their instances is in preparation. This typology will also be exposed as a topic map, which will make it easier to trace and visualize the development and relationship of topoi in the anecdotes.

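As an illustration of how lineage information could be expressed as associations, the following sketch stores teacher and student pairs and walks them back to the earliest recorded teacher. The sample names follow the lineage as traditionally given and serve only as an example. The function name `lineage_of` and the assumption that each student has a single recorded teacher are simplifications introduced here, not part of the project.

```python
# Each tuple is one "teacher-of" association: (teacher, student).
# Sample names follow the traditionally given lineage; illustration only.
TEACHER_OF = [
    ("Mazu Daoyi", "Baizhang Huaihai"),
    ("Baizhang Huaihai", "Huangbo Xiyun"),
    ("Huangbo Xiyun", "Linji Yixuan"),
]


def lineage_of(master, associations):
    """Return the chain of teachers from the earliest known down to `master`.

    Assumes at most one recorded teacher per student (a simplification).
    """
    teacher_by_student = {student: teacher for teacher, student in associations}
    chain = [master]
    while chain[-1] in teacher_by_student:
        teacher = teacher_by_student[chain[-1]]
        if teacher in chain:  # guard against cyclic data
            break
        chain.append(teacher)
    return list(reversed(chain))


print(" -> ".join(lineage_of("Linji Yixuan", TEACHER_OF)))
```

A master with several teachers would need a graph traversal instead of a single chain; the dictionary lookup here keeps only one teacher per student.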
Preliminary results

As can be seen, the research reported in this paper tries to go beyond the encoding of individual texts by overlaying them with a layer of abstraction. This layer, together with any external information attached through it, can then be exploited when analyzing the texts.

Although the technology surrounding the use of topic maps is still in its infancy, it is expected that by the time this paper is presented, there will be topic map browsers that allow the encoded material to be accessed directly through the maps. This will allow researchers to formulate and explore questions concerning the material in a way that is much closer to their needs than is the case with current information retrieval technology. Possible questions with meaningful answers include the following:

Can the development of topoi in these texts be seen in the context of lineage affiliation?
Are there anecdotes with similar content attributed to masters in different lines?
Is lineage affiliation strongly related to the area of origin of a master?
Are lineages organized in geographic patterns?
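Questions of this kind amount to grouping associations by one topic and comparing another. The sketch below looks for anecdote types attributed to masters in more than one line. The records, the type labels, and the function name `types_across_lineages` are hypothetical; they are not data from the chronicles.

```python
from collections import defaultdict

# Hypothetical sample data: (anecdote type, master, lineage of that master).
ANECDOTES = [
    ("shout", "Master A", "Line 1"),
    ("shout", "Master B", "Line 2"),
    ("staff-blow", "Master A", "Line 1"),
    ("staff-blow", "Master C", "Line 1"),
]


def types_across_lineages(anecdotes):
    """Map each anecdote type found in more than one lineage to those lineages."""
    lineages_by_type = defaultdict(set)
    for anecdote_type, _master, lineage in anecdotes:
        lineages_by_type[anecdote_type].add(lineage)
    return {
        anecdote_type: sorted(lineages)
        for anecdote_type, lineages in lineages_by_type.items()
        if len(lineages) > 1
    }


print(types_across_lineages(ANECDOTES))
```

The same grouping, keyed on area of origin instead of anecdote type, would address the questions about geographic patterns.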

Depending on the availability of sufficiently fine-grained data in the texts, and of course bearing in mind that the picture presented in the texts is not necessarily historically accurate, some of the following questions could possibly also be traced in such a topic map browser:

Which masters have been at a given temple at a given time?
Could two masters (who are reported to have met) actually have met? If yes, where?
What other masters did a given master encounter? At which time in his life?

There is of course an unlimited number of questions that could be posed and, hopefully, meaningfully answered with such a browser.
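The question of whether two masters could have met can be read as an overlap test on dated stays at the same place. The following is a minimal sketch under that reading. The `Stay` record, the years, and the names are hypothetical; in practice the values would have to come from datable events marked up in the texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stay:
    master: str
    place: str
    start: int  # first year at the place
    end: int    # last year at the place (inclusive)


# Hypothetical stays; real values would come from dated events in the texts.
STAYS = [
    Stay("Master A", "Temple X", 820, 835),
    Stay("Master B", "Temple X", 830, 840),
    Stay("Master B", "Temple Y", 841, 850),
]


def possible_meetings(a, b, stays):
    """Return (place, first_year, last_year) where the stays of a and b overlap."""
    meetings = []
    for sa in (s for s in stays if s.master == a):
        for sb in (s for s in stays if s.master == b):
            start, end = max(sa.start, sb.start), min(sa.end, sb.end)
            if sa.place == sb.place and start <= end:
                meetings.append((sa.place, start, end))
    return meetings


print(possible_meetings("Master A", "Master B", STAYS))
```

An empty result means only that the recorded stays do not overlap. Given the caveat about historical accuracy, it says nothing certain about the masters themselves.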

Topic maps provide a new and efficient way to access the information in encoded texts. The research project described in this paper tried to use topic maps to overlay the texts with a layer of pointers and topic descriptions. Preliminary results have been encouraging and a great potential is seen in further research in this area.

It will be particularly interesting to see how a hierarchical layer of topic maps could be built, which would make it possible to further tune the view of the available information by switching to only the (specialized) topic maps needed for a specific question. It would also be interesting to see how a system could use the information provided by users of the topic map browser to add topics, relationships and occurrences on the fly. In particular, this would allow functionality similar to, but much more powerful than, the collaborative annotations promoted by the World Wide Web Consortium [4].


1. The Association for Computers and the Humanities (ACH), The Association for Computational Linguistics (ACL) and The Association for Literary and Linguistic Computing (ALLC), Guidelines for Electronic Text Encoding and Interchange, edited by C. M. Sperberg-McQueen and Lou Burnard, TEI P3, Text Encoding Initiative, Chicago, Oxford, May 16, 1994.

2. International Organization for Standardization, ISO/IEC 13250, Information technology - SGML Applications - Topic Maps, Geneva, 2000.

3. Formally, the above-mentioned standard ISO 13250 was developed within the family of SGML standards. Work is currently under way to translate the model to the XML family, which will allow for better integration with XML technologies like XLink, XPointer and XML namespaces, which are not available in SGML. A first specification of XML Topic Maps (XTM) is expected in the near future.

4. See this page on the World Wide Web Consortium's web server for an overview of the technology, which includes the use of RDF, XLink and XPointer to overlay annotations on Web pages.


Conference Info

In review


Hosted at New York University

New York, NY, United States

July 13, 2001 - July 16, 2001


Series: ACH/ICCH (21), ALLC/EADH (28), ACH/ALLC (13)

Organizers: ACH, ALLC

  • Keywords: None
  • Language: English
  • Topics: None