TEI and cultural heritage ontologies

paper
Authorship
  1. Øyvind Eide

    University of Oslo

  2. Christian-Emil Ore

    University of Oslo

Work text

Since the mid-1990s there has been growing interest in the
design and use of conceptual models (ontologies)
in humanities computing and library science, as well as in
knowledge engineering in general. There is also a wish to use
such models to enable information interchange. In its
20 years of history, TEI has concentrated on the markup of functional
aspects of texts and their parts. That is, a personal name is
marked up, but linking it to information about the real-world person
denoted by that name was not in the main scope. The scope
of TEI has gradually broadened, however, to include more
real-world information external to the text in question. The Master
project (Master 2001) is an early example of this change.
In TEI P5 a series of new elements for marking up real-world
information is introduced, and several such elements from
P4 are adjusted. TEI P5 is meant to be a set of guidelines
for the encoding of a large variety of texts in many cultural
contexts. Thus the set of real-world-oriented elements in TEI
P5 should not formally be bound to a single ontology. The
ontological part of TEI P5 is, however, closely connected to the
authors’ implicit world view. Thus we believe it is important
to study this part of TEI P5 with some well-defined ontology
as a yardstick. Our long experience with the memory institution
sector makes CIDOC CRM (Conceptual Reference Model) a
natural choice. CIDOC CRM (Crofts 2005) has proven
useful as an intellectual aid in formulating the intended
scope of the elements in new markup schemes, and we
believe the model can be useful for clarifying the ontological
part of TEI. This will also clarify what is needed in order to
harmonize it with major standards like CIDOC CRM, FRBR,
EAD and CDWA Lite.
CIDOC CRM
CIDOC CRM is a formal ontology intended to facilitate the
integration, mediation and interchange of heterogeneous
cultural heritage information. It was developed by
interdisciplinary teams of experts, coming from fields such
as computer science, archaeology, museum documentation,
history of arts, natural history, library science, physics and
philosophy, under the aegis of the International Committee
for Documentation (CIDOC) of the International Council
of Museums (ICOM). The harmonisation of CIDOC CRM
and IFLA’s FRBR (FRBR 1998) is in the process of being
completed. The EAD has already been mapped to CIDOC
CRM (Theodoridou 2001).
CIDOC CRM is defined in an object-oriented formalism
that allows for a compact definition with abstraction and
generalisation. The model is event-centric; that is, actors, places
and objects are connected via events. CIDOC CRM is a core
ontology in the sense that the model does not have classes
for all particulars, unlike for example the Art and Architecture
Thesaurus with its thousands of concepts. CIDOC CRM has slightly
more than 80 classes and 130 properties. The most central
classes and properties for data interchange are shown below.
Example: The issue of a medieval charter can be modelled as
an activity connecting the issuer, witnesses, scribe, and the place
and time of issue. The content of the charter is modelled as
a conceptual object and the parchment as a physical thing. In
cases where it is necessary for a scholarly analysis and when
sufficient information has been preserved, the issuing of a
charter can be broken down into a series of smaller events, e.g.,
the actual declaration in a court, the writing of the parchment
and the attachment of the seals. This conceptual analysis can
be used as an intellectual aid in the formulation of a data
model and its implementation.
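As an illustration, the charter example can be sketched in Python. This is a minimal sketch, not part of CIDOC CRM or CRM-Core: the names Entity, Event and all_participants are hypothetical, the labels are placeholders, and the choice of E7 Activity, E21 Person, E53 Place, E28 Conceptual Object and E18 Physical Thing as class labels is our assumption about how the example would be typed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A persistent item: a label plus the CRM class it is assumed to instantiate."""
    label: str
    crm_class: str


@dataclass
class Event:
    """An activity that connects actors, things, a place and a time."""
    label: str
    crm_class: str = "E7 Activity"
    participants: List[Entity] = field(default_factory=list)
    things: List[Entity] = field(default_factory=list)
    place: Optional[Entity] = None
    time_span: Optional[str] = None
    sub_events: List["Event"] = field(default_factory=list)

    def all_participants(self) -> List[Entity]:
        """Collect participants of this event and, recursively, of its sub-events."""
        found = list(self.participants)
        for sub in self.sub_events:
            found.extend(sub.all_participants())
        return found


# Placeholder entities; no real charter is described here.
issuer = Entity("issuer", "E21 Person")
witness = Entity("witness", "E21 Person")
scribe = Entity("scribe", "E21 Person")
content = Entity("charter content", "E28 Conceptual Object")
parchment = Entity("parchment", "E18 Physical Thing")

# The issuing broken down into the three smaller events named in the text.
declaration = Event("declaration in court", participants=[issuer, witness])
writing = Event("writing of the parchment", participants=[scribe], things=[parchment])
sealing = Event("attachment of the seals", participants=[issuer], things=[parchment])

issue = Event(
    "issue of the charter",
    things=[content, parchment],
    place=Entity("place of issue", "E53 Place"),
    sub_events=[declaration, writing, sealing],
)

participant_labels = [p.label for p in issue.all_participants()]
```

The point of the sketch is that no actor is linked directly to the parchment or to the place: every connection passes through an event, which is what makes the model event-centric.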
In 2005 the CRM was reformulated as a simple XML DTD,
called CRM-Core, to enable CRM compliant mark up of
multimedia metadata (Sinclair 2006). A CRM-Core XML
package may contain information about a single instance
of any class in the CRM and how it may be connected to
other objects via events and properties. The German Council
of Museum has based its standard for XML based museum
data interchange, MUSEUMDAT, on a combination of the
Getty standard CDWA Lite and CRM Core. The CDWA Lite
revision group is currently considering these changes to CDWA
Lite in order to make it compatible with CRM.
TEI P5 ontology elements in the light of CIDOC CRM
In TEI P5 the new ontologically oriented elements are introduced
in the module NamesPlaces, described in chapter 13, Names,
Dates, People, and Places. There are additional elements
described in chapter 10, Manuscript Description, in the TEI
header and in connection with bibliographic descriptions as
well. In this paper we concentrate on the elements in chapter
13. The central elements in this module are: person, personGrp, org,
place and event. Person, personGrp and org are “elements which
provide information about people and their relationships”.
CIDOC CRM has the corresponding classes with a common
superclass E39 Actor.
The element event is defined as “contains data relating to any
kind of significant event associated with a person, place, or
organization” and is similar to the CIDOC CRM class E5 Event
and its subclasses. In the discussion of the marriage example
in chapter 13, the event element is presented as a “freestanding”
element. In the formal definition it is limited to person and
org. To make this coherent, either the formal part will have to be
extended or the example will have to be changed.
Still, event is problematic. The marriage example demonstrates
that it is impossible to express the role a person has in an
event. Without knowing the English marriage formalism one
does not know whether the “best man” participated. The very generic
element persRel introduced in P5 does not solve this problem.
A possible solution would be to introduce
an EventStateLike model class with elements for roles and
participants.
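The kind of structure such a model class would need to carry can be sketched as follows. This is a hedged illustration only: Participation, RoleEvent and with_role are hypothetical names, the "#p1"-style pointers are invented identifiers, and nothing here corresponds to existing TEI P5 elements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participation:
    """Links one person (by pointer) to an event in an explicit role."""
    person: str  # hypothetical pointer to a person description
    role: str    # the role played in this particular event


@dataclass
class RoleEvent:
    """An event whose participants each carry their own role."""
    type: str
    participants: List[Participation] = field(default_factory=list)

    def with_role(self, role: str) -> List[str]:
        """Return pointers to all persons recorded in the given role."""
        return [p.person for p in self.participants if p.role == role]


# Invented data for the marriage example.
marriage = RoleEvent(
    "marriage",
    [
        Participation("#p1", "bride"),
        Participation("#p2", "groom"),
        Participation("#p3", "best man"),
    ],
)

best_men = marriage.with_role("best man")
```

With the role stored on the link between person and event, a reader no longer needs background knowledge of the ceremony to see that the best man took part.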
The model classes orgStateLike, personStateLike, personTraitLike,
placeStateLike and placeTraitLike group elements used to mark
up characteristics of persons, organisations and places. The
elements in the ...TraitLike model classes contain information about
permanent characteristics, and the elements in ...StateLike
information about more temporary characteristics. The model
classes contain the generic trait and state elements in addition to
specialised elements. The intention is to link all characteristics
relating to a person, organisation or place. It is not possible to
make a single mapping from these classes into CIDOC CRM.
The mapping will depend partly on which type of trait or state is used,
and partly on the way in which it is used. Many characteristics
will correspond to persistent items like E55 Type, E62 String
and E41 Appellation, and are connected to actors and places
through the properties P1 is identified by, P2 has type and P3 has
note. Other elements like floruit, which is used to describe a
person’s active period, are temporal states corresponding to
the CIDOC CRM temporal entity E3 Condition State. From an
ontological point of view the two elements state and trait can
be considered a generic mechanism for typed linking between
the major classes.
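Why no single mapping exists can be made concrete with a small lookup sketch. Only the floruit row is taken from the discussion above; the other pairings (persName, occupation, and a trait typed "note") are assumptions chosen to illustrate the three properties, and CrmTarget, MAPPING and map_characteristic are hypothetical names.

```python
from typing import NamedTuple, Optional


class CrmTarget(NamedTuple):
    crm_class: str               # CRM class the characteristic maps to
    crm_property: Optional[str]  # linking property; None if no direct property is assumed


# Keys are (TEI element name, value of its type attribute or None).
# Illustrative table: apart from floruit, every pairing is an assumption.
MAPPING = {
    ("persName", None): CrmTarget("E41 Appellation", "P1 is identified by"),
    ("occupation", None): CrmTarget("E55 Type", "P2 has type"),
    ("trait", "note"): CrmTarget("E62 String", "P3 has note"),
    ("floruit", None): CrmTarget("E3 Condition State", None),
}


def map_characteristic(element: str, type_: Optional[str] = None) -> Optional[CrmTarget]:
    """Look up a CRM target by element and type, falling back to the element alone.

    Returns None when the table cannot decide, which is the expected outcome
    for generic trait and state elements with an unknown type.
    """
    return MAPPING.get((element, type_)) or MAPPING.get((element, None))


floruit_target = map_characteristic("floruit")
note_target = map_characteristic("trait", "note")
undecided = map_characteristic("trait", "eye colour")
```

The generic trait and state elements only resolve when their type is known to the table, which mirrors the observation that the mapping depends both on the element and on how it is used.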
All the elements in the ...TraitLike and ...StateLike model classes
can be supplied with the attributes notAfter and notBefore,
defining the temporal extent of their validity. This is a very
powerful mechanism for expressing, in synoptic form, information
that rests on extensive but hidden scholarly investigation of
real-world events. As long as the justification for the values in
these elements is not present, however, it is hard to map this
information into an event-oriented conceptual model like the
CRM. Thus, it is important to include descriptions of methods
for such justification in the guidelines, including examples.
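How the two attributes delimit validity can be sketched with Python's standard library. The fragment below is invented for illustration and is not taken from the Guidelines; it omits the TEI namespace and assumes full YYYY-MM-DD values, although shorter date forms may well occur in practice. The function name valid_on is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Invented sample data: a person with two dated characteristics.
SAMPLE = """
<person>
  <residence notBefore="1340-01-01" notAfter="1352-12-31">Oslo</residence>
  <occupation notBefore="1345-01-01">scribe</occupation>
</person>
"""


def valid_on(elem: ET.Element, day: date) -> bool:
    """True if `day` lies within the element's notBefore/notAfter bounds.

    A missing attribute leaves that side of the interval open.
    """
    not_before = elem.get("notBefore")
    not_after = elem.get("notAfter")
    if not_before is not None and day < date.fromisoformat(not_before):
        return False
    if not_after is not None and day > date.fromisoformat(not_after):
        return False
    return True


person = ET.fromstring(SAMPLE)
states_1341 = [e.tag for e in person if valid_on(e, date(1341, 6, 1))]
states_1350 = [e.tag for e in person if valid_on(e, date(1350, 6, 1))]
states_1360 = [e.tag for e in person if valid_on(e, date(1360, 6, 1))]
```

The sketch can answer when a characteristic held, but not which events began or ended it or on what evidence the dates rest, which is exactly the information an event-oriented model would ask for.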
TEI ontology – conclusion
The new elements in TEI P5 bring TEI a great step in the
direction of an event oriented model. Our use of CRM
Core as a yardstick has shown that small extensions to and
adjustments of the P5 elements will enable the expression of
CRM Core packages by TEI elements. This is a major change from
our previous suggestions (Ore 2006), in which the ontological
module was outside TEI.
To continue this research, an extended TEI tagset should be
developed with elements for abstract entities corresponding to the
ones in FRBR and CRM. This will not change the ontological
structure of TEI significantly, but these adjustments will make
the ontological information in a TEI document compliant with
other cultural heritage models such as EAD,
FRBR/FRBRoo, CIDOC CRM and CDWA Lite. There is an
ongoing harmonisation process between all these initiatives, and
it is important that TEI is a part of it.
Bibliography
Crofts, N., Doerr, M., Gill, T., Stead, S. and Stiff, M. (eds.)
(2005): Definition of the CIDOC Conceptual Reference Model.
(June 2005). URL: http://cidoc.ics.forth.gr/docs/cidoc_crm_version_4.2.doc
(checked 2007-11-15)
CDWA Lite. URL: www.getty.edu/research/conducting_research/standards/cdwa/cdwalite.html
(checked 2007-11-25)
FRBR (1998). Functional Requirements for Bibliographic
Records. Final Report. International Federation of Library
Associations. URL: http://www.ifla.org/VII/s13/frbr/frbr.pdf
(checked 2007-11-24)
MASTER (2001). “Manuscript Access through Standards for
Electronic Records (MASTER).” Cover Pages: Technology
Reports. URL: http://xml.coverpages.org/master.html
(checked 2007-11-25)
MUSEUMDAT. URL: www.museumdat.org/ (checked 2007-11-25)
Ore, Christian-Emil and Øyvind Eide (2006). “TEI, CIDOC-CRM
and a Possible Interface between the Two.” P. 62-65 in
Digital Humanities 2006. Conference Abstracts. Paris, 2006.
Sinclair, Patrick et al. (2006). “The use of CRM Core in
Multimedia Annotation.” Proceedings of First International
Workshop on Semantic Web Annotations for Multimedia
(SWAMM 2006). URL: http://cidoc.ics.forth.gr/docs/paper16.pdf
(checked 2007-11-25)
TEI P5 (2007). Guidelines for Electronic Text Encoding
and Interchange. URL: http://www.tei-c.org/Guidelines/P5/
(checked 2007-11-15)
Theodoridou, Maria and Martin Doerr (2001). Mapping of
the Encoded Archival Description DTD Element Set to the CIDOC
CRM, Technical Report FORTH-ICS/TR-289. URL: http://cidoc.ics.forth.gr/docs/ead.pdf
(checked 2007-11-25)


Conference Info

Complete

ADHO - 2008

Hosted at University of Oulu

Oulu, Finland

June 25, 2008 - June 29, 2008

135 works by 231 authors indexed

Conference website: http://www.ekl.oulu.fi/dh2008/

Series: ADHO (3)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None