New life for old reports - The Archaeological Part of the National Documentation Project of Norway

The National Documentation Project of Norway
is a nationwide project applying modern computer technology to the collection departments (archaeology, lexicography, folklore, etc.) at the
Norwegian universities (Ore 1994). The archaeological subproject has started in Oslo, Bergen and
Tromsø. Its activity is concentrated on two tasks: the conversion of existing paper-based files
to computer-readable format, and the development
and implementation of computer-based methods
to support archaeological fieldwork. In this paper
we discuss the creation of an incremental information system for archaeologists. We focus
on how to integrate the information found in the
annual acquisition reports of the National Museum of Antiquities in Oslo, published over the last
165 years, with the information in the modern databases, forming one consistent database.
An overall database model for the Norwegian
archaeological museums has been worked out by
studying the activities in the museums (Holmen &
Uleberg 1995); see also (Rold 1993) for a description of an analogous process at the Danish National
Museum. With this model in place, each part can now be
detailed while the system as a whole remains coherent. The archaeological database consists of several smaller
databases, containing information such as artefact
descriptions, conservation reports, ancient
monument descriptions and so forth. The archaeological database will subsequently be connected to the databases of the other subprojects,
such as numismatics, runes and place names.
Today the conversion of the artefact catalogues at
the National Museum of Antiquities in Oslo and
at the University Museum of Bergen has been
completed, and links between this material
and the database for the Sites and Monuments of
Southern Norway have been implemented.
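As a rough illustration of how two of the smaller databases can be linked, the sketch below uses Python's built-in sqlite3 module as a stand-in for the project's actual database system. The table names, columns, museum numbers and sample rows (`monuments`, `artefacts`, `site_id`, `M-1001` and so on) are all hypothetical and are not taken from the project's data model.

```python
import sqlite3


def build_linked_db() -> sqlite3.Connection:
    """Create two hypothetical sub-databases as tables linked by a site id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the link between tables
    conn.executescript(
        """
        CREATE TABLE monuments (
            site_id     INTEGER PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE artefacts (
            museum_no   TEXT PRIMARY KEY,
            description TEXT NOT NULL,
            site_id     INTEGER REFERENCES monuments(site_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO monuments VALUES (?, ?)",
        [(1, "grave mound"), (2, "settlement site")],
    )
    conn.executemany(
        "INSERT INTO artefacts VALUES (?, ?, ?)",
        [("M-1001", "sword", 1), ("M-1002", "spearhead", 1),
         ("M-1003", "flint scraper", 2)],
    )
    conn.commit()
    return conn


def artefacts_from_site(conn: sqlite3.Connection, site_id: int) -> list:
    """Follow the link from a monument record to the artefacts found there."""
    return conn.execute(
        "SELECT a.museum_no, a.description "
        "FROM artefacts AS a JOIN monuments AS m ON a.site_id = m.site_id "
        "WHERE m.site_id = ? ORDER BY a.museum_no",
        (site_id,),
    ).fetchall()
```

With this sample data, `artefacts_from_site(build_linked_db(), 1)` returns the two artefacts registered from the grave mound. Keeping each sub-database in its own table and joining on a shared key is what allows the parts to be detailed separately while the whole stays coherent.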
Free text and relational data bases
It is desirable to be able to combine precise
information, of the kind we would expect to find in a
relational database, with the eloquence of
free text. In the Documentation Project,
Standard Generalized Markup Language (SGML)
and, as far as possible, TEI-conformant encoding
schemes are used to encode the texts. SGML
improves free-text search routines, and may
also be used to create links between a relational
database and texts stored in a free-text system.
Whereas creating a relational database means putting text
into tables, SGML encoding means putting a table on
a text. One of the challenges when working with
SGML is finding the structure inherent in the text.
When a text is well structured, it is possible to
create a tight Document Type Definition (DTD),
which gives very precise information about the
text and makes well-structured searches in the text possible.
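The difference between free-text and structured search can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's encoding: the tag names (`entry`, `artefact`, `material`, `place`) and the sample entries are hypothetical, and the markup is written as well-formed XML so that the standard-library parser `xml.etree.ElementTree` can read it (real SGML may, for example, omit end tags). The Norwegian terms are *sverd* (sword), *spydspiss* (spearhead), *øks* (axe), *jern* (iron) and *stein* (stone); the place name *Jernsletta* is invented to show a false hit.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoded catalogue entries (XML stand-in for SGML).
CATALOGUE = """
<catalogue>
  <entry id="1"><artefact>sverd</artefact><material>jern</material><place>Nordby</place></entry>
  <entry id="2"><artefact>spydspiss</artefact><material>jern</material><place>Sørby</place></entry>
  <entry id="3"><artefact>øks</artefact><material>stein</material><place>Jernsletta</place></entry>
</catalogue>
"""


def free_text_search(xml_text: str, word: str) -> list:
    """Naive free-text search: entries whose text contains the word anywhere."""
    root = ET.fromstring(xml_text.strip())
    return [
        entry.get("id")
        for entry in root.iter("entry")
        if word.lower() in " ".join(entry.itertext()).lower()
    ]


def structured_search(xml_text: str, tag: str, value: str) -> list:
    """Structured search: entries where the given element has exactly this value."""
    root = ET.fromstring(xml_text.strip())
    return [
        entry.get("id")
        for entry in root.iter("entry")
        if any(element.text == value for element in entry.iter(tag))
    ]
```

A free-text search for "jern" also hits entry 3 because of the place name, whereas `structured_search(CATALOGUE, "material", "jern")` returns only entries 1 and 2. The tighter the markup, the more such precise questions can be asked.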
The acquisition reports
The National Museum of Antiquities was founded
in 1829. Over the last 165 years the way of phrasing an artefact description has changed considerably,
as has the orthography of the Norwegian language. It is interesting to see how the prevailing theories
guided the artefact descriptions. The Komsa artefacts are Stone Age finds from Finnmark. In the
1920s, the dominant view of these artefacts
was that they were Palaeolithic. Accordingly, their
descriptions follow those of French Palaeolithic artefacts, with French artefact names, in
contrast to all other objects, which were given
Norwegian category names.
The fact that the reports have been written by
many people over a long period of time has
made it a challenge to design an SGML system that
can incorporate all the texts. One can choose between a system that fully accommodates every
text variation and a system with a tight structure. We had to find a middle ground, where the
texts can be incorporated in a satisfactory way and
where the system is still rigid enough to be called
a system. We ended up with a single DTD that allows the chunks of information in
each catalogue entry to appear in any order. A chunk of information is
typically a few words describing the artefact or the
find spot, the place name, the county and so on.
The tag set describes the kinds of information
found in the catalogues. During the conversion and encoding phase no attempt was made
to reclassify or modernize the nomenclature, or
even the language (which varies from Danish in the
19th century to modern Norwegian).
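The trade-off between a tight and a free-order DTD can be mimicked by treating a content model as a regular expression over the sequence of chunk tags in an entry. The sketch below assumes a hypothetical tag set of four chunks; in SGML terms, the tight model corresponds to a sequence group such as `(artefact, material, place, county)`, and the loose one to a repeatable OR group such as `(artefact | material | place | county)*`. We do not claim that either is the project's actual DTD.

```python
import re

# Hypothetical chunk tags; the project's real tag names are not given here.
CHUNK = r"(?:artefact|material|place|county)"

# Tight model: each chunk exactly once, in a fixed order.
TIGHT = re.compile(r"artefact material place county")

# Loose model: any chunk, in any order, any number of times (or none).
LOOSE = re.compile(rf"(?:{CHUNK}(?: {CHUNK})*)?")


def accepts(model: re.Pattern, entry_tags: list) -> bool:
    """Check whether an entry's sequence of chunk tags fits the content model."""
    return model.fullmatch(" ".join(entry_tags)) is not None
```

An entry written as place, artefact, material is rejected by `TIGHT` but accepted by `LOOSE`, while an unknown chunk such as `weight` is rejected by both. The loose model still constrains which kinds of information may occur, which is what keeps it "a system".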
The encoded texts are stored in a free-text system
(PAT from OpenText Inc.). The markup has made
it possible to import the key information into a
relational database (Oracle).
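The import step can be sketched as follows, assuming well-formed XML in place of SGML and Python's sqlite3 in place of Oracle; PAT itself is not modelled. The element and attribute names (`entry`, `no`, `year`) and the sample data are hypothetical. Because chunks may occur in any order, each key field is looked up by tag name rather than by position, and a missing chunk becomes NULL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical encoded report entries (XML stand-in for SGML).
ENCODED = """
<catalogue year="1923">
  <entry no="M-1"><place>Nordby</place><artefact>sverd</artefact><material>jern</material></entry>
  <entry no="M-2"><artefact>øks</artefact><place>Sørby</place></entry>
</catalogue>
"""

KEY_FIELDS = ("artefact", "material", "place")


def import_entries(xml_text: str, conn: sqlite3.Connection) -> None:
    """Copy the tagged key fields of each entry into one relational row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS finds ("
        "museum_no TEXT PRIMARY KEY, year INTEGER, "
        "artefact TEXT, material TEXT, place TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    year = int(root.get("year"))
    for entry in root.iter("entry"):
        # findtext returns None when a chunk is absent, stored as NULL.
        values = [entry.findtext(field) for field in KEY_FIELDS]
        conn.execute(
            "INSERT INTO finds VALUES (?, ?, ?, ?, ?)",
            (entry.get("no"), year, *values),
        )
    conn.commit()
```

After `import_entries(ENCODED, conn)`, entry M-2 appears as a row with no material, while its text remains untouched in the encoded source.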
The old reports and modern archaeology
Concerning the naming conventions of the different artefact types, there have been several attempts
to find coherent solutions that all archaeologists
could agree upon. Projects have often
spent too much of their initial energy discussing such conventions. In the Documentation
Project, the choice has been made to strive for
historicity in the databases, in the following sense: the
artefact descriptions as found in the original reports and files are imported into the database
without any changes. This information constitutes
the foundation of the information in the database.
Later reclassifications and other information
about artefacts and monuments are added to the
database, but the original information is never
deleted. In this way an incremental database is
created, in which every event concerning an artefact
or an ancient monument adds a new layer of
information about that item.
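The layering principle amounts to an append-only history per item. The sketch below is a minimal in-memory model, not the project's database design; the class names, fields and example values are hypothetical. New events are appended, and nothing offers a way to edit or delete an earlier layer, so the original report description always remains the first layer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    """One event concerning an item: the original record or a later addition."""
    year: int
    source: str           # e.g. the acquisition report, or a named researcher
    classification: str


@dataclass
class Item:
    museum_no: str
    _layers: List[Layer] = field(default_factory=list)

    def add_layer(self, layer: Layer) -> None:
        # Append-only: earlier layers are never modified or removed.
        self._layers.append(layer)

    def original(self) -> Layer:
        # The first layer is the description as imported from the report.
        return self._layers[0]

    def latest(self) -> Layer:
        # Assumes layers are added in chronological order.
        return self._layers[-1]

    def history(self) -> Tuple[Layer, ...]:
        # A read-only copy of all layers, oldest first.
        return tuple(self._layers)
```

For example, an item first recorded as *spydspiss* in a 1923 report and reclassified by a researcher in 1992 keeps both layers: `original()` still gives the report's wording, while `latest()` gives the newest interpretation.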
To make searching easier, the modern terms will
be added to the texts and to the database, but no
part of the original texts will be deleted. A user
will then be able to search with the modern term and still
retrieve the artefacts that were
catalogued and described at a time when the
older terms were in use.
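Searching by a modern term while the records keep their original wording can be sketched as query expansion. The mapping below is a hypothetical illustration, not the project's thesaurus: it assumes that an older spelling, here *spydspids*, corresponds one-to-one to the modern *spydspiss* (spearhead). The stored records are never rewritten; only the query is expanded.

```python
# Hypothetical mapping from a modern term to the older terms it covers.
MODERN_TO_OLD = {
    "spydspiss": {"spydspiss", "spydspids"},
}


def expand(term: str) -> set:
    """Return the set of terms to search for; unknown terms stand for themselves."""
    return MODERN_TO_OLD.get(term, {term})


def search(records: list, term: str) -> list:
    """Find museum numbers whose original term matches the modern term or an older form."""
    wanted = expand(term)
    return sorted(no for no, original_term in records if original_term in wanted)


# Records as imported: (museum number, term exactly as written in the report).
RECORDS = [("M-1", "spydspids"), ("M-2", "spydspiss"), ("M-3", "øks")]
```

Here `search(RECORDS, "spydspiss")` returns both M-1 and M-2, even though M-1 still carries the older spelling in the database.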
The addition of the modern nomenclature will
solve the problem wherever there is a one-to-one
correspondence between old and new terms. It will not
cover cases where artefacts that earlier belonged to the same group are now regarded as
different types. This, however, is not a question
that can be solved once and for all. Since the database
is incremental, with each new researcher adding a point
of view, it will be possible to see which
classifications are preferred by different scholars. Since 1989, a database for artefacts from the
Bronze and Iron Ages has been built up in connection with a transfer of objects to new storerooms.
From the need to name the artefacts in the database
while stressing the use of precise
and exclusive terminology, a nomenclature has
been developed (Von der Fehr 1992). This nomenclature will be used as a starting point for some
searches in the overall database.
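The limitation described above, where one old group has been split into several modern types, can be made concrete with a small sketch. All names here are hypothetical placeholders (`old group`, `type A`, `type B`, `scholar X`); they do not come from the Von der Fehr nomenclature. A search by modern type returns confirmed matches (an explicit reclassification, or an unambiguous one-to-one term mapping) separately from possible matches (items still carrying only a split old term), and the recorded reclassifications can also be grouped by scholar.

```python
# Hypothetical mapping: one old group now split into two modern types.
OLD_TO_MODERN = {"old group": {"type A", "type B"}, "øks": {"øks"}}


def search_modern(records, reclassifications, modern_type):
    """Split hits into confirmed and possible matches for a modern type."""
    reclassified_as = {}
    for no, _scholar, new_type in reclassifications:
        reclassified_as.setdefault(no, set()).add(new_type)

    confirmed, possible = set(), set()
    for no, old_term in records:
        candidates = OLD_TO_MODERN.get(old_term, set())
        if modern_type in reclassified_as.get(no, set()):
            confirmed.add(no)   # some scholar explicitly assigned this type
        elif no in reclassified_as:
            continue            # reclassified, but only to other types
        elif candidates == {modern_type}:
            confirmed.add(no)   # one-to-one correspondence: mapping suffices
        elif modern_type in candidates:
            possible.add(no)    # split group: needs checking by a researcher
    return sorted(confirmed), sorted(possible)


def views_by_scholar(reclassifications):
    """Group the recorded reclassifications by the scholar who made them."""
    views = {}
    for no, scholar, new_type in reclassifications:
        views.setdefault(scholar, {})[no] = new_type
    return views


RECORDS = [("M-1", "old group"), ("M-2", "old group"),
           ("M-3", "old group"), ("M-4", "øks")]
RECLASSIFICATIONS = [("M-1", "scholar X", "type A"), ("M-2", "scholar Y", "type B")]
```

With this data, a search for `type A` confirms M-1, skips M-2 (reclassified as `type B`) and lists M-3 only as a possible match. Keeping the scholar with each reclassification is what allows competing classifications to coexist in the incremental database.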
We have chosen to analyze the information structure of the acquisition reports to reveal the information categories and their interrelationships. Before the conversion we made no attempt to build
any thesaurus or encoding scheme for the actual
terms. Our encoders (mainly women participating
in work-training schemes for the unemployed) were
trained to encode pieces of text such as artefact,
material, location description and so forth. We believe that only by postponing the categorisation
debate were we able to complete the conversion. As a side effect of the conversion and encoding, there now exists a complete list of the terms used
in Norwegian archaeological reports. The list of
terms is indexed by year, place, museum
identification number and so forth, and should be
of interest to anyone interested in the archaeology in
