Digital Archives

  1. Stefan Aumann

    Max-Planck-Institut für Geschichte


Recent discussions of the possibilities for storing
digital manuscript material have most often focused on the possibility of producing high quality
representations of a rather restricted amount of
digitized source material. In the archival world, on
the other hand, digital systems have frequently
been designed with the understanding that the
digital storage of bulk material is primarily a replacement of the classical microfilming operations of archives.
Using a German project, which intends to create a
pilot “edition” of a serial source of ca. 50,000
pages, this paper discusses how far archival systems can provide a starting ground for an incomparably more intensive access to bulk material
than traditional techniques.
The presentation will start with a short review of
the existing access methods for digital archives. It
is well known that while the scanning campaign
of a digitization project represents a serious organizational task, the provision of the various
access tools which allow a user to access the
digitized material actually requires a considerably
larger effort.
Let us recapitulate what the purpose of these access mechanisms is. The user of a digital facsimile
or edition should have the possibility to select
those pages he or she wants to look at by specifying characteristics of the text contained on the
individual pages. The user of a digital archive
should have the possibility to access by similar
means all parts of the archival holdings which
interest him or her in the form of high quality
reproductions right at the desk in the user’s room
in the archive.
Traditionally this is done with the help of either
full text retrieval systems or structured databases
which contain descriptions of the material; preparing
these descriptions is rather time consuming.
Three forms of access can be distinguished.
1) Access by Browsing
The user encounters the manuscript(s) as a – potentially structured – collection of pages. (S)he
pages through the material in the order imposed
by the structure holding the documents.
This is the only traditional access tool which can
be realized speedily. More popularly speaking:
you go to the traditional catalogue of the archive,
look up the shelf mark, enter it into the computer
(or select it there from a list) and get the first page
of the relevant document onto your screen.
2) Access by Query
The user specifies a query in the query language
of an underlying database system. This query addresses formal descriptions – which can contain
partial or complete transcriptions – of the document.
As a result the user is presented with an
ordered collection of qualifying pages.
Less technically: you save the excursion to the
catalogue, which is itself administered by the computer as a database in which you can employ
traditional database tools. The problem with such
an approach, as mentioned before, is that it is
usually a very complex operation to convert a
traditionally very flexible and highly irregular archival catalogue into a rigidly structured database.
3) Access by Content
Partial or complete transcriptions are loaded into
a fulltext system, presenting the complete vocabulary of some holdings as an “active list”. By dynamically specifying the formulae needed, the selection is narrowed down to a manageable number of
documents, which are then displayed.
Because of the heterogeneous nature of traditional
archival tools, such a conversion is usually easier
to accomplish than the creation of a rigidly structured database. This idea of creating a
computer-based access tool directly out of an existing one
leads us one step further, to:
4) Access by Digitized Versions of Traditional Tools
An existing catalogue or finding aid is itself
digitized. The digital version of this tool can be accessed by any of the access methods described so far.
“Activating” an entry of the digitized tool initiates the display of the page(s) described by it.
Less formally: you search within a graphic reproduction of the old catalogue on the screen and click
on a specific entry within it to see the first page of
the file described by that entry.
This notion of using a visual object as an access
tool for other visible objects leads directly to:
5) Access by a Graphic Overview
The organizational scheme representing the order
of the collection – for example a map of a community or territory – is presented as user interface. By
activating a “house” or “location” on the map, the
related documents are displayed.
More intuitively: you click on a map to start
browsing through all the documents related to the
village clicked on. While this is more intuitive, it
can be shown, however, that for actual access to
information within real-size historical territories,
the popular “clickable” map of toy applications
may need some rethinking to reach an acceptable
information density on the screen.
6) Access by Fragment
Significant sections of the manuscript – for example illuminated initials or miniatures – are administered as a primary database. By activating such
a fragment the part of the complete manuscript
from which it is taken is displayed.
Little experience with this kind of approach exists
as yet; it remains to be seen whether such a tool,
which has been used experimentally within the
realm of digital facsimiles, can successfully be
extended to large scale digital archives.
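
As a minimal sketch of access by content (form 3 above), assume that transcriptions are available as plain strings keyed by a page identifier. The code below builds a word index, exposes the vocabulary as a sorted list, and narrows the selection to the pages that contain every chosen term. The page identifiers, the sample texts and all function names are hypothetical illustrations; nothing here is taken from the project's actual software.

```python
from collections import defaultdict


def build_index(transcriptions):
    """Map each lower-cased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in transcriptions.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?()\"'")  # drop surrounding punctuation
            if word:
                index[word].add(page_id)
    return index


def vocabulary(index):
    """The complete vocabulary of the holdings, as a sorted list."""
    return sorted(index)


def narrow(index, terms):
    """Return the sorted page ids whose transcription contains every term."""
    result = None
    for term in terms:
        pages = index.get(term.lower(), set())
        result = pages if result is None else result & pages
    return sorted(result or [])


# Hypothetical sample holdings: shelf mark / page number -> transcription.
sample = {
    "A1/001": "Account of the city council, grain and wine.",
    "A1/002": "Grain delivered to the mill.",
    "A1/003": "Wine tax received by the council.",
}
sample_index = build_index(sample)
print(narrow(sample_index, ["grain"]))             # ['A1/001', 'A1/002']
print(narrow(sample_index, ["grain", "council"]))  # ['A1/001']
```

Each additional term can only shrink the result set, which is the sense in which the selection is narrowed down to a manageable number of documents before any page is displayed.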
After having shown examples of the basic access
mechanisms, we go on to demonstrate that the
actual software functionality required to implement these techniques is very closely related to the
functionality which has been implied by Dino
Buzzetti’s discussion of variant readings.
With this we hope to have demonstrated that the
various possibilities for using digitized manuscript
material are closely related to each other. This
supports the thesis that the appropriate response
of archival institutions to the new technologies
should primarily be the creation of an institutional framework which is sufficiently flexible to
allow one and the same institution to act as a
logistical host for a few groups of manuscripts
with very intensive editorial information assigned
to them, while acting at the same time as a supplier
of very shallowly described mass documents.
This may seem doubtful for one reason: documents in which extremely intensive editorial
preparation has been invested create different problems of copyright and protection against illegitimate distribution than mass documents with little,
if any, explanatory information attached to each
individual page.
We close our considerations on digital archives
therefore with a discussion of the protection mechanisms employed within the organizational and
software environment from which the examples of
this paper are drawn. Concerns about data security in the case of
archives arise broadly for three reasons.
a) The institutions from which the source material
originates have been awakened recently to the
problems of copyright with regard to digitized
source material.
Museums are afraid that they will be robbed of
large revenues if cheaper pictorial reproductions
of their holdings, particularly reproductions
which can easily be copied, circulate. This is not
quite so obvious in the archival case, but certainly
represents a reason for much concern for an author
of a digital facsimile or edition.
b) While nobody in a small city archive really
believes that they will lose huge sums because
their early 16th century account books can easily
be copied, there is a widespread fear in the archival
world that the systematic digitization of source
material will threaten the position of the archives
in two ways. On the one hand there is a widespread
feeling that these technologies will let the archives
lose control over their material. There is probably
no technical answer to that: it is part of the implications
these technologies have for the organisation of the research process. A more immediate
fear, particularly in smaller archival institutions,
is related, however, to the fact that many archives
are funded, among other reasons, because the local
authorities are convinced of the importance of an
institution which has so and so many users a year.
This effect, it is feared, will get lost when large
portions of the archival holdings are accessible
from the outside.
c) A third problem arises with
sensitive material, as, for example, in the case of
an attempt to convert the holdings of the archive
at the former concentration camp in Auschwitz
into digital form. While the manipulation of high
quality images is not quite as easy as that of low
quality reproductions on which it is usually demonstrated, in the case quoted the danger of falsifications produced by some right-wing lunatics to
prove the non-existence of the holocaust is quite
Within the various projects implicitly discussed
here, we have not yet found any definite solutions
to these problems. However, in general, the following procedures will probably be implemented.
To protect the rights of the institution generating
the material, it will be distributed in an internal
format, which can only be accessed with a specific
copy of the program issued with it, which should
solve the problems described under a) and b).
In that area we assume that any protection scheme
can only protect as long as no serious criminal
attempt is made to break it. (If you want to produce
a non-copyrighted version of a fairly traditional
publication, you can do so just as well.) In the last
case, however, where historical integrity is in question, and the potential offenders have a clear
criminal potential, this is deemed insufficient. In
principle it will always be possible to display
visual material on a computer and dump a copy of
the screen into a file, where it can then be processed further. While it requires quite some effort to
recreate the original quality from such dumps, it
could in principle be done. The distribution of the
material is not the problem in a case like Auschwitz: the more people see the authentic sources
about the holocaust, the better. It has to be possible, however, to prove easily that a specific visually reproduced document has not been tampered
with. For such purposes digital reproductions of
images or manuscripts can contain embedded
“watermarks” or “seals” which are as difficult to
break as the identification codes for credit cards
and similar devices.
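
The verification step can be illustrated with a minimal sketch. It assumes a detached, keyed seal (an HMAC-SHA256 value computed over the image bytes) rather than a watermark embedded in the image itself, so it is a deliberately simpler, substituted technique and not the mechanism used in the projects discussed here. The function names, the key and the sample bytes are hypothetical.

```python
import hashlib
import hmac


def seal(image_bytes: bytes, archive_key: bytes) -> str:
    """Compute a keyed seal (HMAC-SHA256, hex encoded) over the image bytes."""
    return hmac.new(archive_key, image_bytes, hashlib.sha256).hexdigest()


def is_untampered(image_bytes: bytes, archive_key: bytes, expected_seal: str) -> bool:
    """Check that the image still matches the seal issued by the archive."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(seal(image_bytes, archive_key), expected_seal)


# Hypothetical usage: the archive seals a scan when it is distributed.
key = b"secret key held by the archive"    # assumption: known only to the archive
scan = b"bytes of a digitized page image"  # placeholder for real image data
issued = seal(scan, key)

print(is_untampered(scan, key, issued))            # True
print(is_untampered(scan + b"\x00", key, issued))  # False
```

Because the key is held by the archive, only the archive can issue or check a seal in this sketch. A scheme in which any reader can verify a document would need a public-key signature instead.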
The presentation concludes with an attempt to show
briefly how these mechanisms for the protection
of manuscript security fit into the overall logic of
manuscript processing, which is supposed to be
the covering theme of this session.


Conference Info



Hosted at University of Bergen

Bergen, Norway

June 25, 1996 - June 29, 1996


Series: ACH/ICCH (16), ALLC/EADH (23), ACH/ALLC (8)

Organizers: ACH, ALLC