Department of Computing and Information Science - Queen's University
Forschungsinstitut Brenner-Archiv - Universität Innsbruck
Royal Institute of Technology, Stockholm
Royal Library, Stockholm
Computing Service - Oxford University
Library - Universität Innsbruck
Wittgenstein Archives - University of Bergen
Norwegian Computing Centre for the Humanities - University of Bergen
Dorset County Museum
Need
Libraries, archives, museums and other cultural
institutions hold vast deposits of unpublished,
non-printed primary sources. Parts of these materials contain extremely valuable documentation of major public, cultural and academic interest.
The situation today is that the general public often
has no access to certain material at all and no
knowledge of its existence, while in fortunate
cases devoted scholars have access to microform
reproductions of variable quality at best, and only
researchers with special permission may have access to the original documents. This situation applies to older documents dating from pre-Gutenberg times as well as to a very large proportion of
material dating from modern times.
NOLA is a European project designed to improve
access to material held in literary archives – the
unpublished sources on major novelists, philosophers, musicians and painters. Many of the specialized institutions holding these collections have
more experience in the active acquisition, documentation, transcription, study and publication of such sources than most general libraries.
NOLA will establish a common platform for cooperation on the use of standards, tools and methods as well as the provision of mutual access to
resources between libraries and other institutions.
The partners in the consortium are: the Royal
Institute of Technology (Stockholm), the Royal
Library (Stockholm), Oxford University Computing Services, the Dorset County Museum, the
University of Innsbruck Library, Forschungsinstitut Brenner-Archiv (University of Innsbruck), and
the Norwegian Computing Centre for the Humanities in collaboration with the Wittgenstein Archives (University of Bergen).
Libraries and other institutions in this broadly
defined sector are aware of the potential benefits
of electronic information technology, and some
institutions have started to exploit these. However,
scarcity of resources and lack of interoperability across languages, hardware and software platforms have prevented large-scale network access.
Current systems do not provide the specialist tools
needed by the users, or require resources and
competence beyond those available to individual
small and medium-size institutions. The latter is a
particularly important limitation, since many of
the relevant institutions are small.
Common bibliographic cataloguing standards are
necessary for easy resource discovery via electronic networks. The mutual incompatibility of existing cataloguing schemes (almost every European country has its own flavour of MARC, and in
the German-speaking world MARC is generally
not used at all), and the lack of expressiveness of
current cataloguing standards for archival materials are difficulties just as serious as the technical
problems involved.
NOLA will build upon the work already carried
out by the Text Encoding Initiative (TEI) in defining Guidelines for the encoding and interchange
of a wide range of machine-readable resources.
The TEI is the most successful attempt so far made
to determine a comprehensive set of encoding
standards, based on ISO standard SGML, which
are of truly general applicability in realistically
scaled projects. It defines a common, system-independent, non-proprietary format for the representation and interchange of documents.
The TEI recommendations for the encoding and meta-description of manuscript materials will be the
particular focus of NOLA, which will assess their
suitability and recommend their extension and
modification as necessary. NOLA will build an
environment in which common Guidelines for
standard encoding practices and for project management will be integrated with appropriately tailored software tools.
The project will thus not only define a common
format for encoding and interchange of documents, but also provide libraries with the tools
necessary to prepare and exploit documents in this
format, and advice concerning human, organizational and economic changes necessitated and facilitated by the employment of the proposed guidelines, tools and procedures.
Background
The Text Encoding Initiative (TEI) is an international project established in 1988 to develop guidelines for the preparation and interchange of electronic texts, initially aimed at the needs of
scholarly and industrial research in particular, but
with a broad range of uses by the language industries more generally. Its recommendations, the
TEI Guidelines, formally specify a detailed and
extensible encoding system based on SGML, the
Standard Generalized Markup Language (ISO
8879).
The TEI was established in response to the pressing need for a common text encoding scheme to
counter the chaotic diversity of formats in use in
the mid-1980s. Since then, the need for standardized encoding practices has become even more
critical as the need to use and, most importantly,
reuse vast amounts of electronic text has dramatically increased for both research and industry.
Members of NOLA have been actively involved
in the development of the TEI Guidelines. NOLA
will place major emphasis on development of
procedures for the integration and use of TEI-aware tools for the creation, management, documentation, analysis and dissemination of archival resources. Such tools are already being developed
by several of the participants, and other TEI users.
NOLA will contribute to this effort, but its main objective will be to assess their usability and performance under real-life conditions, in order to facilitate
a comparative cost-benefit analysis of the organizational and economic aspects of this approach to
the problems of networked resource description
and resource provision.
The explosive growth, in the last few years, in the
use of the World Wide Web (WWW) system built
around the HyperText Transfer Protocol (HTTP)
and the HyperText Markup Language (HTML)
demonstrates both the potential and the difficulty
of bringing high quality telematics applications to
a broad public. The pool of potentially interested
users is immense, as is the range of applications.
As the usage of the World Wide Web grows,
however, the limitations of its design become clearer. Like the TEI, HTML is an application of
SGML. Unlike the TEI, however, HTML is a rigid
and simplistic markup language designed solely to
simplify the creation of tagged text and its display
on a screen. HTML handles the tagging and display of simple monolingual prose fairly well, but
has no support for research needs (e.g. linguistic
annotation), the needs of large collections of texts
(e.g. unusual text types different structurally from
conventional modern technical prose), or the
needs of bibliographic control and network-based
information discovery and retrieval (e.g. the provision of catalogue-like information about the
contents of an information resource). As a result,
incompatible extensions of HTML are springing
up everywhere; the resulting confusion is undoing
much of the apparent advantage of simplicity offered by HTML in the first place.
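The contrast can be sketched with a small, hypothetical fragment (not drawn from the project itself). HTML can only record how a phrase should look; a TEI encoding can record what the phrase is, using standard TEI P3 elements such as <foreign>:

```sgml
<!-- HTML: presentational tagging only -->
<P>The <I>sine qua non</I> of textual criticism ...</P>

<!-- TEI: descriptive tagging, open to retrieval and analysis -->
<p>The <foreign lang="LA">sine qua non</foreign>
of textual criticism ...</p>
```

The TEI version allows an application, for example, to retrieve every Latin phrase in a corpus; the italics of the HTML version cannot support any such query.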
In the long run, the needs of an information society
can only be met with applications requiring a far
richer markup language than HTML. Ideally, such
a markup language would be rich enough to handle
many different kinds of text, and many different
kinds of application; it would be well documented,
so as to make it easier for different applications to
interoperate on the same data; it would be flexible
enough to handle simple tasks simply, but to scale
up and provide the structure needed to handle
complex tasks reliably. Such goals cannot be met
merely by allowing the information networks to
carry unconstrained SGML: they require the use
of a common markup language where possible, to
ensure that data can be used by many different
applications. The TEI is the best candidate for
such a general-purpose markup language yet to
have appeared.
Technical aspects
The Standard Generalized Markup Language
(SGML) is a widely-used international standard
(ISO 8879) for the definition of markup languages. A markup language provides a fundamental
vocabulary for identifying different segments of a
text as being textual objects of a particular kind,
with particular attributes, and defining their syntax
formally.
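As a brief illustration (a simplified sketch, not taken from any particular project), a passage from a letter might be tagged with standard TEI elements, each identifying a textual object of a particular kind with particular attributes:

```sgml
<div1 type="letter">
 <head>Letter to G. E. Moore</head>
 <p>Dear <name type="person">Moore</name>,
    I enclose the typescript ...</p>
</div1>
```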
Using SGML, the TEI constructed a modular system, enabling the user to combine parts in ways
appropriate to the needs of the specific application
at hand. This flexibility of combination, and the
accompanying ease of modifying the TEI scheme,
are seldom found in other SGML document type
definitions. For purposes of documentation and
user extension, the TEI also developed a simple
object-oriented class and inheritance system,
which greatly simplifies design and maintenance
of the markup language.
Items pointed out in the current TEI Guidelines for
further work are: "the encoding of physical description of textual witnesses, the materials of the
carrier, the medium of the inscribing implement,
the layout of the inscription upon the material, the
organization of the carrier materials themselves
(as quiring, collation, etc.), authorial instructions
or scribal markup, etc." (P3, p 557)
There are also general difficulties in combining the structural and graphical information essential to the document types typical of literary archives; moreover, the difficulty of handling overlapping and discontinuous elements within the restrictions posed by SGML syntax has made a number of TEI mechanisms quite complicated.
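A typical case of overlap is a deletion which crosses a line break in the manuscript: the deletion and the physical line overlap, so they cannot both be tagged as properly nested elements. The TEI handles such cases with empty "milestone" elements, here sketched with the standard <lb> (line-break) tag placed inside a <del> element:

```sgml
<p>The author struck out
<del>a phrase which runs across
the<lb n="12"> line break</del> of the draft.</p>
```

The line break is thus recorded as a point rather than as a container, at the cost of leaving the physical line structure implicit.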
NOLA will extend the existing TEI Guidelines on the basis of experience gathered in projects in which they are being used, producing an extended
tag set to handle a number of features of particular
interest to archives but not fully covered in the
existing Guidelines.
Benefits
The TEI emerged from a trans-national community of scholars and researchers, with a common
interest in facilitating the exchange of data and
electronic resources. Its working groups were
drawn from experts in several different countries,
dedicated to identifying a consensus of acceptable
practices which could be tailored to the needs of a
culturally and linguistically diverse body of users.
This project is a continuation of that work.
European policies concerning the free exchange of
information and the community-wide interchange
of reusable resources are crucially dependent on
the widespread acceptance of standardization efforts at national and international levels. As a
pre-normative standard, based on user-defined
practice, and user-driven in its creation, the TEI
stands a better chance of achieving that acceptability than some others, which risk being perceived as commercially or externally motivated. There is already ample evidence that both commercial
and academic players throughout the European
marketplace perceive the TEI as offering exactly
the kind of independent authority for the definition
of interchange formats which is currently lacking.
TEI-based standards have already been adopted at
both national and European levels in a variety of
Language Engineering projects (e.g. the British National Corpus, EAGLES, MULTEXT). Within the
library community there is also keen interest in the
TEI.
Although based initially in the academic sector,
and with a strong emphasis on language engineering, the TEI has necessarily addressed issues of
importance to the commercial sector and to wider
information engineering concerns, without compromising its independence of specific interest
groups. The modularity of design and extensibility
of its proposals, originally motivated by the unpredictable needs of the research community, are of
equal importance to the needs of a highly competitive and rapidly changing information industry.
In essence, the TEI offers a reliable means of
delivering and documenting electronic resources
which is independent of any particular delivery
method or application, but which can be tailored
to suit any such application without loss of information.
By acting at a European rather than a national or
local level, the TEI facilitates the emergence of
standard practices for the management and description of information resources. Such practices
are clearly essential to the free flow of information
across national borders. At the same time, the
sophistication and flexibility of the TEI scheme
ensures that such standard practices are not achieved at the cost of sacrificing the inherent richness
and diversity of European cultural resources.
NOLA focuses on a number of specific tasks
which will ensure the widest possible dissemination and understanding of the TEI proposals within the general area of small- and medium-sized
libraries, museums, archives and other institutions
preserving important parts of Europe’s cultural
heritage.
REFERENCES
Gerhard Renner: Die Nachlässe in den Bibliotheken und Museen der Republik Österreich [= Verzeichnis der schriftlichen Nachlässe in den Bibliotheken und Museen der Republik Österreich, Band I]. Wien, 1993.
Tilo Brandis & Ludwig Denecke: Die Nachlässe in den Bibliotheken der Bundesrepublik Deutschland. Bearbeitet von Ludwig Denecke. Zweite Auflage, völlig neu bearbeitet von Tilo Brandis [= Verzeichnis der schriftlichen Nachlässe in deutschen Archiven und Bibliotheken, Bd. 2]. Boppard am Rhein, 1981.
C.M. Sperberg-McQueen and Lou Burnard (eds.):
"Guidelines for the Encoding and Interchange
of Machine-Readable Texts (TEI P3)", Chicago and Oxford April 1994
"Information Processing – Text and Office Systems – Standard Generalized Markup Language (SGML)", International Organization for Standardization, ISO 8879:1986
Claus Huitfeldt: "MECS – A Multi-Element Code
System", forthcoming in Working Papers from
the Wittgenstein Archives at the University of
Bergen, 1995
Claus Huitfeldt: "MECS-WIT – A Registration Standard for the Wittgenstein Archives at the University of Bergen", forthcoming in Working Papers from the Wittgenstein Archives at the University of Bergen, 1995
Hosted at University of Bergen
Bergen, Norway
June 25, 1996 - June 29, 1996
147 works by 190 authors indexed
Conference website: https://web.archive.org/web/19990224202037/www.hd.uib.no/allc-ach96.html