Integrating Images and Text with Common Data and Metadata Standards in the Archimedes Palimpsest

  1. Doug Emery

    Emery IT

  2. Michael B. Toth

    R.B. Toth Associates, Walters Art Museum


This paper by the Archimedes Palimpsest program
and data managers will discuss the issues, challenges
and decisions related to the integration of all images
and transcriptions of the Archimedes Palimpsest in
digital form. This will include discussion of the management
of image and transcription data, adoption of metadata
and text encoding standards and schemas, and challenges
faced in integrating these data during this 10-year-long
program and in their use in follow-on efforts.
This paper will address the integration of scientific and
scholarly data through the application of best practices
in standardized metadata to images and XML transcriptions,
and challenges encountered in applying the various
imaging, identification and encoding standards. This
will include discussion of the following:
1. The effective implementation of broadly accepted
metadata models and data architectures;
2. Integration of data standards, including
Dublin Core and Text Encoding Initiative; and
3. Embedding metadata elements within the data itself
for effective preservation and archiving, as well as
spatial linkage of data elements.
On October 22, 1998, the Archimedes Palimpsest was
sold at auction for $2.0 million to an anonymous buyer.
A multidisciplinary team of conservators, imaging scientists,
scholars, and information technology professionals
disbound, conserved, digitally imaged, analyzed
and transcribed the 184 parchment pages for continued
study. The program applied advanced spectral imaging
to study the Archimedes texts and other significant medieval
manuscripts from the 10th century that were overwritten
with 13th-century prayer book text. On October 29, 2008,
approximately 1 terabyte of integrated Archimedes Palimpsest
image and transcription data was released
to the public for free use. Integrating the ancient Greek
transcriptions of Archimedes’ mathematical texts with
digital images and hosting them on the Web for a broad
set of global users posed a complex set of information
sharing challenges.
The Archimedes Palimpsest Digital Product required
the integration of imaging, scholarly and data products.
The product incorporates registered images for each leaf
linked spatially to diplomatic transcriptions that scholars
initially created in various nonstandard formats, with
associated standardized metadata. The imaging scientists
included Dr. Roger Easton Jr. of the Rochester Institute
of Technology, Dr. Keith Knox of Boeing Corporation,
and Dr. Bill Christens-Barry of Equipoise Imaging;
Stokes Imaging provided a camera, with support
from John R. Stokes. The imaging effort built on imaging
of the Dead Sea Scrolls by a team from RIT and the British
National Gallery Vasari Project, while the transcription
encoding effort built on the work of the Homer Multitext
Project by the Center for Hellenic Studies. With
over 4,000 digital images in 12 spectral bands and 140
pages of transcriptions of the original writings in Greek
of Archimedes and Hyperides, standardized metadata
was critical to 1) access to and integration of images for
digital processing and enhancement, 2) management of
transcriptions from those images, and 3) linkage of the
images with the transcriptions.
This effort produced images and transcriptions of the
only copies of Archimedes’ treatises The Method and
Stomachion; the only copy in Greek of On Floating
Bodies; and copies of the Equilibrium of Planes, Spiral
Lines, The Measurement of the Circle, and Sphere
and Cylinder. It also discovered ten pages of text by the
fourth-century B.C. Attic Greek orator Hyperides; six folios
from a still unidentified Neo-Platonic philosophical
text that may be commentaries on Aristotle; four folios
from a liturgical book; and twelve pages from two other
books, the text of which has yet to be deciphered. These
texts are being studied by scholars from a range of colleges
and universities, including Oxford, Cambridge,
Stanford, and Eötvös Loránd (Budapest) Universities.
These scholars bring not only the knowledge and ability
to read the sometimes almost illegible ancient Greek
text and diagrams, but also significant knowledge of the
science, mathematics, law and philosophy discussed in
the texts. Capturing this data from a range of scholars
and rendering it in a common, standardized digital format
required establishment of a program specification
for transcribed text.
Beginning in 2001, the Archimedes Palimpsest program adopted established metadata standards to ensure key
parameters were recorded during technical collection
for use in subsequent processing and studies, including
Dublin Core and Text Encoding Initiative guidelines.
These standards have been further refined and adapted
to address the needs of scholars, imaging scientists, conservation
and preservation professionals, and information
and data managers. Working with the scholars, the
program developed a Transcription Integration Plan for
the Archimedes Palimpsest Program that incorporated
the Unicode, Dublin Core and Text Encoding Initiative
Standards and Guidelines, which proved essential to the
integration of the transcribed information. The selection
of broadly accepted and up-to-date international consensus
standards is an effort to ensure currency, increase the
likelihood of long-term data viability, and provide for
ample documentation to describe the bit structure of all
archive components, from the core data to the supporting
files. The program also developed a system architecture
that was documented with archival, metadata and integration
plans and implemented after extensive review
and modifications.
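As an illustration of recording descriptive metadata with Dublin Core, the following minimal Python sketch builds a small XML record from the Dublin Core Metadata Element Set 1.1 namespace. The wrapper element name, the function name, and all field values are assumptions made for this example; they are not taken from the program's own metadata plans.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, Version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Return an XML element with one Dublin Core child per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Illustrative values only; not the project's actual metadata.
example_fields = [
    ("title", "Archimedes Palimpsest, example folio side"),
    ("identifier", "example-folio-0001"),
    ("format", "image/tiff"),
    ("language", "grc"),  # ISO 639-2 code for Ancient Greek
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the element names in a well-known namespace is what lets generic tools read such a record without project-specific knowledge.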
The construction of the data set addresses the special
problem of building an archive for today and the distant
future. A guiding principle of the archive is the integration
of data and metadata components, following principles
described in the Consultative Committee for Space
Data Systems (CCSDS) Reference Model for an Open
Archival Information System (OAIS). Each image bears
embedded identifying, spatial, scientific, format, and
content metadata in its header. Each directory contains
all images for a given folio side, accompanying XMP
metadata files, checksum data, and spatially mapped TEI
XML transcriptions for the Archimedes and Hyperides
texts. Each image file or folio directory forms a self-contained
unit of data and preservation information. The
directory as a whole provides files that guide users to
the data and document the data set and the technologies
comprised in it. A simple, documented archive structure
supports the discoverability and accessibility of the data.
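To illustrate how checksum data stored alongside the files lets a folio directory be verified on its own, here is a minimal Python sketch. The manifest file name (`checksums.txt`), its "digest, whitespace, file name" layout, the SHA-256 algorithm, and the demo file names are all assumptions for this example; the abstract does not specify the archive's checksum format.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_directory(directory, manifest_name="checksums.txt"):
    """Return {file name: True/False} for each entry in the (hypothetical) manifest."""
    directory = Path(directory)
    results = {}
    for line in (directory / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        target = directory / name
        results[name] = target.is_file() and file_digest(target) == expected.lower()
    return results


# Demo with stand-in files in a temporary "folio directory".
with tempfile.TemporaryDirectory() as tmp:
    folio = Path(tmp)
    names = ("image_a.tif", "transcription.xml")
    (folio / names[0]).write_bytes(b"stand-in image bytes")
    (folio / names[1]).write_bytes(b"<TEI/>")
    manifest = "".join(f"{file_digest(folio / n)}  {n}\n" for n in names)
    (folio / "checksums.txt").write_text(manifest)

    intact = verify_directory(folio)          # every file matches its digest
    (folio / names[0]).write_bytes(b"corrupted")
    damaged = verify_directory(folio)         # the altered file is flagged

print(intact)
print(damaged)
```

Because the manifest travels inside the directory, a copy of a single folio directory can be checked without access to the rest of the archive.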
The archive, the transcriptions, and supporting metadata
are designed to support the core image data using
broadly accepted standards. The key to the processing
and presentation of the Archimedes image data is
the registration of all the images of a single folio side
to one another. These relationships are documented in
and exploited by the supporting files and metadata. The
project-developed Archimedes Palimpsest Metadata
Standard (APMS) provides a metadata structure specifically
geared to relating all images of a folio side in
a single multi- or hyper-spectral data “cube.” It relates
the components of this cube to the imaged object, the
conditions and systems used in its imaging, the standards
and techniques used to generate the digital file, and finally
the standards used to document these components.
Each image has embedded in it its own metadata and so
may stand alone or be related to any or all of the other
members of the same cube. The standard is based on the
Dublin Core metadata elements and Federal Geographic
Data Committee’s (FGDC) Content Standard for Digital
Geospatial Metadata. The included transcriptions, written
in compliance with TEI release P5, support the images
and serve as a kind of metadata. The majority of folio sides,
142 of 180, are provided with a transcription.
In those transcriptions, each line is mapped to a rectangular
region of the related images. The TEI <facsimile>
element and its children are used for this purpose.
These digital transcriptions provide a machine-readable
tool that documents the content of the images. The spatial
mapping allows easy navigation from transcription to image
and vice versa.
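The line-to-region mapping can be sketched in Python as follows. The sample is a simplified TEI P5 fragment (no `teiHeader`) that uses the standard `<facsimile>`, `<surface>`, and `<zone>` elements, the `ulx`/`uly`/`lrx`/`lry` coordinate attributes, and the `facs` pointer on `<lb>`. Whether the project's transcriptions attach `facs` to `<lb>` in exactly this way is an assumption, and the coordinates, file name, and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, illustrative fragment; not taken from the project's files.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <graphic url="example-folio.tif"/>
      <zone xml:id="z1" ulx="120" uly="200" lrx="980" lry="260"/>
      <zone xml:id="z2" ulx="118" uly="270" lrx="975" lry="330"/>
    </surface>
  </facsimile>
  <text><body><ab>
    <lb n="1" facs="#z1"/>first line of text
    <lb n="2" facs="#z2"/>second line of text
  </ab></body></text>
</TEI>"""


def line_regions(tei_root):
    """Map each line number to its (ulx, uly, lrx, lry) image rectangle."""
    zones = {}
    for zone in tei_root.iter(f"{{{TEI_NS}}}zone"):
        box = tuple(int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
        zones[zone.get(XML_ID)] = box
    regions = {}
    for lb in tei_root.iter(f"{{{TEI_NS}}}lb"):
        facs = lb.get("facs", "")
        if facs.startswith("#") and facs[1:] in zones:
            regions[lb.get("n")] = zones[facs[1:]]
    return regions


def line_at(regions, x, y):
    """Reverse lookup: which transcribed line covers image pixel (x, y)?"""
    for n, (ulx, uly, lrx, lry) in regions.items():
        if ulx <= x <= lrx and uly <= y <= lry:
            return n
    return None


regions = line_regions(ET.fromstring(SAMPLE_TEI))
print(regions)
print(line_at(regions, 500, 300))
```

The first function goes from transcription to image and the second from image to transcription, which is the two-way navigation the spatial mapping is meant to support.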
The use of standardized data sets allows the hosting and
integration of image and textual data, as well as data
from other cultural works, across a range of service providers,
libraries and cultural and educational institutions,
and the separate development of graphical user interfaces
(GUIs) by users as needed. Inclusion of standards
as part of the data set will help ensure the data will be
readily searchable, available and accessible for studies
in decades to come.
Additional information on this program is available at
the Archimedes Palimpsest website.

Dublin Core Metadata Initiative. Dublin Core Metadata
Element Set, Version 1.1. 14 January 2008.


Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009


Series: ADHO (4)

Organizers: ADHO

  • Language: English