The Harvester of Iconclass Metadata: a web service for subject classification and subject retrieval in cultural heritage information systems

poster / demo / art installation
Authorship
  1. Hans Brandhorst

    arkyves

  2. Etienne Posthumus

    Independent Software Developer

Work text

Summary
The importance of controlled vocabularies for the cataloguing of literary and visual sources by museums, libraries, and archives can hardly be exaggerated. Important organizations like the Library of Congress, the Getty Foundation, and OCLC have been promoting standardization by making thesauri and classification systems available online. However, complying with such a standard often means transferring descriptors from the online vocabulary system to another application, thereby isolating them from their context. While this is annoying during the production phase, it is fatal at retrieval time, because an end user should have access to the context of a descriptor while querying a catalogue. This poster demonstrates a web service that solves this problem for Iconclass, a multilingual subject classification system for cultural content that is used from Finland to Italy and from Germany to the US. The service makes the Iconclass system available as an ‘add-on’ to database management systems and online electronic catalogues. The Iconclass metadata harvester uses OAI-PMH to gather Iconclass codes; these support special retrieval browsers within local websites while also creating a single access point for thematic searches across multiple online databases.
1 The authority paradox
When using an online classification, a thesaurus, or some other form of controlled vocabulary for cataloguing, we are sooner or later confronted by what could be labelled the authority paradox. The core of this paradox is that using a terminology authority in essence means selecting specific terms from the authority system and copying them to the records of a local catalogue. However, the instruments that cataloguers use to select the most appropriate descriptors from the authority system are not automatically available to the end user of that catalogue.
By definition, vocabularies like classifications or thesauri (or, perhaps more fashionably, ontologies) are systems with a structure. This structure may be more or less complex, but at the very least there will be some basic inner organization. An example of such an organizational structure is the hierarchy of broader and narrower terms. Simple hierarchical subordination is only one of various techniques to help a user find descriptors. Keywords may be added to help the user find the most appropriate concept; cross references may connect related concepts; redirects may point from non-preferred to preferred terms; scope notes may explain the intention of a term; and links may connect concepts across the languages of a multilingual system. These are just a few examples of additional, often quite sophisticated instruments that help one make the most of an authority system. No matter how simple or complex the structure of the authority system is, its default use will often be limited to transferring the standardized term for an artist, a place name, a profession, or an iconographical feature from the system to the catalogue. It would obviously cause an enormous amount of redundancy if one were also to copy all of the parent terms in a hierarchy, let alone all the terms to which a chosen descriptor may be related, if that were possible at all.
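To make the contrast concrete, here is a minimal Python sketch of a single vocabulary entry carrying the instruments listed above, next to what a catalogue typically keeps after the default transfer. The `Concept` class, its field names and the example values are hypothetical illustrations, not the data model of Iconclass or of any other authority system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical authority record carrying the structural 'instruments' of a vocabulary."""
    notation: str                                            # unique code of the concept
    labels: dict[str, str]                                   # preferred term per language
    broader: str | None = None                               # parent in the hierarchy
    keywords: list[str] = field(default_factory=list)        # search aids
    related: list[str] = field(default_factory=list)         # cross references
    non_preferred: list[str] = field(default_factory=list)   # terms redirected here
    scope_note: str = ""                                     # intended meaning of the term


def default_transfer(concept: Concept, language: str) -> str:
    """What a catalogue typically copies: one standardized term and nothing else."""
    return concept.labels[language]


def lost_in_transfer(concept: Concept, language: str) -> list[str]:
    """Names of the structural features that the copied term no longer carries."""
    candidates = {
        "broader": concept.broader,
        "keywords": concept.keywords,
        "related": concept.related,
        "non_preferred": concept.non_preferred,
        "scope_note": concept.scope_note,
        "other_languages": [lang for lang in concept.labels if lang != language],
    }
    return [name for name, value in candidates.items() if value]


# Placeholder entry; the codes "EX-0" and "EX-1" are invented for the example.
crow = Concept(
    notation="EX-1",
    labels={"en": "crow", "de": "Krähe"},
    broader="EX-0",
    keywords=["bird"],
    scope_note="placeholder scope note",
)
print(default_transfer(crow, "en"))   # the only thing the catalogue keeps
print(lost_in_transfer(crow, "en"))   # everything the end user can no longer reach
```

The second call lists what the end user of the catalogue loses, which is exactly the gap the authority paradox describes.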
Simply because it copies its descriptors from ‘authority system X, Y or Z’, a catalogue may claim to be compliant with that authority. However, transferring terms in this way isolates them from the structure that embeds them, which clashes with the very purpose of complying with an authority system. At the very least, the resulting catalogue will fail to offer its end users the features that turn authority systems into actual ‘systems’.
2 SKOS version of Iconclass as an external thesaurus add-on
How the web service of Iconclass, in a SKOS version, functions as an add-on can be demonstrated efficiently with the help of the preceding illustration: a screenshot of the Imdas database management system, which offers Iconclass as an external thesaurus to its users. Using a simple iframe and a local style sheet, Iconclass is shown (here in its German form) as a tree of concepts inside the Imdas application. Any browse action refreshes the content of the tree by sending a request for updated information to the Iconclass server.
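The request-and-refresh cycle can be sketched on the client side. This sketch assumes that the server answers a browse request with one SKOS concept serialized as RDF/XML, and extracts what a tree widget needs: the preferred label in the requested language plus the broader and narrower links. The response document, its `example.org` URIs and the German label are hypothetical placeholders, not the actual Iconclass service output.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical response for one concept; URIs and the German label are placeholders.
SAMPLE_RESPONSE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/iconclass/25F32">
    <skos:notation>25F32</skos:notation>
    <skos:prefLabel xml:lang="en">song-birds</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Singvögel</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/iconclass/25F3"/>
    <skos:narrower rdf:resource="http://example.org/iconclass/25F32(CROW)"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concept(xml_text: str, language: str) -> dict:
    """Extract the fields a browse tree needs from the first skos:Concept."""
    root = ET.fromstring(xml_text)
    concept = root.find(f"{SKOS}Concept")
    if concept is None:
        raise ValueError("response contains no skos:Concept")
    labels = {el.get(XML_LANG): el.text for el in concept.findall(f"{SKOS}prefLabel")}
    return {
        "uri": concept.get(f"{RDF}about"),
        "notation": concept.findtext(f"{SKOS}notation"),
        # Fall back to English when no label exists in the requested language.
        "label": labels.get(language, labels.get("en")),
        "broader": [el.get(f"{RDF}resource") for el in concept.findall(f"{SKOS}broader")],
        "narrower": [el.get(f"{RDF}resource") for el in concept.findall(f"{SKOS}narrower")],
    }


node = parse_concept(SAMPLE_RESPONSE, "de")
print(node["notation"], node["label"], node["broader"])
```

Because the style sheet lives on the client, only this small, language-tagged payload has to travel on every browse action.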
3 Restoring the link for retrieval: the Harvester for Iconclass Metadata (HIM)
The previous illustration suggests how terms may be transferred from the external authority system to the local database at production time. It also suggests that transferring the alphanumeric notation, which accompanies every concept and assigns it a unique place in the hierarchy, suffices to link a database record to a concept and its complete context in the vocabulary system. What it does not illustrate is how the link between the individual terms copied to the application database and the Iconclass system is restored at retrieval time.
Before we look at how the Harvester for Iconclass Metadata (HIM) service restores this link, we should list some rather obvious assumptions:
A. The catalogue is available on the internet, or at least on an intranet that is linked to the internet.
B. All items in the catalogue are identifiable with the help of some unique property, e.g. an inventory number.
C. This unique property can be used to retrieve an item from the catalogue.
Needless to say, the corresponding assumption about the Iconclass system is that it is a web service, permanently available on the internet. It may be consulted by human cataloguers, but it is also available for information exchange between computers.
Back to the question of how to restore the link between the copied terms in an application database and the Iconclass system’s server. The answer is actually quite simple. Although it is theoretically possible to enrich a catalogue by absorbing major parts, or even the whole, of the Iconclass system, by far the easier strategy is to reverse the procedure: limit the information stored in the local database to Iconclass notations, and then enrich the Iconclass system with information about the catalogue.
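Reversing the procedure means that the central server, rather than each catalogue, keeps the mapping from notations to items. A minimal in-memory sketch, assuming each harvested record is simply a (collection, item identifier, notations) triple; the collection names and inventory numbers are invented placeholders. The same inventory number appears in two collections on purpose, to show why the collection has to be stored alongside the item identifier.

```python
from collections import defaultdict

# Hypothetical harvested records: (collection, local item identifier, notations).
RECORDS = [
    ("collection-a", "INV-001", ["25F32(CROW)"]),
    ("collection-a", "INV-002", ["25F32"]),
    ("collection-b", "INV-001", ["25F32(CROW)", "25F32"]),
]


def build_index(records):
    """Central side: map each notation to the (collection, item id) pairs using it."""
    index = defaultdict(set)
    for collection, item_id, notations in records:
        for notation in notations:
            index[notation].add((collection, item_id))
    return index


def items_for(index, notation, collection=None):
    """Items classified with `notation`, across all collections or for one only."""
    hits = index.get(notation, set())
    return sorted(pair for pair in hits if collection is None or pair[0] == collection)


index = build_index(RECORDS)
print(items_for(index, "25F32(CROW)"))                  # single access point
print(items_for(index, "25F32(CROW)", "collection-a"))  # one local website
```

The local databases store nothing but notations; everything else lives on the server side of this index.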
What makes it so easy to export information about a catalogue that uses Iconclass for its subject access is the simple fact that Iconclass is a classification system. Every concept in the system, with all of its links to other concepts and its translations, therefore corresponds to a single code, or ‘notation’. Like barcodes or ISBN numbers, these notations are very concise containers of information.
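Because a notation encodes its own position in the hierarchy, a service that receives only the code can reconstruct the chain of broader concepts without consulting the local catalogue. The function below is a deliberately simplified sketch: it assumes that each character of a basic notation adds one hierarchical level and that a bracketed text such as `(CROW)` forms one further level. It ignores keys, double letters and other Iconclass features that a real parser would have to handle.

```python
from __future__ import annotations

import re

# Simplified grammar (an assumption): a basic notation, optionally followed by
# one bracketed text. Keys, double letters, etc. are deliberately not handled.
NOTATION = re.compile(r"(?P<base>[0-9][0-9A-Z]*)(?P<bracket>\([^()]+\))?")


def broader_chain(notation: str) -> list[str]:
    """Return the notation and its broader notations, most specific first."""
    match = NOTATION.fullmatch(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    base = match.group("base")
    chain = [notation] if match.group("bracket") else []
    # Assumption: every leading substring of the base notation is one level up.
    chain.extend(base[:end] for end in range(len(base), 0, -1))
    return chain


print(broader_chain("25F32(CROW)"))
```

A record that stores nothing but `25F32(CROW)` is thus enough for the server to display the full path above the concept.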
These concise containers can easily be harvested using the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH) or customized variants thereof.
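An OAI-PMH harvesting request is an ordinary HTTP GET with a `verb` parameter, and large result sets are paged with a `resumptionToken`. The sketch below builds `ListRecords` requests, extracts item identifiers and subject codes from each response, and follows the tokens until the list is exhausted. It assumes, hypothetically, that a participating catalogue exposes Dublin Core records (`metadataPrefix=oai_dc`) with each Iconclass notation in its own `dc:subject` element; the base URL and the sample page are placeholders, and HIM’s actual record format may differ.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: str | None = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[tuple[str, list[str]]], str | None]:
    """Return (identifier, codes) pairs and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        # Assumption: each Iconclass notation sits in its own dc:subject element.
        codes = [s.text.strip() for s in record.iter(f"{DC}subject") if s.text]
        records.append((identifier, codes))
    # An empty resumptionToken element marks the last page of a list.
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, token or None


def harvest(base_url: str, fetch) -> dict[str, list[str]]:
    """Follow resumption tokens until exhausted; `fetch` maps a URL to response text."""
    collected: dict[str, list[str]] = {}
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        collected.update(records)
        if token is None:
            return collected


# Abridged, hypothetical first page (responseDate and request elements omitted).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:INV-001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:subject>25F32(CROW)</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

In practice `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; keeping it a parameter lets the paging loop be exercised without a network connection.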
Although the Iconclass codes and the unique identifier of the object (e.g. a catalogue item) to which they have been assigned can be supplemented with other types of metadata, the codes and the identifier are the minimum that the harvesting service needs in order to work.
The illustration above summarizes the essential elements of the service in a single picture. It shows the first row of a thumbnail gallery incorporated in the French Emblems website at Glasgow University (http://www.emblems.arts.gla.ac.uk/french/search.php). Above the thumbnails, the concept “song-birds: crow 25F32(CROW)” is highlighted, with its broader terms listed above it. Below it are its narrower terms, limited to those actually used in this catalogue. Whenever a concept is clicked, the corresponding Iconclass notation is sent to the central Iconclass server. The server then returns the object identifiers to the local database, where they are used to retrieve the corresponding objects. In the small box at the top, simple and complex keyword searches may be entered in the various languages of Iconclass, i.e. English, German, Italian, or French (partial translations exist in Finnish and Dutch).
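The round trip behind a click in such a gallery can be sketched with two in-memory stand-ins, one for the central server and one for the local catalogue. Everything here is hypothetical: the function names, the placeholder records, and the rule that narrower terms are found by notation prefix (a simplification that does not hold for every Iconclass notation, and that returns narrower terms at any depth). The sketch also shows how a browser can restrict the listed narrower terms to those actually used in one catalogue.

```python
from __future__ import annotations

# Hypothetical central index for one client catalogue: notation -> item identifiers.
CENTRAL_INDEX = {
    "25F32": {"INV-002"},
    "25F32(CROW)": {"INV-001", "INV-003"},
}

# Hypothetical local catalogue, keyed by the unique identifier (assumptions B and C).
LOCAL_CATALOGUE = {
    "INV-001": "placeholder record 1",
    "INV-002": "placeholder record 2",
    "INV-003": "placeholder record 3",
}


def server_lookup(notation: str) -> list[str]:
    """Central server: return the identifiers of items classified with `notation`."""
    return sorted(CENTRAL_INDEX.get(notation, set()))


def used_narrower(notation: str) -> list[str]:
    """Narrower notations actually used in this catalogue (simplified prefix rule)."""
    return sorted(n for n in CENTRAL_INDEX if n != notation and n.startswith(notation))


def on_click(notation: str) -> list[str]:
    """Local website: send the notation, then fetch each returned item locally."""
    return [LOCAL_CATALOGUE[item_id] for item_id in server_lookup(notation)]


print(used_narrower("25F32"))
print(on_click("25F32(CROW)"))
```

Only notations and identifiers cross the wire; the records themselves never leave the local database.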
By providing Iconclass through a web service, all users have access to the same, current version of the system, and editorial changes are instantly available. Even if a local database merely stores codes, its users have access to the various languages of the system and to the full context of every concept: an efficient way to overcome the authority paradox.
The aggregator website www.arkyves.org, a single access point to over 150,000 objects indexed with Iconclass, will be shown as part of the poster, in addition to the Iconclass web service.
Iconclass computing:
Arkyves
W.G. Plein 124
1054 SG Amsterdam
The Netherlands
tel +31 20 616 1039
e-mail: info@arkyves.org
website: http://www.arkyves.org
Iconclass content:
Rijksbureau Kunsthistorische Documentatie
PO Box 90418
2509 LK The Hague
The Netherlands
tel +31 70 33 39 777
fax +31 70 33 37 789
e-mail: iconclass@rkd.nl
website: http://www.rkd.nl
Appendix—reaction to reviewers’ comments
Review 1: A somewhat outdated description of the basic idea of the Harvester of Iconclass Metadata can be found at: http://mnemosyne.org/IIHIM/overview.rst.html
Although some details and names have changed, the underlying principle is the same: classification codes, together with a unique identifier for the item to which the codes (descriptors) have been assigned, are harvested and stored in a central database. The codes are parsed and interpreted. All of their implicit properties (textual definitions and their translations, keywords, hierarchical links, cross references, etcetera) are then extracted from the Iconclass datafile and linked to the code. All search and browse actions on the client’s website are then matched against the information stored in the central database, and the results are sent back to the client’s system. This procedure is necessary for two reasons: A) technically, Iconclass is a complex system, and it would be very expensive, and redundant, to create search and browse software for each client system; and B) a single catalogue item may be described with many descriptors (sometimes dozens of codes) at once, and the textual information and other properties implicit in all the codes assigned to that item have to be made available to the end user simultaneously. The efficient way to manage this is to store all information in a central database.
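Point B can be illustrated with a sketch of the central database: once the implicit properties of each code have been extracted, one lookup returns the texts for all codes assigned to an item, and a keyword search can be answered centrally in any language. The tables below are hypothetical stand-ins for the Iconclass datafile and for harvested assignments; only the notations and English labels come from this poster, while the German labels and the item identifiers are illustrative.

```python
from __future__ import annotations

# Hypothetical extract of the vocabulary: notation -> {language: label}.
TEXTS = {
    "25F32": {"en": "song-birds", "de": "Singvögel"},
    "25F32(CROW)": {"en": "song-birds: crow", "de": "Singvögel: Krähe"},
}

# Hypothetical harvested assignments: item identifier -> Iconclass notations.
ASSIGNMENTS = {
    "INV-001": ["25F32(CROW)"],
    "INV-002": ["25F32", "25F32(CROW)"],
    "INV-003": ["25F32"],
}


def describe(item_id: str, language: str) -> list[str]:
    """Texts for every code assigned to one item, with English as the fallback."""
    return [
        f"{code} {TEXTS[code].get(language, TEXTS[code]['en'])}"
        for code in ASSIGNMENTS.get(item_id, [])
        if code in TEXTS
    ]


def keyword_search(word: str) -> list[str]:
    """Items with at least one code whose label in any language contains `word`."""
    word = word.lower()
    codes = {
        code
        for code, labels in TEXTS.items()
        if any(word in label.lower() for label in labels.values())
    }
    return sorted(item for item, assigned in ASSIGNMENTS.items() if codes & set(assigned))


print(describe("INV-002", "de"))
print(keyword_search("krähe"))
```

A client website only has to forward the item identifier or the search word; the joining of codes, texts and languages happens once, centrally.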
Due to special features that were designed before the digital age, Iconclass is technically more complex than most classifications, so if the software works for Iconclass (which it does), it can in all likelihood cope with other classification systems as well. The add-on is a proprietary tool, but it is made available for free to any institution that is prepared to share (part of) its (meta)data.
Review 2: Since there is only one online version of the vocabulary, all changes are made centrally. Most changes will be additions of more specific concepts, so they will not affect existing documents. At most, older documents will not take full advantage of the increasing specificity of later versions of Iconclass unless certain descriptions are re-edited. In the 35 years of Iconclass’ usage history, existing concepts have almost never been withdrawn or given a new meaning.
There is no room here to expand on the idiosyncrasies of Iconclass as a classification, but they are well documented in earlier literature (in particular in special issues of Visual Resources). The complexity of its code structure sets Iconclass rather apart among classifications. However, there are some parallels with biomedical searching techniques (Collexis, fingerprinting, Knowlets, WikiProfessional) that may be worth investigating. It goes without saying that, financially, these tools are in a very different league...
The ambition of HIM is not to be innovative per se, but to offer a cost-effective solution to a problem (“how to make the most of the use of Iconclass”) that would otherwise require expensive software and the reinvention of the wheel for every collection using the system. In the world of the Humanities, that in itself may be seen as an innovation.


Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None