Mining texts for image terms: the CLiMB project

  1. 1. Judith L. Klavans

    University of Maryland, College Park

  2. 2. Carolyn Sheffield

    University of Maryland, College Park

  3. 3. Eileen Abels

    Drexel University

  4. 4. Jimmy Lin

    University of Maryland, College Park

  5. 5. Rebecca Passonneau

    Columbia University

  6. 6. Dagobert Soergel

    University of Maryland, College Park

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

The CLiMB (Computational Linguistics for Metadata
Building) project addresses the existing gap in subject
metadata for images, particularly for the domains
of art history, architecture, and landscape architecture.
Within each of these domains, image collections are
increasingly available online yet subject access points
for these images remain minimal. In an observational
study with six image catalogers, we found that typically
1 – 8 subject terms are assigned, and that many legacy
records lack subject entries altogether. Studies on end
users’ image searching indicate that this level of subject
description is often insufficient. In a study of the imagesearching
behaviors of faculty and graduate students in
American history, Choi and Rasmussen 2003 found that
92% of the 38 participants in their study considered the
textual information associated with the images in the Library
of Congress’ American Memory Collection to be
inadequate. The number of subject descriptors assigned
to an image in this collection is comparable to what we
found in the exploratory CLiMB studies. Furthermore,
these searchers submitted more subject-oriented queries
than known-artist and title queries. Similar results demonstrating
the importance of subject retrieval have been
reported in other studies, including Keister, Collins, and
Chen 1994.
Under the hypothesis that searchers do not find images
they seek partly due to inadequate subject description in
metadata fields, the CLiMB project was initiated to address
this subject metadata gap by applying automatic
and semi-automatic techniques to the identification,
extraction, and thesaural linking of subject terms. The
CLiMB Toolkit processes text associated with an image
through natural language processing (NLP), categorization
using machine learning (ML), and disambiguation
techniques to identify, filter, and normalize high-quality
subject descriptors. Like Pastra et al. 2003 we use NLP
techniques and domain-specific ontologies, although our
focus is on associated texts such as art historical surveys
or curatorial essays rather than captions; unlike generic
image search, such as in Google, we analyze beyond
keywords and we use text which is specifically and
clearly related to an image. For this project, we use the
standard Cataloging Cultural Objects (CCO) definition
of subject metadata1 as including terms which provide
“an identification, description, or interpretation of what
is depicted in and by a work or image.”
In order to understand the cataloging process and to inform
our system design, we conducted studies on the
image cataloging workflow and the process of subject
term assignment. Our goal was to collect data on the
humanities-driven process as a whole to be able to incorporate
our results into an existing workflow and thus assist
a portion of the workflow with automatic techniques.
An additional purpose of studying the cataloging process
was to permit the development of system functionality,
i.e., the implementation of rules or the use of statistical
methods to identify high-quality subject descriptors in
scholarly texts. As part of the CLiMB evaluation, we
have established a series of test collections in the fields
of art history, architecture, and landscape architecture.
These three domains were selected in part because of the
existing overlap in domain-specific vocabulary. Testing
with distinct but related domains enables us to test
for disambiguation issues which arise in the context of
specialized vocabularies. For example, the Getty Art &
Architecture Thesaurus (AAT) provides many senses of
the term panel which apply to either the fine arts, architecture,
or both, depending on context. In the context of
fine arts, panel may refer to a small painting on wood
whereas in the context of architecture, panel may refer
to a distinct section of a wall, within a border or frame.
Figure 1 shows the CLiMB architecture which produces
subject term recommendations that can be used into the
image cataloging workflow observed in visual resource
centers: CLiMB combines new and pre-existing technologies
in a flexible, client-side architecture which has been
implemented in a downloadable toolkit and which can
be tailored to the user’s needs. In addition to matching
segments of texts to referenced images, we are developing
methods to categorize spans of text (e.g., sentences
or paragraphs) as to their semantic function relative to
the image. For example, a sentence might describe an
artist’s life events (e.g. “during his childhood”, “while
on her trip to Italy”, “at the death of his father”) or the
style of the work (“impressionism”). A set of seven
categories – Image Content, Interpretation, Implementation,
Historical Context, Biographical Information,
Significance, and Comparison – has been initially proposed
through textual analysis of art survey texts. These
categories have been tested through a series of labeling
experiments. Full details are available in Passonneau et
al. 2008. The output of this categorization will be incorporated
in future versions of the Toolkit, and will be used
as part of the disambiguation component.
An important contribution of the CLiMB project is the
development of a disambiguation component, enabling
the system to move beyond standard keyword-based indexing
by associating words and terms that have multiple
meanings which correspond to different descriptors with
the correct meaning in context. The ability to select one
sense from many is referred to as lexical disambiguation.
Results of our ongoing studies on sense disambiguation
using hierarchically structured faceted thesauri and lexical
resources, such as the Art and Architecture Thesaurus
and WordNet, will be presented. We have experimented
with the use of WordNet, with different levels of the facets
of the AAT, and with different degrees of filtering for
modifiers in noun phrases. We also have results on setting
weights for each of these factors to determine the
most accurate disambiguation techniques.
One of the most vexing problems in word sense disambiguation
is the fact that often several senses could be
considered correct within a given context. Therefore,
evaluation can be a challenge since there may be no
clear-cut right or wrong. The need for fuzzy evaluation
will be discussed in our presentation, with a demonstration
of different ways to measure precision and recall
against a “moving target” baseline.
Figure 2 shows the CLiMB interface in its current state
as of Fall 2008; as we use the results of our experimental
research, this interface may change as of the time of presentation
of the paper. Note in Figure Two that the collection under review is
found in the left panel, the image is in the center, the
analyzed text is shown to the cataloger, with the searchable
Getty thesaural resources (AAT, Thesaurus of Geographical
Names (TGN ) and Union List of Artist Names
(ULAN)) in the right panel. The cataloger can select
subject terms, and when possible, normalize according
to the Getty unique identifier. All interface panes are
flexible, and can be hidden or enlarged, as required by
the user. Cataloger subject term selections can be exported
in a range of formats (see Export button in upper
left hand corner of Figure 2) for incorporation into an
existing catalog record.
To sum, in this paper we will present:
• The problem of subject term access in image retrieval
• The CLiMB system, which utilizes computational
linguistics and machine learning to improve basic
keyword search through:
• Semantic categorization of text segments
• Disambiguation User evaluation studies and findings
Selected References
Chen, H. (2001) An Analysis of Image Retrieval Tasks in
the Field of Art History. Information Processing & Management,
Vol. 37: 701-720.
Choi, Y. and E. Rasmussen (2003) Searching for Images:
The Analysis of Users’ Queries for Image Retrieval in
American History. Journal of the American Society for
Information Science and Technology, Vol. 54: 498-511.
Collins, K. (1998) Providing Subject Access to Images:
A Study of User Queries. The American Archivist, Vol.
61: 36-55.
Keister, L.H. (1994) User Types and Queries: Impact
on Image Access Systems. In: Fidel, R., T.B. Hahn, E.
Rasmussen, P. J. Smith (eds.): Challenges in Indexing
Electronic Text and Images. Learned Information for
the American Society of Information Science, Medford:
Klavans, Judith L, Carolyn Sheffield, Eileen Abels,
Jimmy Lin, Rebecca Passonneau, Tandeep Sidhu, and
Dagobert Soergel (2009) Computational Linguistics for
Metadata Building (CLiMB): Using Text Mining for the
Automatic Identification, Categorization, and Disambiguation
of Subject Terms for Image Metadata. Journal
of Multimedia Tools and Applications, Special issue on
Metadata Mining for Image Understanding (MMIU)
42(1):115-138. Elsevier: Paris.
Passonneau, R., T. Yano, T. Lippincott, J. Klavans (2008)
Functional Semantic Categories for Art History Text:
Human Labeling and Preliminary Machine Learning. 3rd
International Conference on Computer Vision Theory
and Applications, Workshop on Metadata Mining for
Image Understanding: 13-22.
Pastra, K., H. Saggion, Y. Wilks, (2003) Intelligent Indexing
of Crime-Scene Photographs. In: IEEE Intelligent
Systems: Special Issue on Advances in Natural Language
and Processing, Vol. 18, Iss. 1: 55-61.
Notes 1

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None