Visualizing Archival Collections with ArchivesZ

poster / demo / art installation
  1. Jeanne Kramer-Smyth

    University of Maryland, College Park

  2. Jennifer Golbeck

    University of Maryland, College Park

Work text

Archival records and manuscripts are usually arranged
to retain their original order when transferred
into the care of archivists or manuscript curators.
A side effect of grouping records by record creator and
retaining the creator’s original organization is that materials
are described at the group level—not at the item
level. The ramifications for archive searchability are dramatic:
Imagine a library where, instead of being grouped
together by subject, books were shelved alphabetically
by author—and interspersed with each author’s notes,
drafts, expense records, and personal memorabilia. This
basic difference between libraries and archives is key to
understanding why subject-based access to archival resources
is both challenging to achieve and very useful
when available.
With this in mind, we have developed ArchivesZ, an information
visualization tool for archival collections. It
enables users to visualize and explore aggregated information
relating to the total linear feet, inclusive years
and subject terms for archival collections, extracted from
EAD-encoded finding aids.
In his Wired article of the same name, Chris Anderson described
the power of the “long tail.” He argued that the future belonged
not to the bestsellers, but rather to “the millions of niche markets at the shallow
end of the bit-stream.”1 There has been much discussion
of the long tail with regard to library resources2, but it is
interesting to note that archival materials are virtually all
long tail. The nature of archival collections is such that
many of those with the greatest desire to access the materials
have very narrow and specific interests. It is quite
rare that the documents in a single archival collection
will be popular, in the sense of a bestselling book. Frequently
it is a challenge for humanities scholars wishing
to use archival materials to figure out how to approach
the search process. A visualization tool designed
to support the examination of aggregated information
about archival collections could enable a more serendipitous
exploration of materials and the discovery
of new avenues of research.
Even the most experienced historian or humanities scholar
has struggled with the challenge of locating relevant
primary sources. Archival record groups and manuscript
collections present unique challenges to researchers. For
example, a standard search result list shows only the title
and short description for each record group or collection.
This list fails to convey the quantity of materials or
diversity of subjects covered by the combination of collections
returned by the search. A visualization tool that
supports examination of cross-collection and cross-institution
aggregated data about archival collections could:
• Encourage the browsing and exploration of locally
available cultural heritage resources,
• Improve understanding of existing collections,
• Permit easy identification of locations with a rich
combination of collections applicable to a particular
research project, and
• Increase interest in both the humanities and primary
materials.
Encoded Archival Description (EAD) is the international
de facto standard for encoding archival finding aids in
an XML format. Finding aids include information about
who created the records, when they were created, why
they were created, what topics the records relate to, and
the size of the collection. The archival community has
spent much of the past decade encoding existing finding
aids using the EAD standard. Until now, the major
selling point of EAD has been its role in simplifying
the process of publishing finding aids online. While work
has been done to create tools to facilitate the encoding
of finding aids, the next step is to take advantage of the
structured data now available in EAD-encoded finding
aids. This machine-readable data can support the creation
of innovative software that extracts, organizes, and
aggregates information about archival resources and
facilitates their discovery.
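To make the idea of extracting structured data concrete, here is a minimal sketch (not the ArchivesZ code) of pulling the three fields the tool aggregates (extent in linear feet, inclusive years, and subject terms) out of one EAD finding aid. It assumes EAD 2002 element names (`archdesc`, `did`, `unittitle`, `unitdate` with an ISO 8601 `normal` attribute, `physdesc/extent`, `controlaccess/subject`), strips any XML namespace, and assumes extents are phrased like "12.5 linear feet". The `Collection` record and `parse_finding_aid` function are hypothetical names chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Matches extents such as "12.5 linear feet" or "1 linear foot" (assumed phrasing).
_LINEAR_FEET = re.compile(r"(\d+(?:\.\d+)?)\s*linear\s+f(?:ee|oo)t", re.IGNORECASE)


@dataclass
class Collection:
    # Hypothetical record holding the fields ArchivesZ aggregates.
    title: str
    linear_feet: Optional[float]
    start_year: Optional[int]
    end_year: Optional[int]
    subjects: List[str] = field(default_factory=list)


def _text(elem: ET.Element) -> str:
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def _strip_namespaces(root: ET.Element) -> ET.Element:
    """Rename '{namespace}tag' to 'tag' so namespaced and DTD-based EAD share one code path."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


def _years(did: ET.Element) -> Tuple[Optional[int], Optional[int]]:
    """Inclusive years from <unitdate>, preferring the machine-readable 'normal' attribute."""
    unitdate = did.find("unitdate[@type='inclusive']")
    if unitdate is None:
        unitdate = did.find("unitdate")
    if unitdate is None:
        return None, None
    normal = unitdate.get("normal")  # e.g. "1890/1945"
    if normal:
        parts = [p[:4] for p in normal.split("/")]
    else:
        parts = re.findall(r"\b\d{4}\b", _text(unitdate))  # fall back to 4-digit years in the text
    years = [int(p) for p in parts if p.isdigit()]
    return (min(years), max(years)) if years else (None, None)


def _linear_feet(did: ET.Element) -> Optional[float]:
    """First extent statement expressed in linear feet, if any."""
    for extent in did.iter("extent"):
        match = _LINEAR_FEET.search(_text(extent))
        if match:
            return float(match.group(1))
    return None


def parse_finding_aid(xml_text: str) -> Collection:
    root = _strip_namespaces(ET.fromstring(xml_text))
    archdesc = root.find("archdesc")
    if archdesc is None or archdesc.find("did") is None:
        raise ValueError("no collection-level <archdesc>/<did> found")
    did = archdesc.find("did")
    title_el = did.find("unittitle")
    subjects: List[str] = []
    # Only collection-level <controlaccess>; iter() also reaches nested groups.
    for group in archdesc.findall("controlaccess"):
        for subject in group.iter("subject"):
            term = _text(subject)
            if term and term not in subjects:
                subjects.append(term)
    start, end = _years(did)
    return Collection(
        title=_text(title_el) if title_el is not None else "",
        linear_feet=_linear_feet(did),
        start_year=start,
        end_year=end,
        subjects=subjects,
    )
```

Real finding aids vary widely (extents given in boxes or cubic feet, dates without a `normal` attribute, subjects spread across several `controlaccess` groups), so a harvester working over many institutions would likely need more fallbacks than this sketch provides.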
Tools for visualizing archival collections support the
needs of three distinct user groups.
• Archivists and manuscript curators can use such a
tool to improve their understanding of, and validate the
metadata for, collections at various institutions, including
their own.
• Literary researchers, historians and humanities
scholars can use this type of tool to identify institutions with archival collections
that fit the criteria of their research.
• Finally, this type of tool can enable exploration of
locally held cultural heritage materials by students
and promote use of primary sources. In contrast to
researchers who frequently have very specific interests
before they examine the collections held by an
institution, university students who are interested in
humanities topics are likely to be unaware
of the primary sources available. A tool of this type
might encourage browsing and open-ended exploration
of locally available cultural heritage resources,
and increase interest in both the humanities
and primary materials.
Built in the spring of 2007, the first version of ArchivesZ
is a prototype for just such a tool. Designed to support
search, exploration and visualization of archival record
groups and manuscript collections, ArchivesZ addresses
a major challenge facing humanities scholars: the need
to understand the scope and quantity of available archival
records and manuscripts.
Fig. 1 Screenshot of ArchivesZ Prototype (video
demonstration online at
To support organic exploration of subject terms associated
with collections, ArchivesZ uses a distinctive
dual-sided histogram (see right half of Figure 1). The
ArchivesZ prototype combines this dual-sided histogram
with a more traditional histogram displaying year data to
permit tightly coupled, multi-dimensional browsing of
subject and time period metadata. By representing the
distribution of subjects and time periods using the metric
of total aggregate linear feet, ArchivesZ permits users
to get a better sense of total available research materials
than they would by viewing a standard search result list.
The subject term visualization interface may also support
a deeper understanding of the relationships among
subject terms through the lens of the currently selected
set of collections.
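The aggregation behind these histograms can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the ArchivesZ implementation: the `Collection` record and the `select`, `feet_by_subject`, `feet_by_year`, and `dual_subject_bars` functions are hypothetical; each collection's extent is assumed to be spread evenly across its inclusive years; and the two sides of each subject bar are assumed to compare linear feet across all collections with linear feet in the current selection. Filtering by a year range or subject and then recomputing both histograms from the filtered set is one way to obtain the tightly coupled browsing described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Collection:
    # Hypothetical per-collection record with the metadata being aggregated.
    title: str
    linear_feet: float
    start_year: int
    end_year: int
    subjects: List[str] = field(default_factory=list)


def select(collections: List[Collection],
           subject: Optional[str] = None,
           years: Optional[Tuple[int, int]] = None) -> List[Collection]:
    """Keep collections tagged with `subject` (if given) whose inclusive
    years overlap the closed range `years` (if given)."""
    chosen = []
    for c in collections:
        if subject is not None and subject not in c.subjects:
            continue
        if years is not None:
            first, last = years
            if c.end_year < first or c.start_year > last:
                continue  # date span does not overlap the selected range
        chosen.append(c)
    return chosen


def feet_by_subject(collections: List[Collection]) -> Dict[str, float]:
    """Total linear feet of the collections carrying each subject term."""
    totals: Dict[str, float] = defaultdict(float)
    for c in collections:
        for s in c.subjects:
            totals[s] += c.linear_feet
    return dict(totals)


def feet_by_year(collections: List[Collection]) -> Dict[int, float]:
    """Linear feet per year, spreading each extent evenly over its span (assumption)."""
    totals: Dict[int, float] = defaultdict(float)
    for c in collections:
        span = c.end_year - c.start_year + 1
        for year in range(c.start_year, c.end_year + 1):
            totals[year] += c.linear_feet / span
    return dict(totals)


def dual_subject_bars(all_collections: List[Collection],
                      selected: List[Collection]) -> Dict[str, Tuple[float, float]]:
    """For each subject: (feet across all collections, feet in the current selection)."""
    overall = feet_by_subject(all_collections)
    current = feet_by_subject(selected)
    return {s: (feet, current.get(s, 0.0)) for s, feet in overall.items()}
```

Spreading an extent evenly keeps the year histogram's total equal to the total linear feet selected. Crediting the full extent to every year in the span is an equally plausible alternative, but it inflates totals for collections with long date ranges.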
Further development of ArchivesZ has been supported
by a National Endowment for the Humanities Digital
Humanities Startup Grant. In this poster/demo, we will
present the newest version of ArchivesZ in use over a
large set of finding aids provided by a wide range of partner
archives, and we will discuss how this version lays the foundation
for the future creation of a public tool for visualizing archival
collections.
1 C. Anderson. The long tail. Wired, 12(10), 2004.
2 L. Dempsey. Libraries and the long tail: Some thoughts about libraries in a network age. D-Lib Magazine, 12(4), April 2006.


Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None