DESCRIBING COLLECTIONS DIGITALLY: METADATA FRAMEWORKS FOR RESEARCH LIBRARY COLLECTIONS

Jared Campbell

Authorship

1. Jared Campbell

University of California, Davis

Work text


Researchers and students have come to expect an increasing number of services from libraries and archives via the World Wide Web. Over the last two decades, libraries have devoted vast staff and financial resources to developing online catalogs and digitizing materials for remote use. The result has been increasingly powerful research tools for students and scholars that can be accessed from any networked computer, twenty-four hours a day, from any geographic location. Researchers also expect greater unmediated access to these collections. Despite this improved access, librarians and archivists still face the challenge of helping potential users find these collections and of describing them in a way that integrates access to digital and traditional physical collections. This paper surveys current work on collection level description for traditional and digital information resources. Focusing on the library and archival communities, we will look at several descriptive schemas that have been developed to provide access to collections of online resources: the MARC bibliographic record format, the RSLP Collection Description Schema, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and Encoded Archival Description (EAD). With an emphasis on resource discovery and analytic content description, the paper will compare the types of access each schema allows in terms of the detail of description, how easily data can be mapped to or compared with other record schemas, and how well each can be integrated with the others. The metadata formats examined are those concerned primarily with resource discovery: records that provide a summary or detailed description of a collection and its contents.
The oldest and most widely used of these is the Machine-Readable Cataloging (MARC) record. Created as a means of storing and transferring bibliographic data, it has been the standard data format for library and many archival online catalogs. MARC, along with its content standard, the Anglo-American Cataloguing Rules, 2nd ed. (AACR2), has been used to describe everything from books to multimedia web pages. The combination of MARC and AACR2 is useful because of their emphasis on strong content and formatting standards. For instance, in a MARC record the title of a work is always transcribed in the 245 field, with AACR2 governing how the entry is formulated. Though a patron may never see a record in its raw MARC form, they will recognize just about any web display produced from that record. Additionally, MARC and AACR2 place a strong emphasis on name and subject control: name headings and subject terms must come from an established thesaurus (such as the Library of Congress Subject Headings, the Library of Congress Name Authority File, or the Art & Architecture Thesaurus) that is identified in the record. Though useful from an end-user perspective, this rigidity causes problems when describing library collections rather than individual bibliographic works. This problem of scalability stems from the fact that MARC and AACR2 were developed to describe individual items. A collection level MARC record therefore tends to consist of large, unstructured textual note fields that make it difficult to compare systematically with other collection level records beyond controlled subject vocabularies and names. To remedy these problems, the archival community developed Encoded Archival Description (EAD) as a means of providing more detailed access to archival and manuscript collections. EAD is an SGML/XML DTD designed to capture both the content and the structure of archival finding aids.
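To make the MARC point concrete, here is a minimal Python sketch. It assumes a simplified in-memory representation of a record (a list of fields with indicators and subfields) rather than the ISO 2709 or MARCXML encodings used in practice. It shows why the title is always predictable (field 245, subfield $a) and how a 650 subject heading identifies its thesaurus: second indicator 0 for Library of Congress Subject Headings, or 7 with the source code given in subfield $2. The record contents and the helper names `title` and `subject_sources` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataField:
    """One variable data field in a simplified MARC-like record (illustrative only)."""
    tag: str                      # three-digit tag, e.g. "245"
    ind1: str = " "               # first indicator
    ind2: str = " "               # second indicator
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def first(self, code: str) -> str | None:
        """Return the value of the first subfield with this code, if present."""
        for sub_code, value in self.subfields:
            if sub_code == code:
                return value
        return None


def title(record: list[DataField]) -> str | None:
    """The title proper is always transcribed in 245 $a."""
    for f in record:
        if f.tag == "245":
            return f.first("a")
    return None


def subject_sources(record: list[DataField]) -> list[tuple[str, str]]:
    """Pair each 650 topical heading with the thesaurus it was taken from."""
    pairs = []
    for f in record:
        if f.tag != "650":
            continue
        if f.ind2 == "0":            # 0 = Library of Congress Subject Headings
            source = "lcsh"
        elif f.ind2 == "7":          # 7 = source named in subfield $2
            source = f.first("2") or "unspecified"
        else:
            source = "other"
        pairs.append((f.first("a") or "", source))
    return pairs


# Hypothetical collection level record.
collection = [
    DataField("245", "0", "0", [("a", "Viticulture and enology collection")]),
    DataField("520", subfields=[("a", "Correspondence, photographs, and pamphlets "
                                      "documenting regional wine making.")]),
    DataField("650", " ", "0", [("a", "Viticulture")]),
    DataField("650", " ", "7", [("a", "Vineyards"), ("2", "aat")]),
]

print(title(collection))            # Viticulture and enology collection
print(subject_sources(collection))  # [('Viticulture', 'lcsh'), ('Vineyards', 'aat')]
```

Only the 245 and 650 fields yield values that can be compared mechanically here; the 520 summary note is free text, which is the scalability problem described above for collection level records.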
The use of SGML/XML allows archivists to encode the hierarchical structure of a finding aid at a very granular level, facilitating cross-collection searching. Another strength of EAD is the ease with which it can be converted into web-friendly HTML: the Extensible Stylesheet Language (XSL) allows institutions and individuals to present their collection data on the web in a variety of ways. The major obstacle to implementing EAD is the time and resources required to encode finding aids. Integrating EAD into the descriptive program of an archive or special collections department requires a tremendous amount of training and planning to exploit the DTD's full potential. Moreover, the EAD DTD was designed specifically for archival and manuscript collections. As a result, the structure and semantics of its records come from that tradition and are not easy to adapt to other types of collections. Another current means of gathering and disseminating collection level descriptions is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This protocol is designed to automate the gathering of metadata about scholarly information that may be hidden from conventional web search engines (Yahoo, Google, etc.). Common targets for harvesting include structured XML documents, finding aids, and resources held in databases (such as photographic images). Once harvested, the metadata is used to create and maintain records in a central database. Depending on the metadata being harvested, an OAI-PMH service can create both collection level records and analytic records for the individual items within a collection. A main goal of OAI-PMH projects (such as those at the University of Illinois, the University of Michigan, and Virginia Tech) is to provide a central repository of structured data that allows item level searching across institutional collections.
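The hierarchical encoding that EAD makes possible can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch under simplifying assumptions: the finding aid is a hypothetical, namespace-free fragment using EAD's `archdesc`, `dsc`, `did`, `unittitle`, and numbered component (`c01`, `c02`, ...) elements, and the `to_html` function is a hand-written stand-in for what an XSL stylesheet would normally do, since the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, namespace-free fragment using EAD element names.
FINDING_AID = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><unittitle>Letters, 1890-1899</unittitle></did></c02>
        <c02 level="file"><did><unittitle>Letters, 1900-1910</unittitle></did></c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# EAD nests components as <c> or as numbered <c01> ... <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def walk(element, depth=0):
    """Yield (depth, level, title) for every component below element, depth-first."""
    for child in element:
        if child.tag in COMPONENT_TAGS:
            yield depth, child.get("level", ""), child.findtext("did/unittitle", default="")
            yield from walk(child, depth + 1)


def to_html(element):
    """Render components as nested <ul> lists (a stand-in for an XSL stylesheet)."""
    items = [
        f"<li>{escape(child.findtext('did/unittitle', default=''))}{to_html(child)}</li>"
        for child in element
        if child.tag in COMPONENT_TAGS
    ]
    return "<ul>" + "".join(items) + "</ul>" if items else ""


dsc = ET.fromstring(FINDING_AID.strip()).find("archdesc/dsc")
for depth, level, title in walk(dsc):
    print("  " * depth + f"{title} ({level})")
print(to_html(dsc))
```

Because each component keeps its place in the hierarchy, a search over such records can report not only that a matching file exists but also which series it belongs to.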
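The harvesting loop behind an OAI-PMH service can also be sketched in Python. The sketch assumes a hypothetical repository at `https://repository.example.edu/oai` and replaces the network with a stub `fetch` function that returns canned responses. The request parameters (`verb=ListRecords`, `metadataPrefix=oai_dc`, `resumptionToken`) and the OAI-PMH and Dublin Core namespaces follow the protocol, but the responses are trimmed to the elements the sketch reads, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, token=None, prefix="oai_dc"):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return ([(identifier, title), ...], next_token) for one response page."""
    root = ET.fromstring(xml_text)
    records = [
        (rec.findtext(f"{OAI}header/{OAI}identifier", default=""),
         rec.findtext(f".//{DC}title", default=""))
        for rec in root.iter(f"{OAI}record")
    ]
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, token or None  # an empty token marks the last page


def harvest(base_url, fetch):
    """Yield records page by page until no resumption token is returned."""
    token = None
    while True:
        records, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            return


# Trimmed, canned responses from a hypothetical repository (no network access).
PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{ident}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>{token}</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://repository.example.edu/oai"
RESPONSES = {
    list_records_url(BASE): PAGE.format(
        ident="oai:example.edu:1", title="Transcription A", token="page2"),
    list_records_url(BASE, "page2"): PAGE.format(
        ident="oai:example.edu:2", title="Transcription B", token=""),
}

for ident, title in harvest(BASE, lambda url: RESPONSES[url]):
    print(ident, title)
```

In OAI-PMH a `resumptionToken` is an exclusive argument, so the follow-up request carries only the verb and the token; an empty token element marks the final page of the list.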
Theoretically, a user could find records for two different transcriptions of the same text that have been digitized and stored in two separate project text-bases. The United Kingdom's Research Support Libraries Programme (RSLP) has produced a more recent development in collection level description. In an attempt to provide a more standardized cross-institutional collection level description, RSLP developed the RSLP Collection Description Schema. Developed as both a collections management tool and a resource discovery mechanism for scholars and students, the schema provides much more detailed description of collections than is currently available in MARC. It brings together, in a single record, descriptive attributes of the collection itself, the location or locations of the resource, resource creators, agents (i.e., collector, owner, and administrator), and external relationships with other collections. Unlike MARC, the RSLP schema is scalable: it can be used to describe anything from a small collection of digital images to an entire library collection. Based on these comparisons, the paper concludes by arguing that early and continued planning of descriptive practices is the most important means of building and implementing flexible and robust collection level metadata. The paper suggests that, as early as possible in this planning phase, project planners consider their text or image-base's primary and secondary audiences, the level of detail required to make sense of the collection, and the scope of the materials. Creators of scholarly projects and collections should consult library metadata specialists early in the planning process to discuss how the existence of a collection is made known, how collections can be linked (thematically, geographically, etc.) with others, and how potential researchers will be able to navigate successfully through various query interfaces.
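The kinds of entity that an RSLP collection description gathers into one record can be modeled with a few Python dataclasses. This is an illustrative sketch only: the class and field names (`Collection`, `Agent`, `Location`, `relations`) and the `isPartOf` relationship label are hypothetical simplifications, not the schema's actual element names. It shows how the same record type can describe both a whole library and a small image collection that declares itself part of it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    role: str  # e.g. "collector", "owner", "administrator"


@dataclass
class Location:
    name: str
    place: str


@dataclass
class Collection:
    """One collection level description; field names are illustrative only."""
    title: str
    description: str
    locations: list[Location] = field(default_factory=list)
    creators: list[str] = field(default_factory=list)
    agents: list[Agent] = field(default_factory=list)
    # (relationship, related collection title) pairs
    relations: list[tuple[str, str]] = field(default_factory=list)

    def agents_in_role(self, role: str) -> list[str]:
        return [a.name for a in self.agents if a.role == role]


def parts_of(collections: list[Collection], parent_title: str) -> list[str]:
    """Titles of the collections that declare themselves part of parent_title."""
    return [c.title for c in collections if ("isPartOf", parent_title) in c.relations]


# The same record type at two very different scales (hypothetical data).
library = Collection(
    title="Example University Library",
    description="General and special collections of a research library.",
    locations=[Location("Main Library", "Example City")],
    agents=[Agent("Example University", "owner")],
)
photos = Collection(
    title="Vineyard Photographs",
    description="A small set of digitized images.",
    locations=[Location("Special Collections", "Example City")],
    creators=["Unknown photographer"],
    agents=[Agent("A. Collector", "collector"),
            Agent("Special Collections", "administrator")],
    relations=[("isPartOf", library.title)],
)

print(parts_of([library, photos], library.title))  # ['Vineyard Photographs']
print(photos.agents_in_role("collector"))          # ['A. Collector']
```

Storing relationships as simple pairs lets a description at any scale point to a larger or smaller one, which is one way to approximate the scalability the schema aims for.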
Finally, potential collection users need to be included during the planning stages so that they can take a more active role in the development of useful metadata. This is especially true of descriptions of digital collections, where the goal is to provide unmediated access.
In providing an overview of current thinking about collection description in the library and archival world, this paper aims to give students, researchers, and other users of library and archival materials a context for thinking about their information needs as they undertake humanities research. These needs should determine the extent to which current and new schemas are appropriate for their own research projects, or whether new frameworks need to be created.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2003

"Web X: A Decade of the World Wide Web"

Hosted at University of Georgia

Athens, Georgia, United States

May 29, 2003 - June 2, 2003

Conference website: http://web.archive.org/web/20071113184133/http://www.english.uga.edu/webx/

Series: ACH/ICCH (23), ALLC/EADH (30), ACH/ALLC (15)

Organizers: ACH, ALLC
