Textbases and Databases: Integrating Library Catalogs with Digital Libraries

Perry Willett

Indiana University, Bloomington

Online Public Access Catalogs (OPACs) in libraries now include bibliographic records for WWW sites. In most or all cases, these records contain direct links to the web resources themselves, so that someone using a WWW-based catalog could go directly from a bibliographic description of a website to the website itself, just by clicking on the URL in the catalog record. OPACs also have records for individual items within digital libraries, so that readers will be aware of the existence of a particular text or digital object within larger collections.

However, once someone leaves the OPAC and enters a digital library on the WWW, the advantages of careful bibliographic control may be lost. Most digital libraries are built on SGML-encoded files, whether full-text TEI-encoded files, Encoded Archival Description (EAD) finding aids, or files in some other markup language, while library catalogs use records in the MAchine Readable Cataloging (MARC) format. It will become increasingly important to find ways for these SGML-based digital libraries to interact with MARC-based library catalogs.

In cataloging an item, catalogers spend time determining bibliographic information and the accepted forms of names and titles, so that the item is described accurately. They also assign subject headings to each item. When a digital library neither links to online catalog records nor includes this full information in its searching and browsing mechanisms, readers may be misdirected or misinformed. The issue is more complex than reproducing or reformatting a MARC record within the TEI or EAD Header. There are at least three reasons why digital libraries should be linked dynamically to online library catalogs:

1. The difficulties presented by names
2. Accepted forms of names and subject headings change
3. Digital libraries may combine both cataloged and uncataloged materials

Catalogers take great care in creating and maintaining Name Authority Files in an attempt to keep straight the various authors who might share the same or similar names, as well as pseudonyms, variant spellings, and married or maiden names. Names are a source of contention among scholars, almost as much as texts themselves: the name by which any given author is known may change, and consistent rules for standard forms of names are difficult to establish. To give a few well-known examples, Charlotte Brontë published under the pseudonym "Currer Bell," yet is known by her real name; Marian Evans published under the pseudonym "George Eliot," and is known by her pseudonym. Following U.S. Library of Congress rules, works by Mark Twain were formerly filed under "Samuel Clemens." This changed a few years ago, and they are now filed under "Mark Twain." There are countless such cases, many far more vexed and complicated than these examples. The intent of a Name Authority File is to group together an author's works, no matter under which name they were published, so that they can be found by searching under any variant name or pseudonym. Of course, Name Authority Files are neither comprehensive nor perfect, but in developing digital libraries, it seems counterproductive to duplicate the effort already expended in creating a Name Authority File.

Catalogers also expend a great deal of effort in maintaining the information contained within library catalogs, and online catalogs are dynamic databases under constant revision. Accepted forms of subject headings and authors' names change over time, and libraries routinely perform global changes within library catalogs. If subject headings are included within the header of an electronic text, it is doubtful that the header will be updated should the accepted form be changed in the OPAC. Over time, the information in digital libraries will fall out of sync with the OPAC.
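
The grouping that a Name Authority File performs, and the drift that follows when a heading is copied into a header instead of being looked up, can be illustrated with a minimal Python sketch. The AuthorityFile class, its methods, and the simplified name forms are all hypothetical and exist only for illustration; real authority records are MARC records with far richer structure, and this is not the method used by either project described here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


def normalize(name: str) -> str:
    """Fold case and collapse whitespace so lookups tolerate minor variation."""
    return " ".join(name.lower().split())


@dataclass
class AuthorityRecord:
    """Hypothetical authority entry: one accepted heading plus variant names."""
    heading: str
    variants: Set[str] = field(default_factory=set)


class AuthorityFile:
    """Toy stand-in for a Name Authority File, indexed by every known name."""

    def __init__(self) -> None:
        self._index: Dict[str, AuthorityRecord] = {}

    def add(self, heading: str, variants: Tuple[str, ...] = ()) -> None:
        record = AuthorityRecord(heading, set(variants))
        # Index the heading and each variant so any of them finds the record.
        for name in (heading, *variants):
            self._index[normalize(name)] = record

    def resolve(self, name: str) -> Optional[str]:
        """Return the currently accepted heading for any known name."""
        record = self._index.get(normalize(name))
        return record.heading if record else None

    def change_heading(self, old: str, new: str) -> None:
        """Global change: the old heading becomes a variant of the new one."""
        record = self._index[normalize(old)]
        record.variants.discard(new)
        record.variants.add(record.heading)
        record.heading = new
        self._index[normalize(new)] = record


naf = AuthorityFile()
naf.add("Charlotte Brontë", ("Currer Bell",))
naf.add("George Eliot", ("Marian Evans",))
naf.add("Samuel Clemens", ("Mark Twain",))

# A header that copies the heading freezes it at encoding time.
stored_heading = naf.resolve("Mark Twain")

# The catalog later performs a global change of the accepted form.
naf.change_heading("Samuel Clemens", "Mark Twain")

# A dynamic lookup follows the change; the stored copy does not.
live_heading = naf.resolve("Samuel Clemens")
print(stored_heading, "/", live_heading)
```

In this sketch a search under any variant reaches the same record, and only the live lookup reflects the later change of heading, which is the argument for linking dynamically instead of copying.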

Not everything in a library collection is cataloged, and this is especially true for manuscript and archival collections. Catalogers and archivists have rules for what gets cataloged and what does not. Letters, photographs, and other items within manuscript collections generally are not cataloged separately with MARC records. Instead, catalogers create collection-level MARC records for the online catalog, and archivists then create finding aids that describe the contents of manuscript collections in more detail. Some archival collections combine cataloged materials, such as books, recordings, or films, with uncataloged items, such as photographs, sheet music, or letters. Digital libraries created from such collections will need to consider the various sources of bibliographic description available, which may include both MARC records and finding aids. As digital libraries become larger and more complex, it will become essential that they draw from and interact with online library catalogs. Digital libraries will not want to duplicate the bibliographic descriptions and subject headings available from online catalogs.

I will look at how two projects at Indiana University have begun to address this problem by integrating information from the online library catalog with digital library collections, and at some of the problems and pitfalls encountered. First, the Hoagy Carmichael Collection <http://www.dlib.indiana.edu/collections/hoagy>, which has digitized most of the Carmichael collection available at Indiana University, combines three sources of information: an EAD finding aid for the music, lyrics, photographs, correspondence and other materials; MARC records for the sound recordings, extracted from the library catalog and converted to a MARC SGML format developed by the U.S. Library of Congress; and finally, the TEI-encoded full-text correspondence. At present, we are extracting MARC records and converting them to SGML using batch processes, but we are working on ways for this interaction to occur in real time. I will focus here on the use of the MARC records as part of the overall metadata for the project, and on the process of conversion to SGML.
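
As a rough illustration of what a batch conversion from MARC to SGML involves, the following Python sketch serializes already-parsed fields as tagged text. Everything in it is an assumption made for the example: the tuple layout of a field, the element names record, field and subfield, and the sample record itself are invented, and they do not reproduce the Library of Congress MARC SGML format or the project's actual process.

```python
from html import escape

# Assumed in-memory shape for the sketch:
# a field is (tag, indicator 1, indicator 2, [(subfield code, value), ...]).


def field_to_sgml(tag, ind1, ind2, subfields):
    """Serialize one field using hypothetical element names."""
    lines = [
        f'<field tag="{escape(tag)}" ind1="{escape(ind1)}" ind2="{escape(ind2)}">'
    ]
    for code, value in subfields:
        # Escape &, < and > so cataloged text cannot be read as markup.
        lines.append(
            f'  <subfield code="{escape(code)}">'
            f"{escape(value, quote=False)}</subfield>"
        )
    lines.append("</field>")
    return "\n".join(lines)


def record_to_sgml(fields):
    """Wrap all fields of one record in a single record element."""
    body = "\n".join(field_to_sgml(*f) for f in fields)
    return f"<record>\n{body}\n</record>"


def convert_batch(records):
    """Convert a whole extract in one pass, as a batch job would."""
    return [record_to_sgml(fields) for fields in records]


# Invented sample record, for illustration only.
sample = [
    ("100", "1", " ", [("a", "Carmichael, Hoagy")]),
    ("245", "1", "0", [("a", "Example title"), ("h", "[sound recording]")]),
]

converted = convert_batch([sample])
print(converted[0])
```

A real-time version would call the same per-record conversion when a reader requests an item, reading the record from the catalog at that moment instead of from a stored extract.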

Second, the Victorian Women Writers Project (VWWP) <http://www.indiana.edu/~letrs/vwwp> has begun a project to use the Name Authority File (NAF) records from the online catalog to keep track of authors and their variant names. The VWWP currently has works by only 42 authors, but even this small sample presents some complex issues surrounding authors' names. I will demonstrate the process by which Name Authority File records are integrated with the VWWP collection, allowing for more complete information on authors' names.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2000

Hosted at University of Glasgow

Glasgow, Scotland, United Kingdom

July 21, 2000 - July 25, 2000

Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/

Series: ALLC/EADH (27), ACH/ICCH (20), ACH/ALLC (12)

Organizers: ACH, ALLC
