The Electronic Archive of Early American Fiction (1775-1850)

David Seaman

Authorship

1. David Seaman

University of Virginia

Original URL

https://web.archive.org/web/20020713214958/http://www.cs.queensu.ca/achallc97/papers/p039.html

Work text


etext@virginia.edu
Keywords: SGML, archival imaging, early American fiction

Introduction
This 125,000-page project takes the University of Virginia Library into a level of archival-quality text and image production rarely seen in rare books archives. In preparing for this project we have tackled issues of funding, production-level digital equipment and practices, partnerships with commercial publishers to disseminate the results, and large-scale storage issues. This paper will outline the project, explain the workflow, equipment, and text and image standards that we think appropriate for creating data of long-term viability, and explore the lessons we are learning (and expect to learn) regarding the economics of undertaking a cost-recovery process.
Scope
The Early American Fiction project will create electronic texts for the 425 titles (582 volumes) in the Barrett and Taylor collections at the University of Virginia Special Collections Department. The list includes major works of Edgar Allan Poe, James Fenimore Cooper, Nathaniel Hawthorne, and Washington Irving, but also many lesser-known authors such as Anne Newport Royall, Samuel Benjamin Judah, and Charles Frederick Briggs. By including the lesser-known works and authors we hope to represent the fabric and context of early American literature, making available to teachers and researchers what Americans were reading during the first 75 years of our nation's history.

Digital Formats
The project will combine high-quality color page images of all 125,000 pages (including covers and spines) with TEI-encoded text versions, giving scholars all over the world a rare sense of the physical reality of the volumes being studied as well as a fully searchable SGML database. All pages will be captured with a digital camera at approximately 400 dpi in 24-bit color and archived as TIFF files. The paper will cover the challenges of managing this vast amount of data and explain why such large page-image files are necessary. JPEG derivatives will be generated for on-line use.
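To give a rough sense of the data volume implied by these figures, the following is a minimal back-of-the-envelope sketch. Only the page count, resolution, and color depth come from the project description; the page dimensions (5 x 8 inches) are a purely hypothetical assumption, as is the choice to ignore TIFF headers and any compression, so the result is an order-of-magnitude illustration rather than a project figure.

```python
def uncompressed_image_bytes(width_in, height_in, dpi=400, bits_per_pixel=24):
    """Approximate raw raster size in bytes for one scanned page.

    Ignores TIFF header overhead and assumes no compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


PAGES = 125_000        # from the project description
PAGE_WIDTH_IN = 5.0    # hypothetical page width (assumption)
PAGE_HEIGHT_IN = 8.0   # hypothetical page height (assumption)

per_page = uncompressed_image_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN)
total = per_page * PAGES

print(f"{per_page / 1e6:.1f} MB per page, {total / 1e12:.2f} TB in total")
# prints "19.2 MB per page, 2.40 TB in total"
```

Under these assumed dimensions the archival masters alone would run to terabytes, which suggests why storage is treated as a project-level concern and why smaller JPEG derivatives are used for delivery.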

All the text will be encoded in TEI. The conversion to tagged ASCII text will be done under contract by a keyboarding company, which will also add some of the markup. The texts will be completed and parsed at UVa and mounted on the web. The paper will report on this workflow and outline the lessons we learn in handling large quantities of TEI text and color TIFF images.

Economics
A key part of this project will be a structured measurement of usage of the e-texts created in the project, and a comparison of that usage with the usage of original rare books. In addition to the economics of use, there will be a report on our cost-recovery assumptions, which include a partnership with a commercial publisher to market a CD version of the database.

Conclusion
The Electronic Archive of Early American Fiction project presents the opportunity to study scholarly use of original rare books and of their computer simulacra, to determine the extent to which electronic texts of rare books can serve scholars and teachers, and to compare the usage and costs of the electronic texts with those of the original paper volumes. This paper will outline the scope of the project and report on what we have learned that endorses or challenges our initial assumptions about workflow, cost, level of tagging, commercial interest, and image quality.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1997

Hosted at Queen's University

Kingston, Ontario, Canada

June 3, 1997 - June 7, 1997

76 works by 119 authors indexed

Conference website: https://web.archive.org/web/20010105065100/http://www.cs.queensu.ca/achallc97/

Series: ACH/ALLC (9), ACH/ICCH (17), ALLC/EADH (24)

Organizers: ACH, ALLC
