Making George Washington's Financial Documents Accessible: Transcription, Data, And The Drupal Solution

poster / demo / art installation
  1. Elisabeth Jennifer Stertzer

    University of Virginia

  2. Erica Fallon Cavanaugh

    University of Virginia

Work text

The George Washington Financial Papers Project exists at the intersection of two challenges editors currently face: managing complicated editorial work and navigating the world of digital publication. By focusing on a particularly difficult and dynamic dataset (financial documents), work has advanced on three interconnected fronts: 1) developing document templates for traditional financial documents, such as account books and ledgers, as well as for receipts, journals, and memoranda; 2) developing taxonomies and data visualizations; and 3) constructing an open-source content management/editorial/publication platform. The work has resulted in both an open-access digital edition of Washington’s financial documents and the groundwork for a Drupal for Editors prototype (a Drupal-based, open-source editorial/publication platform) that provides editors with a stable, flexible, and powerful platform for building engaging digital editions of financial documents.
In 2013, The Papers of George Washington received a grant from the National Historical Publications and Records Commission (NHPRC) for the perfection and population of the content management database (DocTracker) with Washington’s three major ledger books; preparation of Gouverneur Morris’s 1811-1816 account book and its entry into the content management database in partnership with the Gouverneur Morris Papers at the New-York Historical Society; and the completion of a primary version of a web interface that will provide users with free access to the edition’s entire content and permit downloading and data manipulation.
During our partnership with DocTracker, we helped design a viable content management and customized editorial workflow solution built on the proprietary, commercial database software FileMaker Pro. DocTracker allowed us to manage both document records and content identifications, and to associate both with transcriptions. As a publication platform, however, it was limited because of its use of XML. We investigated alternative publication options and decided on Drupal, a highly configurable open-source content management system. We determined Drupal to be the best publication solution for several reasons: 1) at its core, Drupal is a database in which imported content can be mapped to fields, allowing for robust displays and for searching, querying, and browsing; 2) Drupal is accessible, in terms of both cost and usability, and has a large user community; 3) both the backend (content/data) and frontend (website interface) are managed in the system; and 4) Drupal is open-source, and its core and add-on (module) code are developed and actively maintained by a large international developer community.
Drupal has allowed the project to confront the numerous challenges inherent in these documents: (1) different types of financial documents are formatted in distinct, though standardized, ways, and the formatting of financial documents carries implied meanings; (2) transactions are full of dittos, abbreviations, and shorthand, which raises the question of what kinds of fields should be created to capture both the transcription and the clear text, thereby making both the text and the content searchable; (3) the documents present issues of currency, valuation, and barter; and (4) a hierarchy of documents exists, so the same transaction may be recorded in a day book, an account, and a ledger, generating multiple instances of the same transaction.
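As a rough illustration of the currency issue in point (3), the following Python sketch assumes amounts recorded in pounds, shillings, and pence (12 pence to the shilling, 20 shillings to the pound). The names `Amount` and `total` are hypothetical and are not part of the Project's Drupal platform; the sketch only shows why such amounts need a structured field rather than a plain decimal number.

```python
from dataclasses import dataclass

# Assumption: amounts are in pounds, shillings, and pence.
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


@dataclass(frozen=True)
class Amount:
    """Hypothetical structured field for one monetary amount."""
    pounds: int = 0
    shillings: int = 0
    pence: int = 0

    def to_pence(self) -> int:
        # Convert to the smallest unit so amounts can be added safely.
        total_shillings = self.pounds * SHILLINGS_PER_POUND + self.shillings
        return total_shillings * PENCE_PER_SHILLING + self.pence

    @classmethod
    def from_pence(cls, total_pence: int) -> "Amount":
        # Carry pence into shillings, then shillings into pounds.
        shillings, pence = divmod(total_pence, PENCE_PER_SHILLING)
        pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
        return cls(pounds, shillings, pence)


def total(amounts):
    """Sum a list of Amount values, carrying correctly between units."""
    return Amount.from_pence(sum(a.to_pence() for a in amounts))


# Invented sample values, not taken from any ledger.
example_total = total([Amount(1, 15, 8), Amount(0, 7, 6)])
```

Storing the three units separately keeps the value faithful to what is written in the document, while the conversion to pence supports sorting and totals. Valuation across different currencies and barter are not addressed by this sketch.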
Indeed, one of the primary goals of the Project is to make accurate transcriptions of the documents available, in keeping with the long tradition of the Papers of George Washington documentary editing project. However, the types of information, or the “data,” contained in these documents are not easily accessible using common search and query techniques. The challenges described above make it impossible simply to transcribe the documents and put the transcriptions online, ready to be searched and understood. The solution involves a combination of transcription and corresponding data fields (where dittos, abbreviations, and shorthand have been expanded), node references associating various content types, and term references connecting taxonomies. Additionally, Drupal provides a place to develop and manage taxonomy lists for specific content types, such as financial documents, which enhance the grouping and sorting of content and can be used to identify relationships between different types of content.
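The pairing of a transcription field with an expanded data field can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Project's Drupal configuration: the record layout (`EntryRecord`), the list of ditto marks, and the sample rows are all hypothetical, and only ditto expansion is shown (abbreviations and shorthand would need their own lookup tables).

```python
from dataclasses import dataclass, field

# Hypothetical set of ditto marks; a real edition would define its own.
DITTO_MARKS = {"do", "do.", "ditto", '"'}


@dataclass
class EntryRecord:
    transcription: str   # the entry as written in the document
    clear_text: str      # the same entry with dittos expanded
    terms: list = field(default_factory=list)            # taxonomy terms
    related_entries: list = field(default_factory=list)  # other records of the same transaction


def expand_dittos(rows):
    """Replace each ditto cell with the expanded value above it in the same column."""
    expanded = []
    for row in rows:
        previous = expanded[-1] if expanded else []
        clear = []
        for col, cell in enumerate(row):
            if cell.strip().lower() in DITTO_MARKS and col < len(previous):
                clear.append(previous[col])
            else:
                clear.append(cell)
        expanded.append(clear)
    return expanded


def build_records(rows):
    """Keep the transcription and the clear text side by side."""
    return [
        EntryRecord(" ".join(original), " ".join(clear))
        for original, clear in zip(rows, expand_dittos(rows))
    ]


def search(records, word, use_clear_text=True):
    """Return the indexes of records whose chosen field contains the word."""
    hits = []
    for index, record in enumerate(records):
        text = record.clear_text if use_clear_text else record.transcription
        if word.lower() in text.lower():
            hits.append(index)
    return hits


# Invented sample rows, not taken from any ledger.
rows = [["To", "corn", "10 bushels"], ["Do", "do.", "4 bushels"]]
records = build_records(rows)
```

In this sketch, searching the clear text for "corn" finds both entries, while searching the transcription finds only the first, because the second entry records the commodity as a ditto. The `terms` and `related_entries` fields stand in for term references and node references and are left empty here.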
Developing this system has challenged us to think creatively about all aspects of the editorial and publication process, resulting in innovative ways for users to explore, analyze, and interact with content. This poster and hands-on demonstration will explore these issues and the technological solutions used to make these documents available as a free online resource, and will highlight strategies for content searchability, including annotation, glossaries, indexes, and linking.


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

454 works by 1072 authors indexed

Series: ADHO (11)

Organizers: ADHO