Photogrammar: Organizing Visual Culture through Geography, Text Mining, and Statistical Analysis

paper, specified "long paper"
Authorship
  1. Lauren Tilton

    Yale University

  2. Peter Leonard

    Yale University

  3. Taylor Arnold

    Independent Scholar

Work text

The Farm Security Administration – Office of War Information photographic dataset is a collection of over 170,000 monochrome and colour photographs, commissioned between 1935 and 1945 by the government of the United States of America. Offering a unique snapshot of the nation during the period, it serves as an important visual record for scholars and the public at large. The FSA-OWI photographic archive has been digitized by the United States Library of Congress, and because the photographs were taken on behalf of the United States Government, access to and use of the collection is essentially free and open.

Under the direction of Professor Laura Wexler (American Studies and Women, Gender & Sexuality Studies, Yale University), the Photogrammar project takes the conditions of this archive (pre-digitized and publicly accessible) as the starting point for a digital, scholarly and open source platform that builds upon, and significantly extends, the Library of Congress’ online collection. Supported by a Digital Humanities Start-Up Grant from the United States National Endowment for the Humanities, the project seeks to answer research questions that emerged from scholars at Yale University. Our paper will focus on three Digital Humanities techniques that we have used to analyze the corpus:

Geospatial: Computational derivation of latitude and longitude;

Text Mining: Vector-space analysis to expose thematic similarity;

Statistical Analysis: Contextual inference to “re-discover” missing metadata.

Each of these techniques is associated with significant information gain, relative to the previous state of scholarship on the corpus:

Geographic: The collection is often characterized as being about the Dust Bowl and rural poverty in the American South during the Great Depression. In fact, mapping the photographs and analyzing photograph density at the county level by year shows that this popular characterization of the FSA-OWI does not hold. Rather, the scope of sight was much broader, including a large focus on the United States Northeast and Midwest, as well as photographs taken beyond the continental United States, such as in the Virgin Islands and Europe.
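As a minimal sketch of the county-level density analysis described above, the following Python example counts photographs per county and year and computes the share of photographs taken in a chosen set of states. The records, the field layout (year, state, county) and the function names are illustrative assumptions, not data or code from the Photogrammar project.

```python
from collections import Counter

# Hypothetical records: (year, state, county) for each photograph.
# These values are invented for illustration and are not archive data.
photos = [
    (1936, "Oklahoma", "Cimarron"),
    (1936, "Oklahoma", "Cimarron"),
    (1940, "Illinois", "Cook"),
    (1941, "Illinois", "Cook"),
    (1941, "Illinois", "Cook"),
    (1942, "New York", "New York"),
]


def county_density_by_year(records):
    """Count photographs for each (year, state, county) combination."""
    return Counter(records)


def regional_share(records, region_states):
    """Return the fraction of photographs taken in the given states."""
    if not records:
        return 0.0
    in_region = sum(1 for _, state, _ in records if state in region_states)
    return in_region / len(records)


density = county_density_by_year(photos)
# Share of the sample taken in two (assumed) Northeast/Midwest states.
share = regional_share(photos, {"Illinois", "New York"})
```

Comparing such shares across regions and years is one simple way to test a claim about the geographic focus of a collection. A real analysis would need geocoded records and a fuller definition of each region.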

Text Mining: The 1940s ontology of the collection allowed a photograph to be classified in only one category. By looking at latent patterns in the free-form textual descriptions, we are able to surface photographs that participate in multiple, overlapping clusters. In this way, we can discover thematic similarity between the work of several different photographers, active at different times and in different places in the country (and around the world). This approach reflects a more general turn towards ‘latent’ patterns in unstructured data within the archive.
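The abstract does not say which vector-space model is used. As one possible illustration, the sketch below assumes TF-IDF weighting with cosine similarity, implemented with the Python standard library only. The captions are invented, and the function names are hypothetical.

```python
import math
from collections import Counter

# Invented captions standing in for free-form textual descriptions.
captions = {
    "a": "farm family on porch of house",
    "b": "family of migrant worker in front of house",
    "c": "steel mill workers leaving factory",
}


def tfidf_vectors(docs):
    """Map each document key to a sparse {term: tf-idf weight} dict."""
    tokens = {key: text.lower().split() for key, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for words in tokens.values() for t in set(words))
    vectors = {}
    for key, words in tokens.items():
        term_freq = Counter(words)
        vectors[key] = {
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(captions)
sim_ab = cosine(vectors["a"], vectors["b"])  # captions share several terms
sim_ac = cosine(vectors["a"], vectors["c"])  # captions share no terms
```

Because similarity is computed between every pair of captions, a photograph can sit close to several groups at once, which a single-category classification cannot express.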

Statistical Analysis: The relatively large scale of the collection (available as both digitized negatives and physical prints), together with organizational systems that changed constantly over the years, has unfortunately left a majority of the negatives with minimal documentation. Using latent metadata attached to the photographs, we are able to reassemble individual photographs into strips of four and five frames. In turn, this allows us to add new metadata to the photographs. For example, if a frame with an unknown photographer and location sits between photos taken by John Vachon in Chicago, then we can infer that the unknown frame was taken by the same photographer in the same location. We will discuss the statistical methods applied using R.
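The inference in the John Vachon example can be sketched as a simple deterministic rule. This is a simplification written in Python for illustration, not the statistical method the authors apply in R. The `Frame` fields, the sequential frame numbers and the `max_strip` limit of five frames are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    number: int                  # hypothetical sequential frame number
    photographer: Optional[str]  # None when undocumented
    location: Optional[str]      # None when undocumented


def _documented(frame: Frame) -> bool:
    return frame.photographer is not None and frame.location is not None


def infer_between(frames: List[Frame], max_strip: int = 5) -> List[Frame]:
    """Fill missing metadata for frames flanked by agreeing neighbours.

    A frame is filled only when the nearest documented frames on both
    sides agree with each other, lie within one assumed strip length,
    and do not contradict anything already recorded for the frame.
    """
    ordered = sorted(frames, key=lambda f: f.number)
    result = []
    for i, frame in enumerate(ordered):
        if _documented(frame):
            result.append(frame)
            continue
        before = next((f for f in reversed(ordered[:i]) if _documented(f)), None)
        after = next((f for f in ordered[i + 1:] if _documented(f)), None)
        agree = (
            before is not None
            and after is not None
            and after.number - before.number < max_strip
            and before.photographer == after.photographer
            and before.location == after.location
            and frame.photographer in (None, before.photographer)
            and frame.location in (None, before.location)
        )
        if agree:
            frame = replace(
                frame,
                photographer=before.photographer,
                location=before.location,
            )
        result.append(frame)
    return result


# Frame numbers are invented; the names follow the example in the text.
strip = [
    Frame(101, "John Vachon", "Chicago"),
    Frame(102, None, None),
    Frame(103, "John Vachon", "Chicago"),
    Frame(104, None, None),
]
filled = infer_between(strip)
```

In this sketch frame 102 inherits the photographer and location of its neighbours, while frame 104 stays undocumented because no documented frame follows it. Inferred values would ideally be stored with a flag marking them as inferred, so that they remain distinguishable from the original records.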

These three techniques open up new questions about this collection and historic period, and challenge previous scholarship. We believe the Photogrammar project can serve as one example of how to engage with large-scale digital archives of visual culture. This question is of particular importance for scholars who seek to bring Digital Humanities techniques to “Big Data” collections, whether those are curated by libraries, museums, or scholars themselves. We anticipate both similarities and important differences with European archives of the same period, including the UK Mass Observation Archive (1937-1960s), and forthcoming collections hosted by Europeana Online.

In addition, we will discuss how this project, sitting at the intersection of public and digital humanities, offers a new, user-friendly way to access a visual archive of this size. We will discuss the ways in which we intend to open up the collection to contributions from the public at large, with lessons learned from previous attempts to crowdsource metadata for this collection (Flickr Commons / New York Public Library 2008 [4], Flickr Commons / Library of Congress 2009 [5]). We will show a prototype of a publicly accessible Geographic Referencer that allows end users to locate photographs more accurately and appropriately on both current and historic maps. We will also discuss some of the challenges in incorporating crowdsourced metadata corrections into a historic archive while preserving the integrity and historic character of a large visual collection.

References
1. http://www.loc.gov/pictures/collection/fsa/

2. http://www.loc.gov/rr/print/res/071_fsab.html

3. http://americanstudies.yale.edu/faculty/laura-wexler

4. http://www.flickr.com/photos/nypl/sets/72157610969038056/

5. http://www.flickr.com/photos/library_of_congress/sets/72157618541455384/

Conference Info

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

XML available from https://github.com/elliewix/DHAnalysis (needs to replace plaintext)

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO