Using Computer Vision to Improve Image Metadata

paper, specified "long paper"
Authorship
  1. Doug Reside

    New York Public Library

Work text

Introduction

We are approaching a golden age in the study of visual art and photographs. Many museums, libraries, and universities have digitized large portions of their collections and have made the images and associated metadata available for study. This process has been a major boon to art historians, collectors, and other researchers. Instead of calling up individual items one at a time in library reading rooms or digging through old, expensive, incomplete, and often out-of-print art catalogs, a researcher can simply type a query into a library or museum database and retrieve large sets of images. While the rate of digitization has lately increased, however, this has at times come at the expense of detailed cataloging. Even before the era of mass digitization, catalogers struggled to identify certain works of art (sometimes due to a lack of collaboration between institutions). Today, simple digitization is often faster and cheaper than expert cataloging, and so many works of art and photographs appear in repositories with limited metadata that uses inconsistent schemas or vocabularies [1]. The result is that while there are more works of art online than ever, it is still difficult for researchers to find the images they seek. Advances in automated fuzzy image recognition, however, may allow researchers to discover relevant content even when available metadata is limited. In this paper, we examine two test cases, photographs of theatrical performances with unidentified actors and misattributed Japanese woodblock prints, to demonstrate how image search algorithms can be used both to locate content and to help researchers understand it better.
Overview of the test cases

Woodblock prints

In December of 2012, John Resig (Visiting Researcher at Ritsumeikan University and creator of the jQuery JavaScript library) released Ukiyo-e.org: a database of Japanese woodblock print images with metadata harvested by traversing the publicly accessible digitized collections of prints at the targeted institutions. The images were copied and saved to a separate server for faster access (which also avoids overburdening the institutions' websites with direct image requests). The information on the website is organized broadly by artist and time period on the homepage, but the site is primarily designed to be used as a search engine that allows users to search both by text and by image. The database currently contains over 213,000 prints from 24 institutions, collected from late 2011 to late 2012.
One of the most important features of the Ukiyo-e.org [2] website is its ability to analyze the images it holds in real time for comparison and searching. Major institutions frequently disagree about the attribution, dating, titles, and other information associated with a print. Because of this inconsistent metadata, it becomes virtually impossible to find similar prints across multiple institutions. The one piece of information that is never in contention, however, is the image of the print itself. The image presented by most institutions usually includes a full, straight-on photograph of the print (or prints, if it’s a diptych, triptych, or similar). By ignoring the metadata provided by the institutions and comparing only the actual contents of the images, it is possible to find similar-looking prints at different institutions.
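As a rough illustration of comparing image contents rather than metadata, the sketch below uses an average hash, a simple and widely known perceptual hashing technique. It is not the method used by Ukiyo-e.org or MatchEngine, whose internals the paper does not describe. The tiny 4x4 grayscale grids, the identifiers such as `institution_a/0001`, and the `max_distance` threshold are all invented for the example.

```python
from typing import Dict, List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two hashes differ."""
    return bin(a ^ b).count("1")


def find_similar(query: Grid, corpus: Dict[str, Grid], max_distance: int = 2) -> List[str]:
    """Return corpus keys whose hash is within max_distance of the query, nearest first."""
    q = average_hash(query)
    scored = [(hamming_distance(q, average_hash(g)), key) for key, g in corpus.items()]
    return [key for dist, key in sorted(scored) if dist <= max_distance]


# Hypothetical thumbnails: a print, a faded impression of it, and an unrelated image.
print_a = [[200, 200, 10, 10], [200, 200, 10, 10], [10, 10, 200, 200], [10, 10, 200, 200]]
print_a_faded = [[180, 190, 40, 30], [185, 180, 35, 40], [30, 40, 170, 180], [40, 35, 175, 185]]
checkerboard = [[10, 200, 10, 200], [200, 10, 200, 10], [10, 200, 10, 200], [200, 10, 200, 10]]

corpus = {"institution_a/0001": print_a_faded, "institution_b/0002": checkerboard}
matches = find_similar(print_a, corpus)
```

Here `matches` contains only the faded impression, because its hash is identical to the query's even though every pixel value differs. A production system would need far more robust features to handle cropping, rotation, and prints embedded inside larger images.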
Theater Photographs

Along with works of art, many libraries and archives have recently begun to publish large sets of photographs, often depicting unidentified people. In some cases, the lack of any additional information makes identification of the faces in the photographs all but impossible. However, images from performing arts collections often feature well-documented events with widely recognizable people appearing alongside lesser known figures. A rehearsal shot of a musical comedy, for instance, might feature a star in front of a chorus of anonymous extras. Metadata for such photographs may identify the star, and the title of the piece in which he or she is performing if it is known, but the supporting cast is generally left unidentified.
In early 2012, Doug Reside, Digital Curator for the Performing Arts at New York Public Library, began a series of experiments to attempt to identify these performers. Over 90,000 theater photographs are now available on the Library’s website with varying degrees of metadata [3]. In most cases, the work being presented is identified. Information about the cast and crew of these productions can often be harvested from other online databases such as Playbill Vault [4], the Internet Broadway Database [5], and DBPedia [6]. Given infinite time, a human investigator could theoretically identify many of the anonymous people in the Library’s photographs by finding all instances of the face in any online photograph, and then using additional datasets to determine the most likely name associated with it. An otherwise anonymous person might, for instance, be identified in a newspaper photograph or in a headshot in a theater program. Similarly, it might be possible to identify an otherwise anonymous actress if the shows in which her face appears uniquely match her resume as constructed from published cast lists.
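The resume-matching idea in the last sentence can be sketched as a set intersection: if the same face has been found in photographs of several shows, the candidates are the performers credited in every one of them. This is only a minimal sketch. The function name, the show titles, and the performer names are placeholders, and it assumes that face matching and cast-list harvesting have already been done reliably.

```python
from typing import Dict, List, Set


def candidates_for_face(shows_with_face: Set[str], cast_lists: Dict[str, Set[str]]) -> List[str]:
    """Return the performers credited in every show in which the face appears."""
    if not shows_with_face:
        return []
    # A show missing from cast_lists contributes an empty set, so no candidate survives.
    credited = [cast_lists.get(show, set()) for show in shows_with_face]
    return sorted(set.intersection(*credited))


# Placeholder cast lists, standing in for data harvested from published sources.
cast_lists = {
    "Show A": {"Performer 1", "Performer 2", "Performer 3"},
    "Show B": {"Performer 2", "Performer 3"},
    "Show C": {"Performer 2", "Performer 4"},
}

unique = candidates_for_face({"Show A", "Show B", "Show C"}, cast_lists)
ambiguous = candidates_for_face({"Show A", "Show B"}, cast_lists)
```

With these placeholder lists, `unique` holds a single name because only one performer is credited in all three shows, while `ambiguous` still holds two names and would need further evidence, such as a headshot in a program.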
Methodology

Both test cases would benefit from computer vision algorithms capable of searching a corpus of images for a set of very similar (but not necessarily identical) images. Although research in this area began decades ago, implementations capable of comparing thousands to millions of images from various sources simultaneously have only emerged very recently [7]. The general availability of this technology, however, has been mixed. Tools such as imgSeek [8] have made rudimentary image comparison technologies available for use in open source projects, but at present commercially available tools with public APIs (such as TinEye’s MatchEngine [9]) provide faster image analysis with a greater level of clarity. Neither tool, however, is exactly suited for facial recognition, which requires the ability to identify a face pictured at different angles, under different lighting, in front of varying backgrounds, and at varying sizes (depending on the distance of the subjects from the camera).
The MatchEngine tool, while a commercial service, is well suited for finding images that are close matches of one another, or even partial matches embedded inside a larger image (as in the case of triptychs). Like imgSeek, MatchEngine was able to search by uploaded image and to process newly added images quickly. Resig tested both MatchEngine and imgSeek during the development of Ukiyo-e.org and found that MatchEngine was much better at finding exact matches, ignoring differences in color, and finding prints (or portions of prints) inside other print images.
With an effective image similarity engine, it became possible to develop many new tools to aid woodblock print researchers. Using the tools available on Ukiyo-e.org, researchers can now look for a print not just by title, description, or artist name (on which there is generally little agreement between institutions) but also by providing just a photo. Additionally, scholars who are researching the manipulation and reuse of the physical woodblocks over time can now more easily locate prints that are derived from the same block but have different imagery. Finally, a tool has been constructed to automatically provide institutions with corrections for their metadata, made possible by finding similar prints and then automatically comparing their associated metadata, looking for differences. Together, these tools offer substantial improvements for researchers, scholars, and institutions.
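The metadata-correction step can be pictured as follows: once image similarity has grouped records that depict the same print, each field is compared across the group and disagreements are flagged. The sketch below assumes a simple majority-vote rule for choosing the suggested value. That rule, the record layout, the function name, and the institution labels are assumptions made for illustration, since the paper does not describe how the actual tool resolves differences.

```python
from collections import Counter
from typing import Dict, List


def suggest_corrections(records: List[Dict[str, str]], fields: List[str]) -> List[Dict[str, str]]:
    """For records already matched by image similarity, flag fields on which they disagree."""
    suggestions = []
    for field in fields:
        values = [r[field] for r in records if r.get(field)]
        counts = Counter(values)
        if len(counts) < 2:
            continue  # every institution that supplied a value agrees
        majority, votes = counts.most_common(1)[0]
        if votes * 2 <= len(values):
            continue  # no strict majority, so leave the question to a human cataloger
        for r in records:
            if r.get(field) and r[field] != majority:
                suggestions.append({
                    "source": r["source"],
                    "field": field,
                    "current": r[field],
                    "suggested": majority,
                })
    return suggestions


# Hypothetical records for one print held by three institutions.
matched = [
    {"source": "Institution A", "artist": "Utagawa Hiroshige", "date": "1857"},
    {"source": "Institution B", "artist": "Utagawa Hiroshige", "date": "1857"},
    {"source": "Institution C", "artist": "Utagawa Kunisada", "date": "1857"},
]

report = suggest_corrections(matched, ["artist", "date"])
```

In this invented example, `report` contains one suggestion, for Institution C's artist field. A majority can of course be wrong, so output like this is better treated as a prompt for expert review than as an automatic fix.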
The theater photographs work is at a somewhat earlier stage in its development. The project’s early experiments with face recognition in the computer vision library OpenCV [10] identified the location of faces within a photograph reliably, but could not be used to suggest whether a face in one photograph belonged to the same person as a face in another. More promising has been the OpenBR [11] library from MITRE Corporation which, after “registering” a library of photographs, can quickly return a set of photographs from the library containing faces that most closely match one depicted in a new photograph. For faces displayed at similar angles and under similar lighting, it performs relatively well, but when the angle changes and additional faces appear in the picture, mistaken identification is more common than success.
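The register-then-query workflow described above can be sketched generically as a nearest-neighbor search over face feature vectors. The code below does not use OpenBR and does not reproduce its API. The `Gallery` class, the three-number feature vectors, and the photo identifiers are hypothetical, and it assumes some separate step has already turned each detected face into a numeric vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two feature vectors; 1.0 means they point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Gallery:
    """Holds registered face vectors and ranks them against a probe face."""

    def __init__(self) -> None:
        self._faces: Dict[str, List[float]] = {}

    def register(self, photo_id: str, features: Sequence[float]) -> None:
        self._faces[photo_id] = list(features)

    def closest(self, probe: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to top_k (photo_id, similarity) pairs, most similar first."""
        scored = [(cosine_similarity(probe, f), pid) for pid, f in self._faces.items()]
        scored.sort(reverse=True)
        return [(pid, score) for score, pid in scored[:top_k]]


# Hypothetical feature vectors standing in for the output of a face encoder.
gallery = Gallery()
gallery.register("photo_001", [0.9, 0.1, 0.3])
gallery.register("photo_002", [0.1, 0.8, 0.5])
gallery.register("photo_003", [0.7, 0.3, 0.4])

ranked = gallery.closest([0.88, 0.12, 0.31], top_k=2)
```

A ranking like this always returns its nearest candidates, whether or not the probe's subject is in the gallery at all, which is one reason confident-looking mismatches can occur when angle or lighting shifts the feature vector.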
Nonetheless, the mistaken identifications are sometimes usefully provocative. False positives reveal similarities among physical characteristics, costumes, and makeup that may not be obvious. For instance, a search using a photograph of a young Roddy McDowall as Mordred in the original 1960 Broadway production of Camelot returned (with 100% certainty) Steve Lawrence in a 1967 production of Golden Rainbow, a headshot of actress Sybil White, and a photograph of George C. Scott in Plaza Suite. To most observers, these faces bear relatively little resemblance to one another; to the face recognition algorithm, however, which looks mostly at the shape and position of the eyes [12], the faces appeared identical. As earlier investigations by Jerome McGann have revealed [13], this “deformance” of the image by the algorithm may reveal new ways of interpreting the objects. Are there other similarities (not just visual) among the performers, or the characters they are portraying, that might not have been noticed without the provocation of the algorithm? What do these “fail cases” suggest about casting practices or makeup design on the mid-20th-century Broadway stage?
This paper explores both the promising successes and provocative failures of image analysis tools for humanities research, and suggests future avenues of research the technology makes available to scholars.
References

1. Park, Jung-Ran. Metadata Quality in Digital Repositories: A Survey of the Current State of the Art. Cataloging & Classification Quarterly 47, no. 3–4 (2009): 213–228. doi:10.1080/01639370902737240.
2. ukiyo-e.org
3. digitalcollections.nypl.org
4. www.playbillvault.com
5. www.ibdb.com
6. dbpedia.org/About
7. Google, Yahoo, and Microsoft all provide image search engines that are capable of searching millions of images.
8. www.imgseek.net
9. services.tineye.com/MatchEngine
10. opencv.org
11. openbiometrics.org
12. J. Klontz, B. Klare, S. Klum, A. Jain, M. Burge. Open Source Biometric Recognition. Proceedings of the IEEE Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013.
13. Jerome McGann and Lisa Samuels. Deformance and Interpretation. www2.iath.virginia.edu/jjm2f/old/deform.html (augmented version also available in New Literary History 30 (Winter 1999): 25-56).

Conference Info

Complete

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO