Crowdsourcing Training Data: Efficacy and Ethics

paper, specified "short paper"
  1. Alison Hedley

    McGill University

Work text

Paid crowdsourcing is a useful tool for building large-scale datasets of many kinds, including for humanities and cultural heritage work, but the use of crowd labour is not without logistical and ethical challenges. This paper summarizes how the Visibility of Knowledge (VOK) Project is using Mechanical Turk to develop training data and relates our best practices to broader challenges in the ethics and efficacy of paid crowdsourcing. The experience of the VOK team suggests that devising itinerant communication tactics is necessary for any digital research project that wishes to use paid crowdsourcing for training data in a manner that is both effective and ethical. Unfortunately, the nature of crowdsourcing work and the design of paid platforms are such that the ethics of using crowd-labeled training data will almost certainly remain fraught, even as the need for large training datasets increases in many knowledge fields.


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus; an online conference was held instead. Data for this conference were initially prepared and cleaned by May Ning.


Series: ADHO (15)

Organizers: ADHO