Paid crowdsourcing is a useful tool for building large-scale datasets of many kinds, including for humanities and cultural heritage work, but the use of crowd labour is not without logistical and ethical challenges. This paper summarizes how the Visibility of Knowledge (VOK) Project is using Amazon Mechanical Turk to develop training data, and it relates our best practices to broader challenges in the ethics and efficacy of paid crowdsourcing. The experience of the VOK team suggests that devising itinerant communication tactics is necessary for any digital research project that wishes to use paid crowdsourcing for training data in a manner that is both effective and ethical. Unfortunately, the nature of crowdsourcing work and the design of paid platforms are such that the ethics of using crowd-labeled training data will almost certainly remain fraught, even as the need for large training datasets grows in many knowledge fields.
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
475 works by 1078 authors indexed
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
Series: ADHO (15)