Commercial crowdsourcing in digital humanities: prospects and ethical issues

paper, specified "short paper"
  1. Rosa Suviranta

    University of Helsinki

  2. Tuomo Hiippala

    University of Helsinki

Work text

This presentation discusses key issues in using commercial crowdsourcing in digital humanities. Traditionally, digital humanities have engaged volunteers for tasks like digitising and organising information (Dunn and Hedges, 2013; Carletti et al., 2013). However, not all fields in digital humanities can benefit from volunteer-based crowdsourcing. I argue that commercial crowdsourcing is a viable alternative for fields that cannot attract volunteers, provided that crowdsourcing is used in an ethically sustainable way. To do so, I propose solutions to a range of ethical issues related to fair pay and hidden labour on commercial crowdsourcing platforms. I also discuss linguistic and epistemic asymmetries between task requesters and the global crowdsourced workers and argue for the need to develop crowdsourcing methods that balance the needs of both ethics and data quality. To this end, I draw on examples from an ongoing project that uses crowdsourcing to create multimodal corpora and show how a combination of pedagogically motivated training, paid exams and multimodal instructions can mitigate these issues.
Crowdsourcing is a participatory method in which an individual, institution, organisation or company issues an open call asking an undefined group of workers, varying in size and knowledge, to perform a task (Carletti et al., 2013: 1–2). Crowdsourcing can take place on commercial and non-commercial platforms, and the tasks can range from data labelling to content creation (Dunn and Hedges, 2013). How crowdsourcing is understood depends on the field of research. Computer vision, for example, uses crowdsourcing to create training data for algorithms by decomposing complex tasks into piecemeal work and distributing this effort among paid non-expert workers on online platforms (Kovashka et al., 2016).
In digital humanities, crowdsourcing is often associated with the galleries, libraries, archives and museums (GLAM) domain. In this context, crowdsourcing is a way of engaging enthusiasts with intrinsic incentives to perform tasks for free and for the 'common good' (Daugavietis, 2021). Consequently, the ethos in digital humanities is to use crowdsourcing to engage volunteers to interact with, explore and contribute to the research at hand (Dunn and Hedges, 2013; Terras, 2015).
However, not all fields under the umbrella of digital humanities can benefit from volunteer-based crowdsourcing. One such example is the emerging discipline of multimodality research, which studies the way human communication relies on intentional combinations of expressive resources. The discipline is currently shifting in a more data-driven direction due to increased calls for validating theories of multimodality through empirical research (Pflaeging et al., 2021). This shift has also brought multimodality research into contact with digital humanities, especially within the paradigm of 'distant viewing' (Arnold and Tilton, 2019), which applies computational methods to large-scale analysis of visual materials (Hiippala and Bateman, 2021). Together with computational methods, commercial crowdsourcing has been identified as a potential way of increasing the size of corpora studied in multimodality research (Hiippala et al., 2019).
However, any use of commercial crowdsourcing must acknowledge the ethical issues and pitfalls related to crowdsourcing platforms. As a part of the novel platform economy, crowdsourcing lacks regulation, which enables exploitative practices (Schmidt, 2017). Labour rights are largely absent, the pay is usually far from a living wage, and there are no rules or stipulations for a minimum wage. Moreover, new workers often need to perform several months of non- or low-paid work in the form of qualification labour to access well-paid tasks (Kummerfeld, 2021). Qualification, or ranking, is a social reward technique that lets workers increase their standing and reliability in the crowdsourcing community by completing tasks successfully (Dunn and Hedges, 2013: 152–154), and requesters often use it as a quality control tool to filter out low-performing workers (Kummerfeld, 2021: 343). Many workers also speak English – the lingua franca of crowdsourcing platforms – as a foreign language, which can lead to misunderstandings, rejected work and payment refusal.
Although the platforms enable exploitative practices, requesters can influence the conditions for crowdsourced work. Firstly, the workers must be compensated appropriately and paid at least a minimum wage. Although ethically sustainable crowdsourcing is not cheap, it is considerably cheaper than using experts (Hiippala et al., 2021: 673). Secondly, requesters can recruit workers with fewer qualifications while maintaining quality by combining pedagogically motivated training, paid exams and multimodal instructions. Pedagogically motivated training allows the workers to learn the task through trial and error. Subsequently, paid exams select the workers who go on to perform the actual task. Pairing the training with a paid exam for selecting the workers ensures that even if a worker fails the exam, they are compensated for their effort. Finally, multimodal instructions, which combine text and illustrations, can support workers with limited language skills.
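The two requester-side measures described above, paying at least a minimum wage and compensating every exam taker regardless of the outcome, can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names, the `ExamResult` structure, the pass mark and all monetary values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_UP


def reward_per_task(hourly_wage: Decimal, seconds_per_task: int) -> Decimal:
    """Smallest per-task reward, rounded up to the cent, that meets the hourly wage.

    Assumes the requester has an estimate of how long one task takes.
    """
    raw = hourly_wage * Decimal(seconds_per_task) / Decimal(3600)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_UP)


@dataclass
class ExamResult:
    worker_id: str
    score: float  # fraction of exam items answered correctly (hypothetical measure)


def settle_exam(results, pass_mark, exam_reward):
    """Pay every exam taker; qualify only those at or above the pass mark."""
    payments = {r.worker_id: exam_reward for r in results}
    qualified = [r.worker_id for r in results if r.score >= pass_mark]
    return payments, qualified


if __name__ == "__main__":
    # Hypothetical figures: a 12.00 hourly target and tasks that take about 50 seconds.
    print(reward_per_task(Decimal("12.00"), 50))

    results = [ExamResult("w1", 0.9), ExamResult("w2", 0.5)]
    payments, qualified = settle_exam(results, pass_mark=0.8, exam_reward=Decimal("1.00"))
    print(payments, qualified)
```

Rounding the reward up rather than to the nearest cent keeps the effective hourly rate from falling below the target, and separating payment from qualification in `settle_exam` mirrors the point that a failed exam is still compensated.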


Arnold, T. and Tilton, L. (2019). Distant viewing: analyzing large visual corpora. Digital Scholarship in the Humanities, 34(supplement 1): i3–i16.

Carletti, L., Giannachi, G., Price, D., McAuley, D. and Benford, S. (2013). Digital humanities and crowdsourcing: An exploration. Museums and the Web. Available at: (Accessed: 21.4.2022).

Daugavietis, J. (2021). Motivation to engage in crowdsourcing: Towards the synthetic psychological–sociological model. Digital Scholarship in the Humanities, (4): 858–870.

Dunn, S. and Hedges, M. (2013). Crowd-sourcing as a component of humanities research infrastructures. International Journal of Humanities and Arts Computing, 7(1–2): 147–169.

Hiippala, T., Alikhani, M., Haverinen, J., Kalliokoski, T., Logacheva, E., Orekhova, S., Tuomainen, A., Stone, M. and Bateman, J. A. (2021). AI2D-RST: A multimodal corpus of 1000 primary school science diagrams. Language Resources and Evaluation, 55(3): 661–688.

Hiippala, T. and Bateman, J. A. (2021). Semiotically-grounded distant view of diagrams: insights from two multimodal corpora. Digital Scholarship in the Humanities. Available at: 10.1093/llc/fqab063 (Accessed: 21.4.2022).

Kovashka, A., Russakovsky, O., Fei-Fei, L. and Grauman, K. (2016). Crowdsourcing in computer vision. Foundations and Trends in Computer Graphics and Vision, 10(3): 177–243.

Kummerfeld, J. K. (2021). Quantifying and avoiding unfair qualification labour in
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Online, pp. 343–349. Available at: (Accessed: 21.4.2022).

Pflaeging, J., Wildfeuer, J. and Bateman, J. A. (eds.) (2021). Empirical Multimodality Research: Methods, Applications, Implications. Berlin and Boston: De Gruyter.

Schmidt, F. A. (2017). Digital labour markets in the platform economy. Mapping the Political Challenges of Crowd Work and Gig Work, 7,

Terras, M. (2015). Crowdsourcing in the digital humanities. In S. Schreibman, R. Siemens and J. Unsworth (eds), A New Companion to Digital Humanities. Wiley, pp. 420–438.


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO