University of Illinois, Urbana-Champaign
When to Ask for Help: Evaluating Projects for Crowdsourcing
Organisciak, Peter, University of Illinois, United States of America, email@example.com
A growing online phenomenon is that of crowdsourcing, where groups of disparate people, connected through technology, contribute to a common product. The term captures the collaborative possibilities of a communications medium as flexible and as populated as the Internet. If many hands make light work, crowdsourcing websites show just how light the work can be, breaking tasks into hundreds of pieces for hundreds of hands. Building on the growing body of research in the area, including the author’s work on crowd motivations, this paper outlines the necessary steps and considerations in enriching projects through crowdsourcing.
Though not new, crowdsourcing as it exists online has been enabled by emerging technologies. It has grown out of increasingly efficient – and affordable – forms of communication. Because such collaboration has expanded so quickly, there have been few investigations into the design of crowdsourcing. At the same time, the most successful projects have emerged organically, in ways that many deliberate attempts have failed to replicate, suggesting the need for more investigation in the area. Jeff Howe, who first defined the term and popularized the trend, has explained that “we know crowdsourcing exists because we've observed it in the wild. However, it's proven difficult to breed in captivity” (2008).
The gaps in knowledge of online crowds are quickly being filled, however, allowing projects to move away from reliance on serendipity. This presentation derives from recently completed thesis work on the motivations of crowds within crowdsourcing (Organisciak 2010). While it will reflect that study’s findings on how, its primary focus is on the equally important questions of why and when in light of those findings. For which tasks is crowdsourcing an appealing option, and what resources should be present for a project to adequately motivate its users? A bottom-up classification of crowdsourcing categories is proposed, followed by a checklist of needs that an institution must consider before attempting a crowdsourcing project of its own.
In this study, a sample of 300 crowdsourcing sites was examined and classified. Synthesizing these classifications resulted in a proposed list of eleven non-exclusive categories for crowdsourcing, six describing method and five describing structure. Methods include encoding, creation, idea exchange, skills aggregation, knowledge aggregation, and opinion aggregation. Additionally, there are financial, platform, gaming, group empowerment, and ludic structures observed within these systems. Derived from existing systems, these categories and their variants offer unique design patterns and best practice cases that can assist in assessing the types of tasks at which they excel.
Appropriateness of the task is just one facet of running a crowdsourcing project. The other consideration is whether a project offers a return that potential participants would find rewarding. In addressing this, a content analysis was used to identify site design mechanics related to user experience in thirteen cases spanning the breadth of the identified categories. These mechanics were then discussed in a series of user interviews to determine what users truly care about. In this study, a collection of primary and secondary motivators are proposed as foundational considerations in running a project. The primary motivators seen in the user interviews were interest in the topic, ease of entry and of participation, altruism and meaningful contribution, sincerity, and appeal to knowledge. A final one, financial incentive, is perhaps the most blunt. Secondary motivators include indicators of progress and reputation (i.e. “cred”), utility, fun, system feedback, social networking, and fixed windows (i.e. well-groomed quality).
An understanding of the nature of crowdsourcing holds notable benefits for scholarship in the humanities and social sciences. Most significantly, this is because it allows large-scale insights into the qualitative and the abstract, those areas inextricably tied to the limits of human labour and unable to be delegated to computing power. “What is the sentiment of this sentence?” is the type of question a crowdsourcing site may ask (Mechanical Turk, May 2nd 2010), if not always as directly. Since much work in the arts cannot easily be quantified, logistics and resources often limit humanities research to a trade-off between breadth and depth; crowdsourcing offers an escape from this issue.
Consider one task that is often seen in existing crowdsourcing sites: crowd-encoded classification. Classification tasks are constrained by the person-hours available, because human judgment remains the only dependable way to approach them. Whether directly or incidentally, online crowds can effectively encode or classify content. Though the reliability of any single contribution is often far below that of a professional encoder, large-scale crowd projects can often account for this through multiple independent classifications, measuring consistency and reliability through agreement. Galaxy Zoo, an effort from Oxford to classify galaxies, found crowdsourced data to be within 10% agreement with the same data classified professionally (Lintott et al. 2009). The high quality of work is especially notable because the experiment and its follow-ups received their 60 millionth classification in April 2010.
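The redundancy-based quality control described above can be sketched in a few lines of code. The following is a minimal illustration, not Galaxy Zoo's actual pipeline: each item receives several independent crowd labels, and a majority vote yields both a consensus label and an agreement rate that flags unreliable items.

```python
from collections import Counter

def aggregate_labels(classifications):
    """Return the majority-vote label and agreement rate for one item.

    `classifications` is a list of independent crowd labels.
    The example data below is hypothetical.
    """
    counts = Counter(classifications)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(classifications)
    return label, agreement

# Five volunteers independently classify the same galaxy image.
votes = ["spiral", "spiral", "elliptical", "spiral", "spiral"]
label, agreement = aggregate_labels(votes)
# label is "spiral" with 0.8 agreement; items with low agreement
# can be routed to further review rather than accepted outright.
```

In practice a project would set an agreement threshold below which items are re-queued for more classifications or expert review, which is how redundancy converts many unreliable judgments into a dependable aggregate.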
Flickr Commons, an initiative to put photo archives on a photo-sharing community, is a similar project that – by way of community-based research, information, and tagging – has enriched the metadata of hundreds of Library of Congress photographs in the United States of America (Springer et al. 2008). Another pilot project involving public tagging, by the National Library of Australia, concluded that “tagging is a good thing, users want it, and it adds more information to data. It costs little to nothing and is relatively easy to implement; therefore, more libraries and archives should just implement it across their entire collections” (Holley 2010). The National Library of Australia followed through on this recommendation.
Such projects are often greeted with suspicion in professional or scholarly communities. The National Library of Australia report notes that "institutions who have not implemented user tagging generally perceive many potential problems that institutions who have implemented user tagging do not report" (Clayton et al. 2008 qtd. in Holley 2010). The Library of Congress report similarly notes many concerns that critics provided, such as: “Would fan mail, false memories, fake facts, and uncivil discourse obscure knowledge? … Would the Library lose control of its collections? Would library catalogs and catalogers become obsolete?...Would history be dumbed-down? Would photographs be disrespected or exploited?” (Springer et al. 2008). In both cases, the reports state that the concerns, within the respective project’s experiences, have not manifested.
Encoding is a notable use of crowdsourcing in academia, but not the only one. Some projects, such as the Suda On Line, benefit from the collected contributions of expertise and knowledge. Suda On Line is a project to translate a Byzantine encyclopedia, the Suda, into English for the first time. It has been steadily progressing since 1998, producing a comprehensive resource while staying at a manageable participation scale (Mahoney 2009). In other cases, crowdsourcing allows public and volunteer projects to compete with the scale and quality of commercial projects, as has been seen in OpenStreetMap, Project Gutenberg, and many open source projects.
As crowdsourcing continues to be tested – and if it continues to be successful – in public institutions, understanding how to undertake such projects will become more important. The benefits are becoming clear, and the scale and openness on which public institutions operate make them natural beneficiaries of crowdsourcing activities. Users appear especially altruistic toward public projects, emphasizing in this study their preference for meaningful engagement with institutional workings over symbolic outreach.
The study informing this work is large, and my hope is to provide a digestible account of its results. The reason for this goal is straightforward: there is still much work to be done in understanding the mechanics of crowdsourcing, but the potential is great. I hope that the sharing of this foundational work will encourage others to explore further.
This study owes a great debt to Lisa M. Given, my thesis advisor, as well as additional committee members Geoffrey Rockwell and Stan Ruecker.
Holley, Rose. “Tagging Full Text Searchable Articles: An Overview of Social Tagging Activity in Historic Australian Newspapers August 2008 – August 2009.” D-Lib Magazine 16(1/2), 2010. (link) Accessed January 31, 2010.
Howe, Jeff. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. 2008.
Lintott, Chris, et al. “Galaxy Zoo: Morphologies Derived from Visual Inspection of Galaxies from the Sloan Digital Sky Survey.” arXiv:0804.4483, 2010. (link)
Mahoney, Anne. “Tachypaedia Byzantina: The Suda On Line as Collaborative Encyclopedia.” Digital Humanities Quarterly 3(1), 2009.
Organisciak, Peter. Why Bother? Examining the Motivations of Users in Large-Scale Crowd-Powered Online Initiatives. MA thesis, 2010. (link)
Springer, Michelle, et al. For the Common Good: The Library of Congress Flickr Pilot Project. 2008.
Hosted at Stanford University
Stanford, California, United States
June 19, 2011 - June 22, 2011
151 works by 361 authors indexed
XML available from https://github.com/elliewix/DHAnalysis
Conference website: https://dh2011.stanford.edu/
Series: ADHO (6)