After deplatforming: digital methods for documenting Twitter and YouTube moderation

paper, specified "short paper"
  1. 1. Emillie Victoria de Keulenaar

    University of Groningen, OILab

  2. 2. Ivan Kisjes

    University of Amsterdam

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Following myriad controversies, including harassment campaigns and dissemination of conspiratorial narratives


et al.

, 2018; Jeong, 2019)

, genocides

(Mozur, 2018)

and political extremism

(Ganesh and Bright, 2020)

, social media platforms and cloud hosts have deployed additional measures to moderate user-generated contents. YouTube, Facebook and Twitter, in particular, have further developed content moderation techniques that prevent the circulation of “problematic information”

(Jack, 2017)

via deletion or “deplatforming”

(Rogers, 2020)

, automatically flagging content as “misleading”

(Gorwa, Binns and Katzenbach, 2020)

, and demoting or “shadow-banning”

(Myers West, 2018)

user posts across search and recommendation results

(Goldman, 2019)

. To date, millions of content associated with hate speech, COVID-19 conspiracy theories and incitement to violence have been wiped out from Twitter

(Al Jazeera, 2021)

, YouTube

(Keulenaar, Burton and Kisjes, 2021)

, Instagram

(Francis, 2021)

, and Facebook

(Lerman, 2020)


Though it is essential for regulating any kind of public sphere, the imperative to delete and otherwise obfuscate problematic content has brought new challenges to new media and digital humanities scholars in at least three ways. Most directly, it renders problematic content unarchivable by default, making almost impossible the study of already precariously archived Web and social media data

(Brügger and Schroeder, 2017)

. Second, moderated platforms provide little information on how they practice moderation. This prevents public efforts from scrutinizing the normative framework platforms adopt to determine what can and cannot be said. In turn, the obfuscation of moderated content and moderation practices further hampers studies on speech moderation and norms as a key historical and societal practices to any modern-day society. Speech norms join a long history of legal, social and political measures to prevent the normalization of problematic histories, and have been implemented in the form of laws, social conventions and civil right campaigns in a range of media types.

Thus far, scholars have relied on a handful of improvised methods for studying moderation and moderated contents. With the exception of Twitter’s Enterprise API

(Twitter, 2021)

, platforms rarely outsource information about what specific contents they have moderated and why. While most studies in platform governance focus on written documentation and leaked documents from ex-platform employees

(Wall Street Journal, 2021)

, digital methods research tends to rely on information provided by Application Program Interfaces (APIs). These researcher practices have yet to be systematically combined, and are still prey to changing APIs

(Perriam, Birkbak and Freeman, 2020)

and reprisals for illegal scraping of information that is otherwise not publicly provided

(Bond, 2021)


On this matter, this paper discusses a new “digital forensics”: a set of methods one can use to reconstruct platform and user traces. In other words, it proposes methods to reconstruct the scene after or on which platform or user data has disappeared in the context of one specific platform effect: content moderation. Drawing from two case studies on Twitter and YouTube’s moderation of COVID-19 misinformation between January and April of 2020

(de Keulenaar

et al.

, Forthcoming)

, and hate speech on YouTube between 2007 and 2018

(de Keulenaar

et al.

, 2021)

, it proposes a web historiography (Brügger, 2013) of content moderation policies and techniques with a combination of HTML scraping, retrieval of moderation metadata via APIs, and detection of content availability through dynamic archival of moderated contents.

It describes the potentials and shortcomings of four methods:

A contextualization of content moderation practices
. Using the Wayback Machine to trace changes in content moderation policies, this method consists in systematically annotating changes to (a) how the platform decides what is and is not problematic; and (b) observing and documenting the techniques the platform uses to moderate contents respectively.

Dynamic archiving of content susceptible to being moderated
. This consists in first developing a taxonomy of problematic speech based on what platform content moderation policies consider to be problematic, such as hate speech (examples being speech that targets gender, racial and other identities), misinformation (such as information that contradicts COVID medical authorities) and “borderline content” (information likely to infringe upon policies in the future, usually described as conspiratorial or fringe discourses). This allows us to design queries that reflect problematic speech, which we use to collect corresponding tweets, videos and comments daily, as well as its corresponding metadata and platform effects (search and recommendation rankings, etc.).

Reverse-engineering content moderation practices
. Using Twitter Academic and YouTube’s standard APIs to reverse-engineer moderation practices, collecting metadata for: (1) the availability of problematic contents per day; (2) by-products of algorithmic moderation, such as their ranking in search and recommendation results over time, and flags and prompts we scrape using Selenium; and (3) user engagement in moderated contents.

Tracing the effects of the disappearance of moderated data
in one platform by looking at its migration in other platforms. This implies looking at how users react to the practice of moderation (for example, what they say about “cancelling”, “deplatforming”, “deleting”, “shadowbanning” and other forms of platform interventions), as well as how they curtail these sanctions by access sanctioned content in alternative or “alt-tech” platforms like Telegram, Bitchute and Parler.

Though imperfect, we aim to demonstrate how this ensemble of methods allows one to contextualize moderation practices, such as deplatforming, algorithmic demotion and flagging, within content moderation policies around hate speech and misinformation. We argue that they allow researchers to surface volatile content moderation practices, as well as map the larger effects of deplatforming across the Web in the fragmentation of users’ information diets across a fringe-to-mainstream social media ecology. Most importantly, we propose this method as a way to systematically document online speech moderation practices and contribute to a history of speech norms across political contexts and media types.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO