Project Twitter Literature: Scraping, Analyzing, and Archiving Twitter Data in Literary Research

Christian Howard-Sukhil

Authorship

1. Christian Howard-Sukhil

Bucknell University

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Project Twitter Literature (TwitLit), seeks to address a growing gap in the literary-historical record by establishing a consistent, rigorous, and ethical method for scraping and cleaning up Twitter data for the use of humanities scholars. In particular, my project explores the growing community of amateur writers who are using Twitter as a means of publication and dissemination for their literary output. There are three parts to my project: the research findings related to the global literary community on Twitter, the tools and resources developed as part of the project and made openly available to other scholars, and partnership with a university library to ensure the long-term preservation of the collected data.The data that I have collected shows that social media is altering literary practices by providing a space for amateur writers to publish, disseminate, and receive feedback from a global community of writers. Preliminary figures put the number of active Anglophone writers using Twitter as a publication platform for their literary output at over 1 million users per year since 2015, and writers working in non-English languages on Twitter raise these numbers even higher. This practice is changing how literature is produced, published, and shared. Readerships too are changing, for rather than being tied to print subscriptions or access to physical books, audiences of social media literature are based on online communities and tied to the costs of physical devices and internet access.My presentation will showcase these research findings in order to highlight the importance and necessity of social media archival work. In so doing, I will discuss how I collected the data using a Python script (co-developed by myself and several other scholars), challenges of cleaning up and visualizing this data (using ArcGIS and tools developed by Documenting the Now), and ethical best-practices for using social media data in research. Information relating to this process – including detailed instructions, the Python scripts used to collect Twitter data, and a list of resources – are free and openly accessible on the project’s website (www.twit-lit.com) and GitHub repository (https://github.com/TwitLit/TwitLitSource). Other scholars are invited to use these scripts and other resources to collect their own social media data.Additionally, my project has attempted to plan for the long-term preservation of the over eight million tweets that I have collected. This preservation has been made difficult by Twitter’s strict Developer Policy and Agreement, which prevents individuals from keeping or disseminating large data sets for more than 30 days. The only exception to this policy is made on behalf of academic institutions, which may store Twitter data for unlimited amounts of time on behalf of academic research. The Project TwitLit project thus presents best-practices for establishing a working relationship with university libraries for storing and disseminating Twitter data in a way that is both in accord with Twitter’s legal restrictions and responsive to the needs of scholars. In short, Project TwitLit provides a case-study of a growing community on Twitter while simultaneously developing a set of tools and guidelines for other scholars seeking to engage in similar work. In December 2017, the Library of Congress, which began archiving Twitter in 2010, announced that it would no longer collect all Tweets; instead, Tweets produced after December 2017 would only be collected on a selective basis. There are no other ongoing, systematic efforts to collect and preserve this digital material. See Library of Congress, “Update on the Twitter Archive at the Library of Congress” (December 2017). For more information related to the challenges of collecting and storing Twitter data, please see Christian Howard, “Studying and Preserving the Global Networks of Twitter Literature,” in Post-45.

Full text license: This text is republished here with permission from the original rights holder.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2020

"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO

Project Twitter Literature: Scraping, Analyzing, and Archiving Twitter Data in Literary Research

1. Christian Howard-Sukhil

ADHO - 2020

"carrefours / intersections"