Centering the Marginalized: Scholar-Curated Worksets from the HathiTrust Digital Library

poster / demo / art installation
  1. Isabella Magni

    HathiTrust Research Center, United States of America

  2. Glen C. Worthey

    HathiTrust Research Center, United States of America

  3. Maryemma Graham

    HathiTrust Research Center, United States of America

  4. John A. Walsh

    HathiTrust Research Center, United States of America

  5. J. Stephen Downie

    HathiTrust Research Center, United States of America

  6. Ryan C. Dubnicek

    HathiTrust Research Center, United States of America

Work text

The Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) project, generously supported by the Andrew W. Mellon Foundation, is producing a suite of scholar-curated, targeted worksets of materials from the HathiTrust Digital Library, facilitated by the HathiTrust Research Center (HTRC). HTRC worksets are user-created collections of HathiTrust volumes that can be treated as data and analyzed with a variety of tools. Because worksets can be shared and cited, they contribute to research reproducibility and durable scholarship. In addition to their value as focused digital collections, SCWAReD’s scholar-curated worksets also serve as illustrative, reusable research models. Each release includes not only the workset itself but also scholarly introductions, derived datasets and related documentation, and research reports. Together, these materials demonstrate the collaborative workset building, textual analysis, workflow development, and dataset creation activities typically carried out by HTRC.

The special mission of SCWAReD is to highlight and center the work of historically under-resourced and marginalized textual communities. To this end, a flagship project and four sub-projects were selected competitively. Each explores new methods for creating, analyzing, and reusing curated digital collections and the research data derived from them. The need to address inequities in both library collections and digital humanities research is already well documented (see, among others, Gallon, 2016; McPherson, 2012; Earhart, 2012). SCWAReD aims to help address these inequities in two ways. First, it identifies and remediates gaps within HathiTrust. Second, it uses computationally assisted methods to recover content that is already part of the HathiTrust Digital Library but may be difficult to discover through traditional metadata and catalog searches within such a massive digital collection.
SCWAReD’s flagship collaboration is with the Black Books Interactive Project, part of the longstanding History of Black Writing (HBW), founded in 1983 at the University of Mississippi by SCWAReD Co-PI Maryemma Graham and hosted since 1998 under her leadership at the University of Kansas. 
Four additional projects were selected to develop curated worksets concurrently:

“Mining the Native American Authored Works in HathiTrust for Insights,” in which directors Kun Lu, Raina Heaton, and Raymond Orr (University of Oklahoma) seek to develop a database of Native American authors and their bibliographic information, create a reusable workset of Native American authored works in HathiTrust, and provide insights into the characteristics of the community by text mining their works; 
“The Black Fantastic: Curated Vocabularies, Artifact Analysis and Identification,” in which directors Clarissa West-White (Bethune-Cookman University) and Seretha Williams (Augusta University) seek to demonstrate that characteristics of the Black Fantastic exist in historical and current cultural artifacts. The Black Fantastic is the cultural production of African Diasporic artists and creators who engage with the intersections of race and technology in their work. The artifacts under study include those created by and about future-forward personalities such as Dr. Mary McLeod Bethune;
“Creating Period-Specific Worksets for Latin American Fiction,” in which director José Eduardo González (University of Nebraska, Lincoln) seeks to create datasets for researching the history of Latin American fiction. The project also questions the traditional periodization of this literature by attempting to detect the boundaries between literary periods and between subgenres; and
“The National Negro Health Digital Project: Recovering and Restoring a Black Public Health Corpus,” in which director Kim Gallon (Purdue University) draws on public health documents concerning Black health in HathiTrust’s collection. The project explores how early-twentieth-century Black public health officials communicated about and addressed the health disparities that affected African American communities.

For each of these projects, we identify and attempt to fill collection gaps, meaning items documented by scholar-curators but missing from HathiTrust. We also create, collect, and document our research artifacts (the elements of a “reusable research model,” as described above) and include them with the curated workset. These include the following materials:

- the search algorithms devised to survey existing holdings;
- data derived from the workset objects;
- bibliographies and bibliographic essays;
- curatorial statements;
- any other apparatus and artifacts deemed significant for interpreting and analyzing the workset, or amenable to later reuse.

All of these materials will be released open access. In each partnership, project teams bring content and domain expertise, research questions, and curation experience. HTRC provides access to the HathiTrust collection, research tools and environments, and technical expertise in text and data mining. Over the course of each project, research questions suited to investigation within HathiTrust holdings have been developed, informed by the workset-building process, the available content, and the gaps identified.
Our poster will provide an overview of the SCWAReD project, our flagship collaboration with the Black Books Interactive Project, and our four collaborative projects. We will also provide preliminary results and report on gap-filling efforts.


Earhart, A. (2012). Can Information Be Unfettered? Race and the New Digital Humanities Canon. In Gold, M. (ed.), Debates in the Digital Humanities. Minneapolis: University of Minnesota Press, chapter 18 (accessed 27 April 2022).

Gallon, K. (2016). Making a Case for the Black Digital Humanities. In Gold, M. and Klein, L. F. (eds), Debates in the Digital Humanities 2016. Minneapolis: University of Minnesota Press, chapter 4 (accessed 27 April 2022).

McPherson, T. (2012). Why Are the Digital Humanities So White? In Gold, M. (ed.), Debates in the Digital Humanities. Minneapolis: University of Minnesota Press, chapter 9 (accessed 27 April 2022).


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022


Held in Tokyo and remote (hybrid) on account of COVID-19



Series: ADHO (16)

Organizers: ADHO