Inferring book relationships at the trillion-word scale

paper, specified "short paper"
  1. Peter Organisciak

    University of Denver

  2. Benjamin M. Schmidt

    New York University

Work text

Large digital libraries like the HathiTrust Digital Library (HTDL) provide texts of historical, cultural, or literary significance at unprecedented scales. However, their size and the consortial approach to building them can confound computational attempts to model these collections, owing to issues such as uneven duplication and incomplete metadata. This paper presents the technical workflow of a project seeking to address those challenges.


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held.

Data for this conference were initially prepared and cleaned by May Ning.


Series: ADHO (15)

Organizers: ADHO