Extending the Utility of the HTRC Extracted Features Dataset Through Linked Data

poster / demo / art installation
  1. Jacob Jett

    University of Illinois, Urbana-Champaign

  2. Boris Capitanu

    University of Illinois, Urbana-Champaign

  3. Deren Kudeki

    University of Illinois, Urbana-Champaign

  4. Timothy W. Cole

    University of Illinois, Urbana-Champaign

  5. J. Stephen Downie

    University of Illinois, Urbana-Champaign

Work text
This poster describes the latest version of the Extracted Features (EF) dataset, derived from the HathiTrust Digital Library's corpus of more than 17 million volumes. This version employs Linked Data standards both to make the dataset more accessible and to incorporate richer metadata describing the volumes from which the data was derived. The dataset is arranged by volume: each volume's data (tokens, part-of-speech tags, language tags, line counts, etc.) is directly associated with the metadata describing that volume in the form of an individual JSON-LD document. The EF dataset provides a ready means of interacting with volumes whose intellectual content remains under copyright and allows a variety of analytics, such as visualizing word usage over time, to be carried out on data that would not otherwise be accessible.
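
As a rough illustration of the kind of analysis the abstract mentions, the sketch below counts word usage from one per-volume document and groups the counts by publication year. The document layout used here (the `metadata`, `features`, `pages`, and `tokenPosCount` keys, the `@context` mapping, and the sample values) is an assumption made for illustration, not the published EF schema, and the helper names `token_counts` and `usage_by_year` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical, heavily simplified stand-in for one volume's JSON-LD document.
# Key names and values are illustrative assumptions, not the real EF schema.
SAMPLE_VOLUME = json.loads("""
{
  "@context": {"title": "http://schema.org/name",
               "pubDate": "http://schema.org/datePublished"},
  "metadata": {"title": "An Example Volume", "pubDate": 1899},
  "features": {
    "pages": [
      {"seq": "00000001", "lineCount": 2,
       "tokenPosCount": {"whale": {"NN": 2}, "the": {"DT": 3}}},
      {"seq": "00000002", "lineCount": 1,
       "tokenPosCount": {"Whale": {"NNP": 1}, "sea": {"NN": 4}}}
    ]
  }
}
""")


def token_counts(volume):
    """Sum case-folded token counts over all pages and part-of-speech tags."""
    counts = Counter()
    for page in volume["features"]["pages"]:
        for token, pos_counts in page["tokenPosCount"].items():
            counts[token.lower()] += sum(pos_counts.values())
    return counts


def usage_by_year(volumes, word):
    """Map each publication year to the total count of `word` in that year."""
    by_year = Counter()
    for volume in volumes:
        year = volume["metadata"]["pubDate"]
        by_year[year] += token_counts(volume)[word.lower()]
    return dict(by_year)


counts = token_counts(SAMPLE_VOLUME)
print(counts["whale"])
print(usage_by_year([SAMPLE_VOLUME], "whale"))
```

Because the counts are derived features rather than full text, an aggregation of this kind never needs the page text itself, which is what makes it workable for volumes still under copyright.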

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO