RDF-star-based Digital Edition of Travel Journals

paper, specified "short paper"
Authorship
  1. Sepideh Alassi

    DHLab, University of Basel, Switzerland

  2. Lukas Rosenthaler

    DHLab, University of Basel, Switzerland

Work text

Our project aims to develop tools and infrastructure for creating interactive web-based digital editions of metadata-oriented documents, such as travel journals, based on RDF-star (https://www.ontotext.com/knowledgehub/fundamentals/what-is-rdf-star/) and SPARQL-star.

Successful digital editions have been created as RDF-based knowledge graphs, enabling users to study the editions as a network of interconnected resources. Standard RDF principles are used to define ontologies that model the metadata and textual information of these editions as RDF triples, and the platforms presenting these editions use standard SPARQL for data analysis and querying. Standard RDF, however, is not an optimal choice for digital editions of metadata-oriented documents such as travel journals, because most of the information in such documents is accompanied by metadata describing it, e.g. “person A was at location B” for a certain period of time.

Creating statements about statements in standard RDF is cumbersome. The first RDF specification (RDF 1.0) provided a mechanism called reification for this purpose. Reification, however, introduces processing overhead because of the additional statements needed to identify the referenced triple, and it is verbose when expressed in RDF and SPARQL (Kasenchak et al. 2021). RDF-star and SPARQL-star overcome this deficit by extending the RDF standard, and they reduce query time. RDF-star allows a triple that represents metadata about another triple to use that other triple directly as its subject or object (Hartig 2017). Using RDF-star, we can easily attach metadata to the edges of the knowledge graph that represents the metadata-oriented document. Our infrastructure will provide tools based on SPARQL-star to query the data efficiently.
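As a rough illustration of the difference described above, the following Python sketch builds both representations of “person A was at location B” with a time period attached and counts the statements each one needs. The ex: prefix, the property names ex:wasAt, ex:from and ex:until, the blank node _:s1, and the dates are invented placeholders; they are not taken from the project's ontology or from the journal.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """An RDF triple; in RDF-star, subject or object may itself be a triple."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, "Triple"]


def reify(statement: Triple, node: str, meta: dict) -> list:
    """Standard RDF reification: four triples identify the statement,
    then one more triple per piece of metadata hangs off the helper node."""
    triples = [
        Triple(node, "rdf:type", "rdf:Statement"),
        Triple(node, "rdf:subject", statement.subject),
        Triple(node, "rdf:predicate", statement.predicate),
        Triple(node, "rdf:object", statement.obj),
    ]
    triples += [Triple(node, p, o) for p, o in meta.items()]
    return triples


def annotate_star(statement: Triple, meta: dict) -> list:
    """RDF-star: the statement itself is the subject of each metadata triple."""
    return [Triple(statement, p, o) for p, o in meta.items()]


def term_to_turtle(term) -> str:
    """Serialize one term; nested triples use the << ... >> quoting syntax."""
    if isinstance(term, Triple):
        inner = f"{term_to_turtle(term.subject)} {term.predicate} {term_to_turtle(term.obj)}"
        return f"<< {inner} >>"
    return term


def triple_to_turtle(triple: Triple) -> str:
    """Serialize a top-level triple as one Turtle-star line."""
    return f"{term_to_turtle(triple.subject)} {triple.predicate} {term_to_turtle(triple.obj)} ."


# Hypothetical example data (all names and dates are placeholders).
STATEMENT = Triple("ex:personA", "ex:wasAt", "ex:locationB")
META = {
    "ex:from": '"1680-01-01"^^xsd:date',
    "ex:until": '"1680-02-01"^^xsd:date',
}

# A SPARQL-star query over the same hypothetical data: the quoted triple
# pattern matches the edge, and the outer patterns read its metadata.
QUERY = """\
PREFIX ex: <http://example.org/>
SELECT ?place ?start ?end WHERE {
  << ex:personA ex:wasAt ?place >> ex:from ?start ;
                                   ex:until ?end .
}
"""

if __name__ == "__main__":
    print(len(reify(STATEMENT, "_:s1", META)), "triples with reification")
    print(len(annotate_star(STATEMENT, META)), "triples with RDF-star")
    for triple in annotate_star(STATEMENT, META):
        print(triple_to_turtle(triple))
```

With two metadata properties, reification needs six triples (four to identify the statement plus two for the metadata), whereas the RDF-star form needs two.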

The technical basis for our project is Knora (https://dsp.dasch.swiss/), an infrastructure for humanities data consisting of an RDF triplestore, an OWL base ontology, and a RESTful API that allows for querying and adding to the data. For our project, this infrastructure will be further developed to support RDF-star and SPARQL-star. As a prototype document for developing the ontologies, tools, and infrastructure, we have chosen Jacob Bernoulli’s travel diary. Jacob Bernoulli (1654–1705) was the first mathematician of the Bernoulli dynasty; like many in his time, he traveled in pursuit of knowledge. He kept a record of his trips in a small journal called Reisbüchlein from August 1676 until October 1683, when he permanently settled in Basel. The entries of this journal contain brief descriptions of the places he visited, the people he met, travel costs, and the events and phenomena he witnessed during his trips. This so far unresearched document is kept at the archive of the University of Basel. Our project aims to create an open-access RDF-star-based edition of this document, making every piece of information within it efficiently queryable.

Jacob Bernoulli’s scientific notebook Meditationes is currently available on the BEOL platform (https://beol.dasch.swiss/) as an RDF-based digital edition, together with the digital edition of the correspondence of members of the Bernoulli dynasty and Leonhard Euler (Schweizer, Alassi 2018). The digital edition of Reisbüchlein will be integrated into this platform, allowing researchers to follow Bernoulli’s line of thought from his travel diary to the scientific ideas written in Meditationes and, at the same time, his correspondence. Based on this document, we will develop a generic RDF-star-based ontology describing the textual data and metadata of travel diaries.

To create a normalized edition of Reisbüchlein, whose text is written in old German and French, we have chosen a semi-automatic approach. We used an old, unpublished typed transcription of this journal to generate digital annotations with Transkribus (https://readcoop.eu/transkribus/). An editor is currently verifying the automatically generated annotations against the digitized facsimiles. At the same time, editorial commentaries on the structure of the text, its content, and the meaning of specific terms are being added to the annotations. Through the Knora API, interlinked resources will be created for image regions, their annotations, and commentaries.

We intend to use NLP algorithms to automatically recognize and tag the named entities within the text, such as locations and persons. The tagged entities will then be verified against the glossary in the old transcription, which lists the places and people mentioned in Reisbüchlein. By querying Wikidata, the algorithm will then find geo-identifiers for locations and GND numbers for persons, add them, and create resources for these locations and persons. The tagged elements within the text will be linked to the corresponding resources. This will allow queries for text passages that mention a certain person and/or location.

See the proof of concept in the report “open research data queriable by location” from the Swiss ORD hackathon 2021: https://docs.google.com/document/d/1lbD6go_mSNAH3Gmj_Ao9GFnGwPFd4ZyGo1HrP_tYtBs/edit#
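The verification and lookup steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the sample entity names, and the shape of the glossary are hypothetical, “geo-identifier” is interpreted here as the Wikidata coordinate location (P625), and matching an entity by its exact label is a simplification. The sketch only builds the SPARQL query text; sending it to the Wikidata query service is left out.

```python
import unicodedata

# Wikidata properties: P625 is "coordinate location", P227 is "GND ID".
# Using P625 for "geo-identifier" is an assumption made for this sketch.
WIKIDATA_PROPERTY = {"location": "wdt:P625", "person": "wdt:P227"}


def normalize(name: str) -> str:
    """Strip diacritics and case so that spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def verify_entities(tagged, glossary):
    """Split NER output into glossary-confirmed entities and ones to review.

    tagged   -- list of (surface form, kind) pairs from the NER step
    glossary -- dict mapping a glossary name to "person" or "location"
    """
    known = {(normalize(name), kind) for name, kind in glossary.items()}
    confirmed, needs_review = [], []
    for surface, kind in tagged:
        target = confirmed if (normalize(surface), kind) in known else needs_review
        target.append((surface, kind))
    return confirmed, needs_review


def wikidata_query(label: str, kind: str, lang: str = "de") -> str:
    """Build a SPARQL query for the identifier of one verified entity."""
    prop = WIKIDATA_PROPERTY[kind]
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?value WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{lang} ;\n'
        f"        {prop} ?value .\n"
        "}\n"
        "LIMIT 5"
    )


if __name__ == "__main__":
    # Invented sample data, not taken from Reisbüchlein or its glossary.
    glossary = {"Geneve": "location", "Basel": "location"}
    tagged = [("Genève", "location"), ("Basel", "location"), ("Atlantis", "location")]
    confirmed, needs_review = verify_entities(tagged, glossary)
    print("confirmed:", confirmed)
    print("needs review:", needs_review)
    print(wikidata_query("Basel", "location"))
```

Entities that fail the glossary check are returned separately so that an editor can review them instead of their being silently linked or dropped.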

Bibliography

Hartig, Olaf. "RDF* and SPARQL*: An Alternative Approach to Annotate Statements in RDF". International Semantic Web Conference 2017.

Schweizer, T. and Alassi, S. (2018) “Bernoulli-Euler Online: Development of a Platform for Early Modern Mathematical Texts as Part of a Generic Infrastructure”, in Digital Humanities Congress 2018. Sheffield: Lana Pitcher and Michael Pidd. Proceedings of the Digital Humanities Congress 2018. Studies in the Digital Humanities, pp. 1–4. Available at: https://www.dhi.ac.uk/openbook/chapter/dhc2018-schweizer.

Kasenchak, Bob, Aren Lehnert and Gene Loh, "Use Case: Ontologies and RDF-Star for Knowledge Management". The Semantic Web: ESWC 2021 Satellite Events, LNCS 12739, 2021, 254–260. https://doi.org/10.1007/978-3-030-80418-3_38.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO