Living apart together: Research across Repositories

Thomas Kollatz; Max Grüntgens

Authorship

1. Thomas Kollatz

Akademie der Wissenschaften und der Literatur (ADW) Mainz
2. Max Grüntgens

Akademie der Wissenschaften und der Literatur (ADW) Mainz

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Nowadays most Digital Editions are created or at least distributed in TEI-XML as scholarly research data. In epigraphy
EpiDoc: TEI XML for epigraphic documents is the subset of choice.

While consistency within an encoded corpus or collection of sources is enforced via schema files, experience has shown, that different scholars for sensible scholarly reasons decide to encode similar phenomena slightly differently, moving back and forth between a more flat structured as well as data centered approach on the one hand and a more narrative as well as verbose approach on the other hand.
RDF as high level data model enables researchers to make statements about scholarly resources as well as their constitutive parts and the relations among them. These RDF statements themselve are formulated in a way abstract enough to transcend the individualized usages essential to TEI XML encoded resources. Thus RDF as a bridge technology provides the means to mirror more individualistically encoded TEI XML resources onto an abstract metadata layer of more easily processable data points (see Brodhun, de la Iglesia, Moretton (2015), Grüntgens, Schrade (2016), 61–62); also compare Ciotti, Tomasi (2016), Ciotti (2018)).
Two challenges are associated with RDF from a Humanities point of view:
Firstly, statement extraction may degenerate into complex data wrangling tasks done once more by every single scholar. A solution for this may be provided by web services like Torsten Schrade’s XTriples that centralize functionality and provide a relatively low-threshold approach: triple extraction via XPath and triple building via a basic XML configuration file.
Secondly, most RDF ontologies are either seen as being to complicated for low-threshold statement building for individual scholarly activity in the Humanities or as capturing the scholarly domain in question not adequately enough. This challenge may be remedied by advising scholars to first build “flat” statements for explorative in-house research. These basic statement patterns, however, have to be none the less formalized and operationalized in regard to their scholarly approach. Thus, after constructing a prototype, the scholars may subsequently lift the prototypical extraction on to a distributable level by means of inclusion of complex ontologies (see Eide, Ore (2019), 194–195).
The poster will demonstrate a workflow for easy prototyping of cross-corpus research questions by means of RDF-Lifting from TEI XML via XTriples. The material used comes from two epigraphical collections: DIO and EPIDAT (see Grüntgens, Kollatz (2018), Kollatz (2018)).
Both repositories make medieval inscriptions of the German City of Worms available. DIO provides Christian inscriptions in German and Latin, EPIDAT provides inscriptions from the Jewish cemetery “Im Heiligen Sand”, one of the oldest Jewish cemeteries still in existence in Europe. Both repositories provide
EpiDoc: TEI XML for epigraphic Documents. Therefore XML can serve as a starting point for RDF-Lifting with the XTripels webservice.

As a minimal prototype, we want to approach gender distributions in both collections. Information about persons is contained in the data of both repositories (figure1).

Figure 1. Gender attribution implicitely given in the sex-attribute within the person-element; full record: http://www.steinheim-institut.de/cgi-bin/epidat?id=wrm-1253-teip5 (1=male, 2=female)
The transformation of the “implicit” semantics of XML given into the “explicit” semantics of RDF is done by means of the XTripels webservice (see Grüntgens, Schrade (2016), 54, 56). XTripels has proven itself to be an effiecient and easy-to-use tool to crawl XML-resources and extract RDF-statements. The configuration is based on XPath expressions (figure2).

Figure 2. Configuration of the XTripels webservice
As soon as the data of both repositories are available as RDF triples, the RDF data as a whole is stored in a triplestore, and queried via SPARQL (figure 3).

Figure 3. SPARQL-Query
The result of the above simple SPARQL-query shows that gender distribution in both repositories differ significantly: The proportion of inscriptions about men and women are equally balanced in EPIDAT, whereas in DIO inscriptions about women are outweight by inscriptions about men. (figure 4).

Figure 4. Simple gender distribution in DIO (above) and EPIDAT (below). Visualization realised with RawGraphs
http://app.rawgraphs.io 0=unknown; 1=male; 2=female; 9=not defined

Repositories
EPIDAT – research platform for Jewish epigraphy:

DIO – Deutsche Inschriften Online | German Inscriptions Online

http://www.inschriften.net

Bibliography

Brodhun, M., de la Iglesia, M., Moretto, N.
(2015). Metadaten, LOD und der Mehrwert standardisierter und vernetzter Daten. In Neuroth, H. et al. (eds),
TextGrid: Von der Community – für die Community
, Göttingen: Universitätsverlag Göttingen, pp. 91–102,

Ciotti, F.
(2018). A Formal Ontology for the Text Encoding Initiative.
Umanistica Digitale
, 3,

Ciotti F. and Tomasi, F.
(2016). Formal Ontologies, Linked Data, and TEI Semantics.
Journal of the Text Encoding Initiative
,
9,

Eide, Ø., Ore, C.-E. (2019). 8. Ontologies and Data Modeling. In Flanders, J., Jannidis, F. (eds),
The Shape of Data in Digital Humanities Modeling Texts and Text-based Resources. London, pp. 178–196.

EpiDoc.
Epigraphic Documents in TEI XML,

Grüntgens, M., Kollatz, T. (2018). Korpusbasiertes Arbeiten und epigraphische Datenbanken: Möglichkeiten und Herausforderungen am Beispiel von EPIDAT und DIO. In Gessinger, J. et al. (Eds.)
Osnabrücker Beiträge zur Sprachtheorie (OBST), 92: pp. 157–174.

Grüntgens, M., Schrade, T.
(2016). Data repositories in the Humanities and the Semantic Web: modelling, linking, visualising. In Alessandro, A. et al. (eds),
Proceedings of the 1st Workshop on Humanities in the Semantic Web co-located with 13th ESWC Conference 2016 (ESWC 2016),
Anissaras, pp. 53–64 URL:

Kollatz T.
(2018). 18 EPIDAT – Research Platform for Jewish Epigraphy. In Annamaria De Santis, Irene Rossi (Eds.),
Crossing Experiences in Digital Epigraphy: From Practice to Discipline

(pp. 231–239). Warsaw, Poland: De Gruyter.

Schrade, T.
(2016).
XTriples. A generic webservice for RDF lifting from XML resources,

Full text license: This text is republished here with permission from the original rights holder.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019

"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html

References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html

Series: ADHO (14)

Organizers: ADHO

Living apart together: Research across Repositories

1. Thomas Kollatz

2. Max Grüntgens

ADHO - 2019

"Complexities"