OEAW Österreichische Akademie der Wissenschaften / Austrian Academy of Sciences
Austrian Academy of Sciences, Austria; University of Graz, Austria
The International Prosopography Interchange Format
This presentation describes ongoing work at the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) on the International Prosopography Interchange Format (IPIF). It presents IPIF’s design and explores conceptual and technical challenges arising from its implementation.
IPIF is an API definition and data model enabling the sharing and querying of prosopographical data. The original IPIF paper (Vogeler et al. 2019) recognises the power of semantic web tools (RDF, OWL, SPARQL), but also highlights their shortcomings for an interoperable format, chiefly the absence of a standard data model and the complexity of SPARQL. (See Bradley 2020.) Such difficulties circumscribe uses such as ‘light-weight’ data access and querying (as opposed to complex data processing). Accordingly, IPIF opts for a simple RESTful API.
Beyond its reference implementation, Papilotte (Vasold 2019), IPIF is intended to be implemented on top of existing projects, giving access to their data in a standardised format and providing facilities for federated queries and data discovery (Vogeler et al. 2019).
The Data Model and API
To achieve maximum interoperability, IPIF adopts a version of the ‘factoid model’ (Bradley and Short 2005). This model separates statements made about a person from the abstract idea of a person, instead attributing statements to a researcher’s interpretation of a historical source (Figure 1). This enables contradictory statements, made by distinct sources, to be recorded. As Baillie (2021) argues, this is not always desirable, some historical sources being more trustworthy than others; nevertheless, an open ecosystem without any single ‘guiding hand’ requires that contradiction be permitted.
FIGURE 1: THE IPIF DATA MODEL
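The factoid structure described above can be illustrated with a minimal Python sketch. The four class names follow the entity types named in this abstract; every field name (`statement_type`, `content`, `created_by`) and all sample data are hypothetical and are not taken from the IPIF specification. The point is only to show how two sources can hold contradictory statements about the same person without either being overwritten.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Person:
    id: str


@dataclass(frozen=True)
class Source:
    id: str
    label: str


@dataclass(frozen=True)
class Statement:
    id: str
    statement_type: str  # hypothetical field name
    content: str         # hypothetical free-text content


@dataclass(frozen=True)
class Factoid:
    """One researcher's reading of one source about one person."""
    id: str
    person: Person
    source: Source
    created_by: str  # hypothetical: the interpreting researcher
    statements: Tuple[Statement, ...]


def statements_about(person: Person, factoids: List[Factoid]) -> List[Tuple[str, Statement]]:
    """Collect (source label, statement) pairs for a person across all factoids."""
    return [
        (f.source.label, s)
        for f in factoids
        if f.person == person
        for s in f.statements
    ]


# Invented sample data: two sources disagree about a date of death.
anna = Person("p1")
chronicle = Source("s1", "Chronicle A")
register = Source("s2", "Parish register B")
factoids = [
    Factoid("f1", anna, chronicle, "researcher-1",
            (Statement("st1", "death", "died 1450"),)),
    Factoid("f2", anna, register, "researcher-2",
            (Statement("st2", "death", "died 1452"),)),
]
claims = statements_about(anna, factoids)
```

Both claims remain attached to their own source and researcher, so a consumer can decide which to trust rather than the data model deciding for them.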
The API — described in OpenAPI — is a RESTful interface, allowing direct access to the four IPIF entity types (Factoid, Source, Person, Statement), and querying of statements through graph-like traversals (Figure 2). It returns JSON-LD for each entity, comprising content and/or metadata, and embeds references to related entities.
The choice of Statement fields represents a highly pragmatic decision. These fields, allowing a label and a URI, were chosen to match the most obvious use-cases in prosopographical data; the statementType field allows arbitrary extension beyond these standard statement types.
FIGURE 2: API EXAMPLES
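As a rough illustration of what a request to such an API might look like, the following sketch builds a query URL using only the Python standard library. The base URL `https://ipif.example.org` is a placeholder, the helper `build_query_url` is hypothetical, and the `person/` path and the `place` and `role` parameters are taken from the example query discussed later in this abstract rather than from the OpenAPI definition itself.

```python
from urllib.parse import urlencode


def build_query_url(base_url, entity, **filters):
    """Build a query URL for one entity type from keyword filters.

    Booleans are written as lowercase 'true'/'false', which is an
    assumption about how the API expects them.
    """
    params = {
        key: ("true" if value is True else "false" if value is False else value)
        for key, value in filters.items()
    }
    return f"{base_url.rstrip('/')}/{entity}/?{urlencode(params)}"


url = build_query_url("https://ipif.example.org", "person",
                      place="Graz", role="professor")
```

The same helper could be pointed at several endpoints in turn, which is the basic building block of a federated query.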
Implementations, issues and works-in-progress
Since the original definition of the standard, IPIF endpoints have been implemented on top of several existing frameworks, including the ACDH-CH’s APIS platform (a Django-based platform for prosopographical projects) and as an eXist-DB module for serving TEI-XML personography data (typically from digital scholarly editions). IPIF client libraries for Python and JavaScript, allowing federated queries and data aggregation across several endpoints, have also been developed. Building these tools has afforded a wealth of practical experience, highlighting the strengths of the format and the difficulties involved in its implementation. This has led to several pragmatic decisions regarding the data model, API and the semantics of querying.
A label field was introduced for Persons, supporting use cases such as autocomplete. The theoretically correct modelling, as a ‘naming statement’, required too many additional requests to retrieve the appropriate information.
Persons and Sources allow multiple URIs (interpreted as owl:sameAs). Strictly, these should be Statements (i.e. a non-definitive assertion that one Person or Source is the same as another: see Zedlitz 2009); but reconciling data from multiple endpoints is considerably easier when this information is a ‘meta’ field of a Person entity.
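To see why URIs held as a ‘meta’ field make reconciliation easier, consider a minimal sketch that groups Person records from different endpoints whenever they share a URI. It uses a simple union-find so that matches are transitive. The record shape (dicts with `id` and `uris` keys), the function name and the sample data are assumptions made for illustration; this is not the logic of the actual IPIF client libraries.

```python
def merge_by_shared_uri(persons):
    """Group person records that share at least one URI.

    `persons` is a list of dicts with hypothetical keys "id" and "uris".
    Ids are assumed to be unique across endpoints.
    Returns a list of sets of ids, one set per reconciled person.
    """
    parent = {}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # uri -> id of the first record carrying that URI
    for person in persons:
        parent.setdefault(person["id"], person["id"])
        for uri in person["uris"]:
            if uri in first_seen:
                union(person["id"], first_seen[uri])
            else:
                first_seen[uri] = person["id"]

    groups = {}
    for person in persons:
        groups.setdefault(find(person["id"]), set()).add(person["id"])
    return list(groups.values())


# Invented records from two hypothetical endpoints.
records = [
    {"id": "endpointA/p1", "uris": ["https://example.org/authority/123"]},
    {"id": "endpointB/p9", "uris": ["https://example.org/authority/123",
                                    "https://example.org/other/9"]},
    {"id": "endpointB/p10", "uris": ["https://example.org/authority/456"]},
]
groups = merge_by_shared_uri(records)
```

Had the same information been modelled as Statements, each endpoint would first need to be queried for the relevant factoids before any grouping could begin.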
IPIF requires a Source for each Factoid. In many projects, data that would comprise an IPIF Statement is given with no source (e.g. the name, death or profession of a person). In this case, the source is taken to be the project itself.
TEI-XML personographies can express a factoid model by applying the @source and @resp attributes to the sub-elements of an entry (see Schwartz et al. 2020), but this is optional. Pragmatically, we suggest using the metadata of the TEI document as a fallback (@source, @resp, ancestor::TEI/teiHeader/titleStmt/editor etc.).
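The fallback can be sketched with Python's standard `xml.etree.ElementTree`. The sample document, the `provenance` helper, and the choice of the document title as the fallback source are all assumptions made for illustration; the eXist-DB module mentioned above may resolve these values differently. Because ElementTree has no ancestor axis, the document root is passed in explicitly.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: persName carries its own provenance, occupation does not.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Sample edition</title>
    <editor>Jane Editor</editor>
  </titleStmt></fileDesc></teiHeader>
  <text><body><listPerson>
    <person xml:id="p1">
      <persName source="#charter42" resp="#jd">Johannes</persName>
      <occupation>scribe</occupation>
    </person>
  </listPerson></body></text>
</TEI>"""


def _text(element):
    """Return stripped text of an element, or None if absent or empty."""
    if element is None or not element.text:
        return None
    return element.text.strip()


def provenance(element, root):
    """Return (source, resp) for a personography sub-element.

    Uses the element's own @source and @resp when present; otherwise
    falls back to the document title and editor from the TEI header
    (the title as fallback source is an assumption of this sketch).
    """
    fallback_source = _text(root.find(".//tei:titleStmt/tei:title", TEI_NS))
    fallback_resp = _text(root.find(".//tei:titleStmt/tei:editor", TEI_NS))
    source = element.get("source") or fallback_source
    resp = element.get("resp") or fallback_resp
    return source, resp


root = ET.fromstring(SAMPLE)
person = root.find(".//tei:person", TEI_NS)
name_provenance = provenance(person.find("tei:persName", TEI_NS), root)
occupation_provenance = provenance(person.find("tei:occupation", TEI_NS), root)
```

Here the name keeps its explicit source and responsibility, while the occupation inherits both from the header, so every resulting Factoid still has a Source.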
To avoid ambiguity in combining statement filters in Person queries (does person/?place=Graz&role=professor mean “a professor in Graz” or “a professor, located in Graz for any reason”?), we defined a default behaviour (both conditions apply to the same Statement) and an optional parameter independentStatements=true to allow ‘or’ conditions.
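The difference between the two behaviours can be made concrete with a small sketch. It assumes, as one possible reading, that independentStatements=true lets each filter be satisfied by a different Statement of the same Person, which corresponds to the second interpretation in the question above. The `matches` function, the flat dictionary shape of a statement and the sample data are hypothetical.

```python
def matches(person_statements, filters, independent_statements=False):
    """Decide whether a person matches a set of statement filters.

    person_statements: list of dicts, one per Statement (hypothetical shape).
    filters: dict mapping a field name to the required value.
    """
    def satisfies(statement, key, value):
        return statement.get(key) == value

    if independent_statements:
        # Each filter may be satisfied by a different statement.
        return all(
            any(satisfies(s, key, value) for s in person_statements)
            for key, value in filters.items()
        )
    # Default: a single statement must satisfy every filter.
    return any(
        all(satisfies(s, key, value) for key, value in filters.items())
        for s in person_statements
    )


filters = {"place": "Graz", "role": "professor"}

# Invented data: A was a professor in Graz; B was born in Graz
# and was a professor in Vienna.
person_a = [{"role": "professor", "place": "Graz"}]
person_b = [
    {"role": "professor", "place": "Vienna"},
    {"statementType": "birth", "place": "Graz"},
]

default_a = matches(person_a, filters)
default_b = matches(person_b, filters)
independent_b = matches(person_b, filters, independent_statements=True)
```

Under the default, only person A is returned; with independent statements, person B is returned as well.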
In this presentation, we will argue that such decisions are justified by the requirements of interoperability, and that our experiences in developing IPIF thus far contribute usefully to debates surrounding data interoperability in the digital humanities.
Bibliography
Baillie, James. “Alternative Database Structures for Prosopographical Research”. International Journal of Humanities and Arts Computing 15, no. 1–2 (October 2021): 117–32. https://doi.org/10.3366/ijhac.2021.0265.
Bradley, John Douglas. “A Prosopography as Linked Open Data: Some Implications from DPRR”. Digital Humanities Quarterly 14, no. 2 (29 July 2020). http://digitalhumanities.org/dhq/vol/14/2/000475/000475.html.
Bradley, John, and Harold Short. “Texts into Databases: The Evolving Field of New-Style Prosopography”. Literary and Linguistic Computing 20, no. Suppl (1 January 2005): 3–24. https://doi.org/10.1093/llc/fqi022.
Pasin, Michele, and John Bradley. “Factoid-Based Prosopography and Computer Ontologies: Towards an Integrated Approach”. Literary and Linguistic Computing 30, no. 1 (1 April 2015): 86–97. https://doi.org/10.1093/llc/fqt037.
Schwartz, Daniel L, Nathan P Gibson, and Katayoun Torabi. “Modeling a Born-Digital Factoid Prosopography Using the TEI and Linked Data”. Journal of the Text Encoding Initiative, 2020, 37.
Vasold, Gunter. Papilotte: A Flexible and Extensible IPIF Server. 2019. https://github.com/gvasold/papilotte.
Vogeler, Georg, Matthias Schlögl, and Gunter Vasold. “Data Exchange in Practice: Towards a Prosopographical API (Preprint)”. Paper given at BD2019 in co-location with RANLP 2019 in Varna (2019). https://doi.org/10.17613/YW4H-5F09.
Zedlitz, Jasper. “Gedbas4all – New Data Model for Genealogy”. GenWiki, 2009. http://wiki-en.genealogy.net/Gedbas4all/Article. English version of Zedlitz, Jasper. „Gedbas4all – neues Datenmodell für die Genealogie“. Computergenealogie, no. 04 (2009).
Tokyo, Japan
July 25, 2022 - July 29, 2022
Held in Tokyo and remote (hybrid) on account of COVID-19
Conference website: https://dh2022.adho.org/
Contributors: Scott B. Weingart, James Cummings
Series: ADHO (16)
Organizers: ADHO