RISE and SHINE: A Modular and Decentralized Approach for Interoperability between Textual Collections and Digital Research Tools

paper, specified "long paper"
Authorship
  1. 1. Sean Wang

    Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte

  2. 2. Pascal Belouin

    Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte

  3. 3. Hou Ieong Ho

    Staatsbibliothek zu Berlin - Preußischer Kulturbesitz

  4. 4. Shih-Pei Chen

    Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Digital humanities (DH) as a field has been grappling with the significant issue of interoperability. In response, many have proposed that DH needs basic infrastructures behind research projects to ensure its long-term success. In Europe, for instance,
CLARIN and
DARIAH are two such large-scale research infrastructures for humanities. While they have done a tremendous job in centralizing available digital resources, much of their infrastructures remain at the administrative level, and their generic coverage across the entire humanities meant that their utility for specific (and smaller) disciplines and non-European languages is limited. Furthermore, while their focus on open-access resources should be lauded, many textual resources—especially in the Asian context—remain licensed and protected. How can we, as scholars in DH and Asian studies, design a digital research infrastructure fit for our specific needs, taking past experiences with these large-scale infrastructural projects into consideration?

In this paper, we present our technical answers to this question.
RISE stands for
Research Infrastructure for the Study of Eurasia. Formerly known as
Asia Network, it is a pioneering approach for resource dissemination and emerging data analytics (such as text mining and other fair-use but consumptive research techniques) in the humanities, developed by the Max Planck Institute for the History of Science (MPIWG).

We developed
RISE and SHINE to facilitate the secure linkage between third-party research tools to various third-party textual collections (both licensed and open-access ones). Our goal is to revolutionize how scholars can work with textual sources via existing research tools that have already emerged within DH. Although many research tools for analyzing and visualizing texts are already available, it is usually difficult for humanities scholars to download texts from various databases and to prepare them in formats that individual research tools require before analyses. By designing a set of standardized APIs to link texts to digital research tools, we allow scholars to apply digital research tools on texts, regardless of their locations or formats.

When it comes to licensed texts, a common situation for resources in many Asian languages, it is impossible for scholars to use digital research tools to analyze them without illegally downloading or scraping the full texts. By securely linking these licensed texts to digital research tools, we allow scholars to work in a legal manner and ensuring commercial publishers the safety of their collections under a secured virtual research environment. Such flexible, networked approach to e-infrastructure development avoids re-creating silos of resources in the digital realm and allow scholars to fully leverage the potential of material digitization and digital research tools.
To do so, we at MPIWG developed a range of tools:

SHINE is a set of standardized APIs for exchanging textual resources, both open-access ones and protected (or licensed) ones that require authentication and authorization. Inspired by the success of
IIIF, we aim to develop a similar standard for a wide range of communities working with textual resources. Resource providers and research tool developers will find SHINE useful because it supports interoperability among resource repositories and research tools in a decentralized manner. SHINE adopts a generic data model based on metadata from resource provider’s lists of collections and texts. Given a resource ID, SHINE supports API calls to obtain metadata and full-texts in a hierarchical organization. Researchers will find SHINE useful because they will gain unprecedented access to textual resources in a machine-readable format, so that textual resources can be analyzed in research tools in a seamless and legal research workflow.
Here you can find a full list of API endpoints currently implemented by SHINE.

RISE is a middleware that protects resource exchanges via SHINE. It authenticates and authorizes these exchanges, especially for protected (or licensed) resources. It connects with authentication servers from research organizations to authenticate their users and to authorize the access right for users to access certain resources based on their organizational affiliation. It also includes a browser-based administrative interface for resource providers, research tool developers, and research organizations to regulate protocols for specific resource exchanges. It is worth noting that SHINE, as an exchange format, could be adopted independently to facilitate interoperability without RISE.

RISE's architecture overview

To demonstrate the benefits of adopting RISE and/or SHINE and to encourage third-party development of SHINE-compatible technical solutions, a suite of open-access software modules linked to RISE have already been developed and will be made available for others to freely adopt and adapt for their own purposes. They are available on our
GitHub.

RISE-RP is a reference implementation for resource providers to publish their resources via SHINE. By installing RISE-RP, a resource provider (which can be a library, a research institute, a research project, or an individual scholar who manages a text collection) can easily publish its texts via the SHINE API protocol for other scholars to use via SHINE-compatible research tools.

RISE-Catalog is a browser-based user interface that research organizations could customize and implement to facilitate resource discovery and resource-tool linkage for their internal researchers. For example, a research library can customize RISE-Catalog so that it only shows the texts collections that the library has access to. A library user could then browse it and open the selected texts in specific research tools.

RISE-JS-Library is a JavaScript module for research tool developers to enable their research tools’ resource consumption via SHINE. After adopting RISE-JS-Library, a research tool can immediately provide to its users browse-and-search function for all resources in RISE-Catalog within its own user interface.

At the moment, RISE-Catalog has used SHINE to link to several collections of varying licenses, with resources in Chinese, Arabic, Latin, Greek, and English. Open-access collections include Kanseki Repository, Corpus DB, Oxford Text Archive, Perseus Digital Library, and other miscellaneous repositories. Licensed collections include the Chinese Buddhist Electronic Texts, Chinese Text Project, and the Taiwan History Digital Library. It also provides the option for users to directly open selected texts in
MARKUS, a popular research tool for semi-automatic tagging historical Chinese texts. A user in MARKUS could also browse-and-search the RISE-Catalog directly, as MARKUS had implemented RISE-JS-Library in a
beta version (see Figure 2). We are eager for connect to more resources and research tools in order to make RISE and SHINE more useful.

Implementation of RISE-JS-Library within digital research tools

Despite its current name, this suite of products can handle resources in all languages and can be customized to fit existing IT and content management systems. Many projects and infrastructures have proposed similar ideas, creating complex new initiatives that largely centralize resources in digital silos. Yet centralization does not necessarily address interoperability or the challenges with licensed texts (
HathiTrust being a notable exception in this regard). Ours, by comparison, is a modular solution that works, allowing for interoperability between collections and research tools without centralizing resources. Scholarly research and structural design therefore remain intimately connected. This demonstrates the significant returns from our early investment into DH research.

While we promote open access whenever possible, the reality is that many digitized resources in the humanities are still sold by publishers or private vendors. We have had to navigate this complex licensing terrain during our everyday work, and RISE and SHINE is now a prototype primed for a model of decentralized e-infrastructure for humanists. We believe that this approach is a fruitful one that would provide a sustainable and interoperable research environment for humanist scholars.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.