A Kind of Magic: Migrating a Large Digital Edition of Letters into a New Infrastructure

Elisabeth Steiner; Gunter Vasold; Sanja Saric

Authorship

1. Elisabeth Steiner

Karl-Franzens Universität Graz (University of Graz)
2. Gunter Vasold

Karl-Franzens Universität Graz (University of Graz)
3. Sanja Saric

Karl-Franzens Universität Graz (University of Graz)

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Introduction
The poster will introduce the approach taken to migrate a large-scale digital letter edition with accompanying material to a new technical infrastructure.
The underlying project (Hurch, 2007-) is concerned with the work on the scientific estate of 19
th century linguist Hugo Schuchardt. The primary objective is the edition of the scientific correspondence, an endeavor being underway since decades.

The database comprises nearly 6500 full-text transcribed letters with facsimiles and editorial comments. In addition to a large bibliographical database of primary and secondary literature, a former funding period also produced thesauri for persons, places and subjects. The reason for the migration primarily lies in the increasing difficulty to maintain the proprietary infrastructure, which has been developed and extended for more than two decades.

Levels of Migration

Organizational
The migration process entails moving a large part of the technical infrastructure from the Institute of Linguistics to a second institution of the university specialized in DH and long-term preservation. In future, one institute will be responsible for scholarly data generation and producing the SIP (in the sense of the OAIS model (Space Data Systems, 2012)), whereas the second institution is concerned with the archival and publication (AIP and DIP). Nevertheless, there has to be close cooperation between the two partners to create the necessary organizational and technical interfaces to accomplish this.

Technical
The existing proprietary infrastructure is only partly documented. The project is a pioneer in electronic publishing efforts, thus unfortunately having to deal with a lot of legacy infrastructure and code. The standardized digital objects created during the migration will be integrated into an existing trusted digital repository for long-term preservation operated by the second partner (Centre for Information Modelling, 2003-).
Since the funding for the migration phase is limited to one year, the main focus will be to create an automated workflow for the integration of further data, which requires as low future maintenance as possible.

Content related
The first step of the migration is to create appropriate data models for all resources and to pull the data from the current infrastructure.
The full-text transcriptions are mostly already in TEI format (TEI Consortium, 2018), but not a consistent one. The new schema will also be interoperable with the DTA (Zentrum Sprache, 2011–2018) base format and ediarum (Berlin-Brandenburg Academy of Sciences, 2018). Furthermore, especially thesaurus data is not stored in an acknowledged standard format and has to be transferred to more suitable formats, mostly SKOS (Miles and Bechhofer, 2009).
Due to the high recognition of the project in the community, special attention must be payed to keep existing URLs working and to avoid broken links and confusion on side of the community.
With regard to data reuse, correspondence data will be delivered in CMIF (TEI Correspondence SIG, 2015–2018) to the CorrespSearch (Berlin-Brandenburg Academy of Sciences, 2014–2018) aggregation service; furthermore, the data will be published on a freely accessible OAI endpoint (Lagoze, 2018) via PMH (Lagoze and Sompel, 2015).

Visualization and Analysis
The migration process will also include an update and consolidation of the user interface and dissemination options. A new presentation layer based on Bootstrap 4 (Otto and Thornton, 2011–2018) will be implemented, presenting all data sources from the project accompanied by static content in a fully responsive way. Digital facsimiles will be presented in an IIIF-compliant viewer (IIIF Consortium, 2012–2018).
After finishing the migration of the research data, new visualization and analysis tools like a timeline visualization, a map tool and statistical analyses of the correspondence metadata can be easily integrated using the existing APIs and services of the digital repository. The intent is also to exploit the thesauri to a larger extent in resource discovery and search mechanisms.

Conclusion
The poster will highlight the complexity of migrating a well-known and already large-scale digital edition project to a new infrastructure, touching on every of the aforementioned aspects and showing our solutions. Nevertheless, this challenging process also entails the chance to add extended functionality and dissemination options to the currently rather outdated web representation and further enhance data quality, reuse and interoperability in a way not possible one or two decades ago.
Furthermore, the poster will once more highlight the need for domain specific trusted repositories or data centers (Arbeitsgruppe Datenzentren im Verband DHd, 2017). The case of the project in question illustrates how proprietary standalone infrastructures must fall into disuse because of impossible maintenance work.

Bibliography

Arbeitsgruppe Datenzentren im Verband DHd (2017).
Geisteswissenschaftliche Datenzentren im deutschsprachigen Raum. Grundsatzpapier zur Sicherung der langfristigen Verfügbarkeit von Forschungsdaten. DOI:
http://doi.org/10.5281/zenodo.1134760 (accessed 19 November 2018).

Berlin-Brandenburg Academy of Sciences (2014–2018).
correspSearch. Search scholarly editions of letters.
https://correspsearch.net/index.xql?id=&l=en (accessed 19 November 2018).

Berlin-Brandenburg Academy of Sciences (2018).
ediarum - an Oxygen XML Author framework for digital scholarly editions.
http://www.bbaw.de/telota/software/ediarum (accessed 19 November 2018).

Hurch, B. (ed.) (2007-).
Hugo Schuchardt Archiv. http://schuchardt.uni-graz.at (accessed 8 April 2019).

IIIF Consortium (2012–2018).
International Image Interoperability Framework.
https://iiif.io/ (accessed 19 November 2018).

Lagoze, C. (2018).
Open Archives Initiative.
https://www.openarchives.org/ (accessed 19 November 2018).

Lagoze, C. and Sompel, van de H. (2015).
Open Archives Initiative Protocol for Metadata

Harvesting.
https://www.openarchives.org/OAI/openarchivesprotocol.html (accessed: 19 November 2018).

Miles, A. and Bechhofer, S.
(2009). SKOS Simple Knowledge Organization System Reference. In
W3C
(eds.), URL:
https://www.w3.org/TR/2009/REC-skos-reference-20090818
(accessed 22 November 2018).

Otto, M. and Thornton J. (2011–2018).
Bootstrap.
https://github.com/twbs/bootstrap (accessed 19 November 2018).

TEI Consortium (2018).
TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 3.4.0. http://www.tei-c.org/Vault/P5/3.4.0/doc/tei-p5-doc/en/html (accessed 22 November 2018).

TEI Correspondence SIG (2015–2018).
CMIF – Correspondence Metadata Interchange Format.
https://github.com/TEI-Correspondence-SIG/CMIF (accessed 19 November 2018).

The Consultative Committee for Space Data Systems (2012).
Reference Model for an Open Archival Information System (OAIS).
https://public.ccsds.org/pubs/650x0m2.pdf (accessed 19 November 2018).

Centre for Information Modelling (2003-).
GAMS – Geisteswissenschaftliches Asset Management System.
https://gams.uni-graz.at (accessed 19 November 2018).

Zentrum Sprache der BBAW (2011–2018).
Deutsches Textarchiv-Basisformat.
http://deutschestextarchiv.de/doku/basisformat (accessed 19 November 2018).

Full text license: This text is republished here with permission from the original rights holder.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019

"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html

References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html

Series: ADHO (14)

Organizers: ADHO

A Kind of Magic: Migrating a Large Digital Edition of Letters into a New Infrastructure

1. Elisabeth Steiner

2. Gunter Vasold

3. Sanja Saric

ADHO - 2019

"Complexities"