Towards Multilingualism In Digital Humanities: Achievements, Failures And Good Practices In DH Projects With Non-latin Scripts

workshop / tutorial
Authorship
  1. Martin Lee

    University Library - Freie Universität Berlin (FU Berlin)

  2. Cosima Wagner

    University Library - Freie Universität Berlin (FU Berlin)

Work text


Exposé
The one-day workshop responds to the call for multilingualism and multiculturalism in the Digital Humanities (DH) and discusses achievements, failures and good practices in DH projects with non-Latin scripts (NLS). We want to provide hands-on insight into dos and don'ts in the NLS context and to identify practices that may be transferable to other languages and disciplines, building upon lessons learned in the workshop “NLS in multilingual (software) environments” [1], held in 2018 at Freie Universität Berlin. The main goal was and is to strengthen an international network of NLS practitioners and experts who develop, maintain and distribute specific NLS knowledge, regardless of their working affiliation in academia, libraries, museums, or elsewhere.

While Digital Humanities scholarship is a vibrant and growing field of research in many humanities and social sciences disciplines, it has also been criticized as being culturally and technologically biased (e.g. Fiormonte, 2012; Mahony, 2018). As a result, there is a lack of DH infrastructure suited for processing non-Latin scripts. This is a problem not only for the representation of DH research from non-anglophone countries but also for area studies disciplines within the so-called “Western” academic world.

In their call for papers for the DH Asia conference 2018 at Stanford University, the organisers stated that
"when we look at DH in Western Europe and the Americas, we find a vibrant intellectual environment in which even college and university undergraduates – let alone more advanced researchers – can download off-the-shelf analytical platforms and data corpora, and venture into new and cutting-edge research questions; while, in the context of Asian Studies, we find an environment in which many of the most basic elements of DH research remain underdeveloped or non-existent” (Mullaney, 2017).
This is true not only for Asian Studies but also for academic disciplines like Egyptology, Arabic Studies, Jewish Studies and other disciplines conducting DH research in or with non-Latin scripts.
While there have been recent activities to strengthen collaboration and networking within the NLS-DH community [2], there is still a strong need for knowledge exchange on DH tools suited to NLS, for best practices, and for networking events.

For example, how can we raise awareness of NLS-specific aspects in dominant standardization committees like the Unicode Consortium (especially for scripts like hieratic signs, which are not yet represented) or in (inter)national authority files (Getty Thesauri, Wikidata, German National Authority File, etc.)?
How can we establish (new?) standards for multilingual metadata in NLS and how rigid or how flexible do they have to be?
How can the recognition rate of non-standard characters with OCR be improved?
How can multilingual/multiscript data from different sources be integrated and processed (semantic mapping, annotation, translation, NER, tagging etc.) in collaborative research platforms?
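
As a small illustration of the first step in processing multiscript data, the following Python sketch profiles which scripts occur in a text string. It is only a sketch under stated assumptions: the function names script_label and script_profile are hypothetical, the script label is a heuristic taken from the first word of the Unicode character name (not the official Unicode Script property), and private-use code points are flagged separately because they are one common workaround for signs that are not yet encoded in Unicode.

```python
import unicodedata
from collections import Counter


def script_label(ch):
    """Return a rough script label for one character, or None for non-letters.

    Assumption: the first word of the Unicode character name (e.g. "ARABIC",
    "CJK", "LATIN") is used as a stand-in for the script. This is a heuristic,
    not the Unicode Script property.
    """
    category = unicodedata.category(ch)
    if category == "Co":
        # Private-use code point: has no standard meaning in Unicode.
        return "PRIVATE-USE"
    if not category.startswith("L"):
        # Skip digits, punctuation, whitespace, combining marks, etc.
        return None
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else "UNNAMED"


def script_profile(text):
    """Count letters per rough script label after NFC normalization."""
    normalized = unicodedata.normalize("NFC", text)
    labels = (script_label(ch) for ch in normalized)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    sample = "Kal\u012bla \u0643\u0644\u064a\u0644\u0629 \u6f22\u5b57 \ue000"
    for label, count in script_profile(sample).most_common():
        print(label, count)
```

A profile like this could serve as a quick sanity check before records from different sources are merged, for example to spot metadata fields that mix transliteration and original script or that rely on private-use characters.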

Furthermore, in line with the discourse on the digital transformation of academic research and teaching, the need for stronger collaboration is growing. Especially in externally funded research projects, specific knowledge of DH and NLS can seldom be retained within the organisation, and project tools and platforms often cannot be maintained beyond the duration of the project. Instead, these responsibilities should be part of the service portfolio of research infrastructure institutions like libraries or data centers. However, these institutions are normally not equipped to support all languages and disciplines.
The workshop tackles these issues

by presenting the challenges, answers and recommendations of 15 experts conducting DH projects with sources in Arabic script, in Chinese, Japanese and Korean (CJK) scripts, and in ancient Egyptian hieratic signs,
by providing time for group discussions and a wrap-up session to agree on modes of documentation, on future collaboration on NLS DH tools and best practices, and on their transfer to research infrastructure institutions.

The presentations will address the following subjects and methods of NLS DH:

digital representation of manuscripts: OCR (Arabic, CJK)
digital representation of NLS (non-Unicode signs: Ancient Egyptian hieratic signs, Old South Arabian, CJK)
digital research infrastructures and virtual research environments for NLS DH projects (Arabic, CJK, Ancient Egyptian hieratic signs)
multilingual metadata and metadata in NLS (Arabic, CJK, Ancient Egyptian hieratic signs)
semantic web and linked (open) data in NLS (CJK)
text encoding and mark-up languages and NLS (Arabic, CJK, Ancient Egyptian hieratic signs)
data mining / text mining in NLS (Arabic, CJK)
NER, machine translation, annotation for NLS (Arabic, CJK)

Workshop Format
The first part of the workshop will give developers and researchers the chance to present their challenges, solutions and tools for NLS DH related problems and questions.
In the second part of the workshop, we will develop an organisational strategy for cooperation and collaboration among the NLS DH community through a working group session. To aid organisation, we will provide a wiki that participants can use during and after the workshop to organise collaboration.
We envision the results of this workshop as a cornerstone for building an institutionalized network of NLS DH practitioners with a joint knowledge management system (wiki, GitHub, etc.) and communication channels. Furthermore, we have initiated an NLS DH handbook and want to develop it into a living handbook to be maintained by the aforementioned NLS DH network.
Finally, contributions to the workshop are planned to be published in a special issue on NLS and DH in an open access journal (scheduled for the second half of 2019).

List of presentations and presenters:

For updated information on the workshop and a list of presenters with abstracts of their presentations, please refer to the following link.

Arabic script:

“Arabic Script in Digital Humanities Research Software Engineering”

Presenters: Oliver Pohl, Jonas Müller-Laackmann (Berlin-Brandenburg Academy of Sciences and Humanities, Germany)

“Towards a Versatile Open-Source Ecosystem for Computational Arabic Literary Studies”

Presenters: Mahmoud Kozae and Dr. Jan Jacob van Ginkel (Freie Universität Berlin, Department of History and Cultural Studies, ERC Advanced Grant: “Kalīla and Dimna – AnonymClassic”, Principal Investigator: Prof. Dr. Beatrice Gründler; Germany)

Ancient Egyptian hieratic script:

„Ancient Egyptian Hieratic Script – Aspects of Digital Paleography for a NLS“

Presenters: Svenja Gülden, Susanne Gerhards, Tobias Konrad; Akademie der Wissenschaften und der Literatur / Johannes Gutenberg University Mainz (Germany); Project “Altägyptische Kursivschriften” (Ancient Egyptian cursive scripts)

CJK scripts:

Chinese

“SHINE: A Novel API Standard & Data Model to Facilitate the Granular Representation and Cross-referencing of Multi-lingual Textual Resources”

Presenters: Pascal Belouin, Sean Wang (Max Planck Institute for the History of Science; Department III, RISE Project “Research Infrastructure for the Study of Eurasia”, Berlin, Germany)

“No text - no mining. And what about dirty OCR? Training, optimizing, and testing of OCR/KWS-Methods for Chinese Scripts”

Presenter: Amir Moghaddass (Freie Universität Berlin, Campus Library, Project “Alt-Sinica”, Germany)

“Multilingual research projects: Challenges (and possible solutions) for making use of standards, authority files, and character recognition”

Presenter: Matthias Arnold (Universität Heidelberg, Cluster of Excellence “Asia and Europe in a Global Context”, Heidelberg Research Architecture, Germany)

Japanese

"Expectations and reality: developing an English-Japanese semantic web environment for the Late Hokusai research project"

Presenter: Stephanie Santschi (British Museum, Project “Late Hokusai: Thought, Technique, Society”, UK)

Korean

“Curation Technologies for a Cultural Heritage Archive. Analysing and transforming the „Project Tongilbu“ data set into an interactive curation workbench”

Presenter: Peter Bourgonje (DFKI GmbH [German Research Center for Artificial Intelligence], Speech and Language Technology Lab, Germany)

"Creating, Linking, Visualizing and Interpreting Chinese and Korean datasets with MARKUS Environment"

Presenters: Jing Hu, Leiden University, The Netherlands; Ba-ro Kim, Chung-Ang University, South Korea

Notes
[1] For a workshop report in English see https://blogs.fu-berlin.de/bibliotheken/2019/01/18/workshop-nls2018/; the German version was published at DHd Blog (10/2018).

[2] E.g. the annual DH Asia conferences at Stanford University (USA), see http://dhasia.org/; a Summer School in June 2019 on right2left issues at the Digital Humanities Summer Institute (Canada), see http://www.dhsi.org/events.php; a workshop on “Nicht-lateinische Schriften in multilingualen Umgebungen: Forschungsdaten und Digital Humanities in den Regionalstudien” (“Multilingual infrastructure and Non-latin scripts: Digital Humanities and Research Data in Area Studies”; see note 1) in June 2018 at Freie Universität Berlin/Campus Library (Germany).

Bibliography

Fiormonte, Domenico (2012). Towards a Cultural Critique of the Digital Humanities. Historical Social Research / Historische Sozialforschung 37:3 (141), pp. 59-76.

Mahony, Simon (2018). Cultural Diversity and the Digital Humanities. Fudan Journal of the Humanities and Social Sciences, pp. 1-18. Springer. DOI: https://doi.org/10.1007/s40647-018-0216-0 (accessed 24 April 2019).

Mullaney, Tom (2017). Call for proposals: Digital humanities Asia: Harnessing Digital Technologies to Advance the Study of the Non-Western World, 26-29 April 2018, Stanford University. https://carnetcase.hypotheses.org/3165 (accessed 24 April 2019).
