Efficient computational support for scholarly textual research

paper, specified "long paper"
Authorship
  1. Wim Peters

    Johannes Gutenberg-Universität Mainz (Johannes Gutenberg University of Mainz)

Work text
This presentation discusses and investigates the integration of computational natural language processing methods into established digital humanities workflows. This integration aims at optimizing the scholarly exploration of the conceptual structure of the thematic domain under consideration in a particular corpus of text.
The overall tasks of identifying, extracting, and formalizing knowledge contained in larger volumes of text as part of a scholarly interpretation process are highly specialized and both knowledge- and labour-intensive, especially when performed manually. The growing availability of digital textual data and associated metadata in recent decades has challenged traditional scholarly research workflows in terms of textual data analysis and research data production. Because the traditional close reading method is often time-consuming, a large amount of textual source material that is to be interpreted hermeneutically creates a significant bottleneck for an exhaustive scholarly understanding of the semantic content of the subject revealed in the texts.
The overall emphasis of our work is on how computer-based analysis can be integrated into the incremental hermeneutic cycle of Understanding - Contextualization - Explanation (among others, Dilthey 1904), resulting in a Digital Hermeneutics workflow.
Central to our interest in this context is the methodological issue of the interpretation of text with the help of digital technology in the form of natural language processing (NLP), which forms a bridge between the linguistic surface structure and the underlying conceptual content of textual resources by means of the computer-based, automatic acquisition of content.
A principled methodology for combining manual and automatic analysis of textual material aims at identifying computationally derived suggestions for further focused close reading by scholars, and at integrating these suggestions into a workflow that maximizes scholarly conceptual acquisition and exploration.
The underlying assumption is that scholars and computers do not have mutually exclusive ways of working. Often there is no opposition but rather a synergetic interaction between algorithmic analysis and close interpretative reading (Hayles 2012). In order to realize a successful workflow, computational analysis should conform to some strict criteria. It should be minimally intrusive towards scholarly research strategy, and offer customizability of analysis in the service of scholars’ requirements, enabling researchers to flexibly follow leads in their resource analysis, focus on material that has a high potential of touching on their research questions, and incrementally expand the scope of their insights.
In this context it is important to note the partiality of computational interpretation. Computational analysis using NLP is hermeneutically imprecise and incomplete, and can only offer partial suggestions for scholarly hermeneutic research (van Zundert, 2016; Shadrova, 2021). It should not technocratically dictate the research agenda, and the inclusion of computer-supported data analyses in hermeneutic scholarly workflows should not entail the replacement of the scholar's interpretive and hermeneutic work (Zaagsma, 2013). Rather, it should establish a balanced integration of traditional qualitative approaches, such as close reading, and quantitative or other computational techniques, which can help address the limitations of each method alone (Gibbs and Owens, 2013).
For each incremental phase within the Digital Hermeneutic cycle, NLP techniques should be selected and customised to scholars’ research questions (Peters et al., 2019; Woolf and Silver, 2018). The advisory, ancillary role of computer analysis ensures maximum scholarly control of research activity and strategy, and keeps computational text analysis in the humanities minimally intrusive rather than overbearing (Peters, 2019).
We present a Digital Hermeneutic workflow model for the computer-mediated interpretation and understanding of born-digital texts (which avoids error-prone workflows based on texts obtained through OCR). This workflow integrates scholarly activity and digital technologies in the form of both existing tools and custom computational processing. Its circularity fosters deep text interpretation by scholars with the aim of incrementally addressing and expanding the range of research questions asked about a particular theme, within a particular text corpus, with the hermeneutic core goal of understanding in mind.
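One pass through such a cycle might be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow presented here: the function names `tokenize` and `suggest_passages`, the simple term-counting score, and the sample passages are all hypothetical. The computer only proposes passages that mention scholar-supplied terms; the scholar decides what to read closely and which terms to add for the next pass.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a passage and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_passages(passages, seed_terms, top_n=3):
    """Rank passages by how often they mention the scholar's seed terms.

    Hypothetical helper: returns (index, score) pairs for passages with a
    non-zero score. The ranking is only a suggestion for close reading.
    """
    seeds = {term.lower() for term in seed_terms}
    scored = []
    for index, passage in enumerate(passages):
        counts = Counter(tokenize(passage))
        score = sum(counts[term] for term in seeds)
        if score > 0:
            scored.append((index, score))
    # Highest score first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_n]


# Invented sample passages, for illustration only.
passages = [
    "The council granted the guild a new charter.",
    "Weather delayed the harvest that year.",
    "The charter was disputed, and the council revoked the charter.",
]
suggestions = suggest_passages(passages, {"charter", "council"})
```

In a further iteration the scholar would read the suggested passages, refine or extend the seed terms, and run the suggestion step again, so that control over the research strategy stays with the scholar.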
Overall, we argue that NLP brings possibilities for more focused and fine-grained qualitative text analysis. A tailored combination of quantitative and qualitative text analysis methods within a flexible methodological workflow, involving both scholars and NLP experts, will enable scholars to identify and pursue avenues of exploration and verification.
As a use case for our methodology, we use textual material from two different sources within the legal domain: digitised medieval council records and minutes from the Conference of the Parties (COP).

Bibliography

Dilthey, W. (1904). Hermeneutics and the Study of History: Selected Works, Volume IV. Edited by R. A. Makkreel and F. Rodi. Princeton, NJ: Princeton University Press.

Gibbs and Owens (2013). The Hermeneutics of Data and Historical Writing. In: Dougherty, J. and Nawrotzki, K. (eds.), Writing History in the Digital Age. E-book, Ann Arbor, MI: University of Michigan Press. DOI: 10.3998/dh.12230987.0001.001

Hayles, N. K. (2012). How We Think: Digital Media and Contemporary Technogenesis. Chicago: University of Chicago Press.

Peters, W., Parks, L. and Lennan, M. (2019). Integrating Language Technology into Scholarly Research Workflows. In: Pitcher, L. and Pidd, M. (eds.), Proceedings of the Digital Humanities Congress 2018. Studies in the Digital Humanities. Sheffield: The Digital Humanities Institute.
Available at:

Shadrova, A. (2021). Topic models do not model topics: epistemological remarks and steps towards best practices. Journal of Data Mining & Digital Humanities, 2021. DOI: 10.46298/jdmdh.7595

van Zundert, J. J. (2016). Screwmeneutics and Hermenumericals: The Computationality of Hermeneutics. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A New Companion to Digital Humanities (pp. 331-347). Wiley-Blackwell.

Woolf, N. and Silver, C. (2018). Qualitative Analysis Using ATLAS.ti. New York: Routledge. https://doi.org/10.4324/9781315181684

Zaagsma, G. (2013). On Digital History. BMGN - Low Countries Historical Review, 128(4).

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO