Computational Literary Studies Infrastructure (CLS INFRA): a project to connect people, data, tools, and methods

Julie Birkholz; Ingo Börner; Sally Chambers; Vera Charvat; Silvie Cinková; Tess Dejaeghere; Julia Dudar; Matej Ďurčo; Maciej Eder; Jennifer Edmond; Evgeniia Fileva; Frank Fischer; Serge Heiden; Michal Křen; Bartłomiej Kunda; Michał Mrugalski; Ciara Murphy; Carolin Odebrecht; Marco Raciti; Salvador Ros; Christof Schöch; Artjoms Šeļa; Toma Tasovac; Justin Tonra; Erzsébet Tóth-Czifra; Peer Trilcke; Karina van Dalen-Oskam; Lisanne van Rossum

Authorship

1. Julie Birkholz

Ghent University
2. Ingo Börner

University of Potsdam
3. Sally Chambers

Ghent University
4. Vera Charvat

OEAW Österreichische Akademie der Wissenschaften / Austrian Academy of Sciences
5. Silvie Cinková

Charles University
6. Tess Dejaeghere

Ghent University
7. Julia Dudar

Universität Trier
8. Matej Ďurčo

OEAW Österreichische Akademie der Wissenschaften / Austrian Academy of Sciences
9. Maciej Eder

Institute of Polish Language (Polish Academy of Sciences)
10. Jennifer Edmond

DARIAH-EU
11. Evgeniia Fileva

Universität Trier
12. Frank Fischer

University of Potsdam
13. Serge Heiden

Ecole Normale Supérieure de Lyon (ENS de Lyon)
14. Michal Křen

Charles University
15. Bartłomiej Kunda

Institute of Polish Language (Polish Academy of Sciences)
16. Michał Mrugalski

Humboldt-Universität zu Berlin (Humboldt University)
17. Ciara Murphy

National University of Ireland, Galway
18. Carolin Odebrecht

Humboldt-Universität zu Berlin (Humboldt University)
19. Marco Raciti

DARIAH-EU
20. Salvador Ros

UNED, Madrid
21. Christof Schöch

Universität Trier
22. Artjoms Šeļa

Institute of Polish Language (Polish Academy of Sciences)
23. Toma Tasovac

Belgrade Center for Digital Humanities
24. Justin Tonra

National University of Ireland, Galway
25. Erzsébet Tóth-Czifra

DARIAH-EU
26. Peer Trilcke

University of Potsdam
27. Karina van Dalen-Oskam

Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW)
28. Lisanne van Rossum

Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW)

Work text


Abstract

This poster provides an overview of the CLS INFRA project: its principal objectives, its structure, and ways to get in touch.

Introduction
Like the exact sciences, the social sciences and the humanities rely on research infrastructures: no research could be conducted without academic libraries, cultural heritage institutions, and academic and mass-market publishers. The digital turn, however, not only reshaped the theoretical and methodological frameworks of several disciplines but also redefined the notion of research infrastructure itself (see e.g. Borgman 2010, Moulin et al. 2011, Kitchin 2021). Today, at least in the Digital Humanities, it is hard to conduct cutting-edge research without access to relevant digital resources, tools to analyze them, networks of collaborating teams and individuals, and efficient channels for disseminating results. This applies in particular to computational literary studies (CLS).

An Infrastructure for CLS
In computational literary studies specifically, the digital age offers both challenges and opportunities for research on Europe’s multilingual and interconnected literary heritage. At present, the landscape of literary data is diverse and fragmented. Even though many resources are available in digital libraries, archives, repositories, websites and catalogues, a lack of standardisation affects how they are constructed and accessed, and limits the extent to which they can be reused (Ciotti 2014). The Computational Literary Studies Infrastructure (CLS INFRA) project aims to federate these resources, connect them with the tools needed to interrogate them, and open them to a wider base of users, in the spirit of the FAIR and CARE principles (Wilkinson et al. 2016, Carroll 2020). These improvements will benefit researchers by bridging gaps between greater- and lesser-resourced communities in computational literary studies and beyond, ultimately creating opportunities for new research and insight into our shared and varied European cultural heritage. CLS INFRA’s efforts are aimed at meeting these urgent infrastructural needs of a growing user community.

Project website: https://www.clsinfra.io

Rather than building entirely new resources for literary studies, the project is strongly committed to connecting and making use of existing efforts and initiatives, in order to acknowledge the immense human labour already invested in them. The project therefore builds on recently compiled, high-quality literary corpora such as DraCor and ELTeC (Fischer et al. 2019, Burnard et al. 2021, Schöch et al. to appear), integrates existing text-analysis tools such as TXM, stylo and multilingual NLP pipelines (Heiden 2010, Eder et al. 2016), and takes advantage of deep integration with two other research infrastructures, the CLARIN and DARIAH ERICs.

See https://www.dariah.eu and https://www.clarin.eu.

Consequently, the project aims to build a coherent ecosystem that fosters the technical and intellectual findability and accessibility of relevant data. The ecosystem consists of (1) resources, i.e. text collections of drama, poetry and prose in several languages, (2) tools, (3) methodological and theoretical considerations, (4) a network of CLS scholars based at different European institutions, (5) a system of short-term research stays for both early career researchers and seasoned scholars, (6) a repository for training materials, and (7) an efficient dissemination strategy. The project's work packages closely follow these components of the infrastructure.

The project is delivered by a geographically balanced, complementary transnational consortium of key local and national infrastructure providers, which covers the full range of the project’s defined areas for integration and innovation and is aligned to create a common infrastructural approach for computational literary studies. In particular, the deep integration of both the CLARIN and DARIAH ERICs ensures the project’s long-term stability and sustainability.

Conclusion
The key aim of our poster is to provide a wide range of stakeholders (researchers, librarians, infrastructure providers) with an understanding of our project and with contact points for specific issues, so as to motivate them to get involved.

Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004984.

Beyond the authors of this poster proposal, we would like to acknowledge the role of the work package leads: Maciej Eder (coordinator), Justin Tonra, Christof Schöch, Karina van Dalen-Oskam, Carolin Odebrecht, Matej Ďurčo, Peer Trilcke, Frank Fischer, Julie M. Birkholz, Marco Raciti. Find out more about the team here:

Bibliography

Borgman, Christine. 2010. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass. & London: MIT Press.

Burnard, Lou, Christof Schöch, and Carolin Odebrecht. 2021. In search of comity: TEI for distant reading. Journal of the Text Encoding Initiative 14.

Ciotti, Fabio. 2014. Digital literary and cultural studies: the state of the art and perspectives. Between 4/8, 1-17.

Eder, Maciej, Jan Rybicki, and Mike Kestemont. 2016. Stylometry with R: a package for computational text analysis. R Journal 8(1): 107-21.

Fischer, Frank, Ingo Börner, Matthias Göbel, Andrea Hechtl, Christopher Kittel, P. Miling, and Peer Trilcke. 2019. Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In Book of Abstracts of the Digital Humanities Conference 2019. Utrecht: ADHO.

Heiden, Serge. 2010. The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In 24th Pacific Asia Conference on Language, Information and Computation, 10 p. Sendai, Japan. Retrieved from

Kitchin, Rob. 2021. The data revolution: big data, open data, data infrastructures and their consequences. 2nd edition. Thousand Oaks: Sage Publications Ltd.

Moulin, Claudine, Arianna Ciula, and Julianne Nyhan. 2011. Research Infrastructures in the Digital Humanities. Science Policy Briefing 42. Strasbourg: European Science Foundation.

Schöch, Christof, Tomaž Erjavec, Roxana Patras, and Diana Santos. To appear. Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives. Modern Languages Open.

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data 3(1).

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ADHO - 2022

"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022


Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO