Pre-Conference Workshop “DLS Tool Criticism. An Anatomy of Use Cases”

J. Berenike Herrmann; Francesca Frontini; Simone Rebora; Jan Rybicki; Anne-Sophie Bories

Authorship

1. J. Berenike Herrmann
Universität Basel (University of Basel)

2. Francesca Frontini
Université Paul-Valéry Montpellier

3. Simone Rebora
Università degli Studi di Verona (University of Verona), Universität Basel (University of Basel)

4. Jan Rybicki
Jagiellonian University

5. Anne-Sophie Bories
Universität Basel (University of Basel)

Work text



The current panorama in DLS presents a plethora of tools, protocols and practices for processing, analysing and visualising data. This diversity of practices and tools originating from different areas (e.g., stylometry, NLP, literary studies, corpus linguistics) has resulted in a rich, but atomised situation.
Using a broad definition of “tool” understood as method, the ADHO Special Interest Group “Digital Literary Stylistics” (SIG-DLS) organizes a workshop that taps into the DLS Tool Inventory (DLS-TI), which is a first attempt to gather information on the practices of the various traditions present in DLS. The DLS-TI features methods and suites for data analysis, including desktop GUIs, online virtual research environments and libraries for R or Python, as well as general-purpose tools such as Excel spreadsheets. As tools have the power to reify theoretical a priori assumptions (Flanders & Jannidis, 2015; McCarty, 2005), the community needs a handle for gauging their validity, applying a sense of craft (Piper, 2017) from the perspective of tool criticism (van Es et al., 2018; Koolen et al., 2018).

Building on the DLS-TI, in concert with other initiatives (LRE Map, Calzolari et al., 2012; DIRT directory; Catalogue of Digital Editions; IDE), we aim to take stock of, but also reflect on, our methods. Three use cases representing different types of digital tools will undergo an “anatomy”: Textométrie, Stylometry, and Semantic Text Mining. Three scholars will each present a use case, elucidating (a) reasons for choosing the method; (b) the method’s impact on the analysis and literary modeling; (c) advantages and limitations.

The discussion will also address traditions of Digital Literary Stylistics between “digital” (computational linguistics, text mining, corpus linguistics) and “analog” (formal and subjective approaches), considering the fit of data and method to literary modeling (Piper, 2019; Underwood, 2019; cf. Da, 2019).
Anatomy of tools: A closer look at “textual DH” methodologies
Based on what has emerged so far from the DLS-TI and further observations of research practices, we have identified three groups of tools that can be covered in this half-day workshop, each represented by an exemplary use case:

1. Textométrie

Textometry is a traditionally French approach to statistical text analysis, often based on methods such as Correspondence Analysis, which has produced a number of tools, alongside a productive body of research in the domain of stylistics and corpus linguistics.
Textometry and stylistics: which tools and practices for literary interpretation?
(Clémence Jacquot, Université Paul-Valéry Montpellier 3, France)

“Textometric tools” are widely used by researchers to explore literary corpora. This session offers a critical account of a stylistic analysis guided by a textometric exploration of Apollinaire's poetic corpus with TXM. Several issues will be discussed. First, how can we analyze in concrete terms, from a diachronic perspective, the evolution of the poetic writing of a single author, Apollinaire, between 1898 and 1918? Second, what are the advantages of TXM in a contrastive study? We propose to review the importance of specificity calculations and of the various visualizations on offer (correspondence analysis, AFC in French, for example), as well as the corpus scores made possible by annotating the structural units of the poetic corpus. Finally, the methodological and epistemological contributions of a tool such as TXM to stylistic analysis will be discussed from a critical point of view. What observable results does it provide? What does it make visible? How should the salience of certain results be interpreted? What silence might it cast over other stylistic aspects of the text?
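
To make the idea of a specificity calculation more concrete, here is a minimal sketch in Python. It assumes the hypergeometric model that specificity scores in the textometric tradition are commonly described as resting on; it is not TXM's actual implementation. The function names and the toy counts are invented for illustration.

```python
from math import comb


def hypergeom_pmf(k, corpus_size, part_size, word_total):
    """Probability that a word with `word_total` occurrences in a corpus of
    `corpus_size` tokens occurs exactly `k` times in a part of `part_size`
    tokens, if tokens were distributed at random (hypergeometric model)."""
    return (
        comb(word_total, k)
        * comb(corpus_size - word_total, part_size - k)
        / comb(corpus_size, part_size)
    )


def over_use_probability(k, corpus_size, part_size, word_total):
    """Upper-tail probability P(X >= k): how surprising it is to see the word
    at least `k` times in the part. Small values suggest over-use."""
    upper = min(word_total, part_size)
    return sum(
        hypergeom_pmf(i, corpus_size, part_size, word_total)
        for i in range(k, upper + 1)
    )


# Invented toy counts, not taken from any real corpus:
# a word occurs 30 times in 10,000 tokens, 12 of them in a 2,000-token part.
p = over_use_probability(k=12, corpus_size=10_000, part_size=2_000, word_total=30)
print(f"P(X >= 12) = {p:.4f}")
```

A small tail probability only says that the observed frequency is unlikely under random distribution of tokens; whether that salience is stylistically meaningful remains a matter of interpretation, which is the point the session raises.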

2. Stylometry

Stylometry uses a series of tools and methods for the statistical analysis of style, based on advanced calculations on word frequencies, including multi-dimensional measurements and machine learning techniques. Their main applications have been authorship attribution and distant reading. Initially developed through the use of spreadsheets, they have since been fully implemented in programming languages such as R and Python, and complemented by a wide variety of visualizations derived from research fields such as phylogenetics and network theory.
Less than countless. Options to move beyond word counting in stylometry
(Mike Kestemont, University of Antwerp, Belgium)

In a fairly dramatic critique of computational literary studies, Nan Z. Da recently made a controversial case against the application of quantitative methods to literary texts. She argues that much work in this field essentially boils down to "counting words". This view is somewhat reductive but not without merit: it certainly applies to many of the present-day approaches that are dominant in stylometry and, consequently, to many of the tools that are available. While this methodological focus (if not poverty) is to some extent justified by previous empirical work, I will reflect on under-explored options for stylometry to move beyond naive word counting. Stylometrists, for instance, often take pride in the fact that their tools typically work on raw texts that require little preprocessing. In this, stylometry ignores many of the achievements of literary theory in the twentieth century, such as the importance of focalization (perspective) or the (actual, individual) reader. Richer (pre)processing pipelines that also tap into syntax and discourse might allow stylometry to revitalize its connection with literary theory, but they come with significant barriers for non-Anglo-Saxon literatures. In this talk, I intend to review some of the less conventional work in stylometry that leads the way in this respect.
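
As an illustration of the word-frequency baseline discussed above, the following sketch computes a distance in the style of Burrows's Delta: relative frequencies of the most frequent words are z-scored across texts, and two texts are compared by the mean absolute difference of their z-scores. It is a simplified teaching sketch and not the implementation of any particular stylometric tool. Whitespace tokenization, the population standard deviation, the function names, and the toy texts are all assumptions made here.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(texts, n):
    """Return the n most frequent word forms across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return [word for word, _ in counts.most_common(n)]


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def delta_matrix(texts, n_words=100):
    """Pairwise Delta-style distances between texts."""
    vocab = most_frequent_words(texts, n_words)
    freqs = [relative_freqs(text, vocab) for text in texts]
    # z-score every word (column) across the texts
    columns = list(zip(*freqs))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    z = [
        [(f - mu) / sd if sd > 0 else 0.0 for f, mu, sd in zip(row, mus, sds)]
        for row in freqs
    ]

    def delta(a, b):
        # mean absolute difference of z-scores
        return sum(abs(x - y) for x, y in zip(a, b)) / len(vocab)

    return [[delta(a, b) for b in z] for a in z]


# Invented toy texts; real studies use whole books and far more words.
toy_texts = [
    "the cat and the dog",
    "the cat and the dog",
    "a bird or a fish and a cat",
]
for row in delta_matrix(toy_texts, n_words=5):
    print([round(d, 3) for d in row])
```

Everything this sketch knows about a text is a bag of word counts, which is exactly the limitation the talk addresses: syntax, discourse structure, and perspective play no role in the distance.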

3. Semantic Text Mining

Semantic Text Mining applies tools for text analysis and visualization based on semantically enriched and co-occurrence methodologies, such as sentiment analysis, topic modelling, and word embeddings. They offer the potential of addressing key questions in literary theory and narratology, from the identification of genre to the visualization of plot. These emerging approaches are now beginning to broaden the scope of computational literary studies and to open up new, still-unexplored potentialities.
LDA Topic Modeling for Semantic Corpus Analysis
(Steffen Pielström, Würzburg University, Germany)

Topic models based on Latent Dirichlet Allocation (LDA) and Gibbs sampling are a tool for exploring and analyzing the content and semantic structure of digital text corpora that has become popular in digital humanities research in recent years. They allow researchers to model a corpus' content in terms of so-called "topics", groups of apparently semantically related words, and show the distribution of these topics within the corpus. Thanks to an increasing number of available tools and libraries, the method is by now accessible to a wide range of users. In contrast to this technical accessibility, the methodology of topic modeling is rather intricate, and users generally cannot apply it without making a number of decisions that require some deeper understanding. Additionally, there are aspects of topic modeling that are still in need of systematic methodological research. The session will give a hands-on introduction to the goals and method of LDA topic modeling, demonstrate how to experiment with topic models using a simple desktop tool, and address open methodological issues.
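
To show what LDA with Gibbs sampling involves, and where some of the user decisions mentioned above come in, here is a minimal collapsed Gibbs sampler in plain Python. It is a teaching sketch under stated assumptions and not the desktop tool used in the session: the function names, the hyperparameter defaults (`alpha`, `beta`), the number of iterations, and the toy documents are all chosen for illustration only.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns the vocabulary, the document-topic counts and the
    topic-word counts after the last sweep."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic for every token
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """The n highest-count words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented toy corpus, already tokenized
toy_docs = [
    "sea ship wave sea storm".split(),
    "ship storm wave sea".split(),
    "love heart rose love".split(),
    "heart rose love heart".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(toy_docs, n_topics=2)
print(top_words(vocab, topic_word))
```

Even in this small sketch the user has to choose the number of topics, the two smoothing hyperparameters, the number of sampling sweeps, and the random seed, and different choices can yield different topics.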

4. Discussion

During the discussion we will pose questions on methodological and epistemological levels (including humanistic enquiry vs. data science, explorative vs. confirmative, and qualitative vs. quantitative approaches), as well as on the range of research questions, from text similarity to aesthetic effects. The aim of each use-case anatomy is to give an overview of the usability and strengths of the tool (or group of tools) in research, to point out problems, and to formulate specific avenues for further development. Results will be documented on a special SIG-DLS webpage. In this way, we will produce a guide for the orientation of DLS scholars, as well as the beginning of a roadmap for further tool development.

The target audience is scholars interested in methods/tools and their linkage to literary modeling. We welcome newcomers looking for initial orientation, old hands who wish to advance methodological development (“application”), and any scholars interested in modeling, i.e. the epistemological aspects of style-tool criticism (“theory”).

Calzolari, N., Del Gratta, R., Francopoulo, G., Mariani, J., Rubino, F., Russo, I. and Soria, C. (2012). The LRE Map. Harmonising Community Descriptions of Resources. Proceedings of LREC 2012, Eighth International Conference on Language Resources and Evaluation. Istanbul, Turkey, pp. 1084–1089.
http://lrec.elra.info/proceedings/lrec2012/pdf/769_Paper.pdf

Da, N. Z. (2019). The Computational Case against Computational Literary Studies. Critical Inquiry, 45(3), 601–639.
https://doi.org/10.1086/702594

Es, K. van, Wieringa, M. and Schäfer, M. T. (2018). Tool Criticism: From Digital Methods to Digital Methodology. Proceedings of the 2nd International Conference on Web Studies (WS.2 2018). New York, NY, USA: ACM, pp. 24–27. doi: 10.1145/3240431.3240436.
http://doi.acm.org/10.1145/3240431.3240436

Franzini, G. (2012). Catalogue of Digital Editions. Zenodo. doi: 10.5281/zenodo.1161425.
https://zenodo.org/record/1161425#.XCoDJhNKh8c (accessed 31 December 2018).

Flanders, J., & Jannidis, F. (2015). Knowledge Organization and Data Modeling in the Humanities. A Whitepaper.
https://www.wwp.northeastern.edu/outreach/conference/kodm2012/flanders_jannidis_datamodeling.pdf

Koolen, J., Gorp, M. van and Ossenbruggen, J. van (2018). Lessons Learned from a Digital Tool Criticism Workshop. Proceedings from DH Benelux 2018. Amsterdam, The Netherlands.
McCarty, W. (2005). Humanities Computing. London and New York: Palgrave.
Piper, A. (2017). Think Small: On Literary Modeling. PMLA, 132(3), 651–658.
https://doi.org/10.1632/pmla.2017.132.3.651

Piper, A. (2019). Enumerations: data and literary study. Chicago: The University of Chicago Press.
Underwood, T. (2019). Distant horizons: digital evidence and literary change. Chicago: The University of Chicago Press.

Full text license: This text is republished here with permission from the original rights holder.
