Stylometry, Network Analysis, and Latin Literature

poster / demo / art installation
Authorship
  1. Maciej Eder

    Pedagogical University of Krakow

Work text

St Jerome, the early Christian writer and translator of the Bible, claims to have had a dream in which God accused him: “You are a Ciceronian, not a Christian!”, because he had paid too much attention to the beauty of Ciceronian style. St Jerome’s dream reflects Christian antiquity’s general attitude to classical literature: pagan texts were considered dangerous, and thus they were rarely imitated. Centuries later, Renaissance humanists “discovered” the classical authors again and set out to purge Latin of its medieval traces. There was no clear answer as to how Latin style should be restored, though. The discussion about the ways of imitation, known as the Ciceronian Quarrel, was the single most important literary debate of the Renaissance (DellaNeva, 2007). Arguably, all these changes in attitude to classical literature were followed by changes in style that can be measured with stylometric methods.

While computational stylistics has usually been associated with authorship attribution, recent research shows that the same methods can be used in a much broader context of literary study. Namely, the underlying idea of tracing similarities between (anonymous) texts can be extended to map textual relations in large-scale approaches to literature. Such extensions are supported by the seminal concepts of “computation into criticism” (Burrows, 1987), “distant reading” (Moretti, 2013) and “macroanalysis” (Jockers, 2013).

In the present study, stylometric techniques are combined with network analysis to reveal the strongest similarities between the analyzed texts on the one hand, and some deeper or less obvious relations, usually filtered out by standard nearest neighbor methods, on the other (Eder, forthcoming). Consensus network visualizations (cf. Fig. 1) attempt to overcome two drawbacks of Cluster Analysis and other exploratory multidimensional techniques: first, these methods are highly dependent on the number of features (e.g. word frequencies) analyzed; second, they are not suitable for visualizing large datasets. The technique applied in this study counts the frequencies of the most frequent words in a corpus, computes distances (differences) between samples, and for each sample identifies its nearest neighbors (i.e. the samples that turned out to be the most similar). Next, neighboring samples are turned into connected nodes of a network. This procedure is repeated many times, with a different range of frequent words analyzed in each iteration. Finally, the networks produced in the individual iterations are combined into a single consensus network.
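The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the abstract does not name the distance measure, so a Delta-style distance (mean absolute difference of z-scored word frequencies) is assumed here, and the function names, the parameter k (number of nearest neighbors per sample), the list of frequent-word ranges, and the toy token lists are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Return each word's share of all tokens in one sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def most_frequent_words(samples, n):
    """Return the n most frequent words across the whole corpus."""
    corpus_counts = Counter()
    for tokens in samples.values():
        corpus_counts.update(tokens)
    return [word for word, _ in corpus_counts.most_common(n)]


def delta_distances(samples, words):
    """Pairwise mean absolute difference of z-scored word frequencies.

    Assumption: a Delta-style measure; the source does not specify one.
    """
    freqs = {name: relative_frequencies(toks) for name, toks in samples.items()}
    zscores = {name: [] for name in samples}
    for word in words:
        column = [freqs[name].get(word, 0.0) for name in samples]
        mu, sigma = mean(column), pstdev(column)
        for name, value in zip(samples, column):
            # A word with identical frequency everywhere carries no signal.
            zscores[name].append((value - mu) / sigma if sigma else 0.0)
    dist = {}
    for a, b in combinations(samples, 2):
        d = mean(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
        dist[a, b] = dist[b, a] = d
    return dist


def consensus_network(samples, mfw_ranges, k=1):
    """Sum nearest-neighbor links over several frequent-word ranges.

    Returns a Counter mapping an undirected edge (a frozenset of two
    sample names) to the number of times the link was established.
    Mutual nearest neighbors add 2 to their edge in a single iteration.
    """
    edges = Counter()
    for n in mfw_ranges:
        dist = delta_distances(samples, most_frequent_words(samples, n))
        for name in samples:
            others = sorted((o for o in samples if o != name),
                            key=lambda o: dist[name, o])
            for neighbor in others[:k]:
                edges[frozenset((name, neighbor))] += 1
    return edges


# Toy usage with invented token lists (not data from the study).
toy = {
    "A1": "et in et non et in ad".split(),
    "A2": "et in et non et in ad".split(),
    "B": "ad ad non ad sed sed ad".split(),
}
network = consensus_network(toy, mfw_ranges=[2, 3, 4])
```

In this sketch the edge weights simply count how often a link recurs across iterations, so links that survive many different frequent-word ranges end up heavier than links that appear only for one particular range.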

A few dozen ancient, medieval, and early modern Latin texts (prose and poetry) were harvested from different open-access databases and analyzed using consensus networks. The research problems addressed were: (1) an analysis of style variation in the classical Latin prose of the Augustan Age and the Silver Age (Cicero, Caesar, Tacitus, Livy, Suetonius, Apuleius, etc.), with the aim of identifying which features of language are author-dependent and which are affected by the literary epoch; (2) an investigation of the Renaissance “restitution” of classical Latin (in a Ciceronian flavor), as opposed to medieval Vulgar Latin as introduced by early Christian writers (Bolgar, 1954), focused on the extent to which the Renaissance humanists succeeded in imitating the style of Cicero, and on whether they truly overcame the medieval vulgar style (as they claimed to have done); and (3) an examination of the problem of “Attic” prose and the anti-Ciceronian movement of the late Renaissance and early Baroque, based on an analysis of the style of Justus Lipsius, Erasmus and other writers such as Puteanus, Moretus and Fredro (Croll, 1924, 1996; Salmon, 1980; Tunberg, 1999). The questions here were as follows: is the trace of Seneca’s and Tacitus’ style indeed noticeable in this modern “Attic” way of writing? Did the “Attic” authors really escape from Ciceronianism in style?

One of the stylometric consensus networks is shown in Fig. 1. It visualizes the textual similarities between 55 poetic texts in order to address the question of sequels in Latin poetry. Highlighted texts include Maffeo Vegio’s continuation of the Aeneid, Thomas May’s Supplementum Pharsaliae, and John of Garland’s Integumenta super Ovidium Metamorphoseos, in comparison with their ancient counterparts written by Virgil, Lucan, and Ovid, respectively. One can clearly see that both Vegio and May were quite successful in imitating the classical authors, while John of Garland’s links to Ovid are considerably weaker.

Fig. 1: Sequels in Latin poetry: network of 55 poetic texts. Imitated poems marked in blue, their followers marked in red

Other networks were produced for prose. Standard network analysis procedures were applied, and the data were visualized using consensus networks. Instead of the expected chronological patterns, the results revealed some other interesting regularities.

When analyzing Latin style with stylometric methods, one should remember that medieval authors quite often cite the Bible and related sources, while the humanists’ treatises are full of explicit and/or implicit quotations from classical literature. More importantly, the humanists consciously tried to avoid medieval vocabulary in favor of words that were used at least once by Cicero. For that reason, a stylometric comparison of medieval and early modern Latin raises some additional issues, intensive text re-use being one of the most important (Eder, 2013). Some of these issues will be discussed in detail.

References
1. DellaNeva, J. (2007). Ciceronian Controversies. Cambridge, MA: Harvard University Press.

2. Burrows, J. (1987). Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Oxford: Clarendon Press.

3. Moretti, F. (2013). Distant Reading. London: Verso.

4. Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. Champaign: University of Illinois Press.

5. Eder, M. (2014). Visualization in stylometry: some reliability issues (forthcoming).

6. Bolgar, R. R. (1954). The Classical Heritage and its Beneficiaries. Cambridge: Cambridge University Press.

7. Croll, M. W. (1924). Muret and the History of “Attic” Prose. PMLA, 39: 254–309.

8. Croll, M. W. (1996). “Attic” prose in the seventeenth century. In: Patrick, J. M. and Evans, R. O. (eds), Style, Rhetoric and Rhythm. Princeton, NJ: Princeton University Press, pp. 51–101.

9. Salmon, J. H. M. (1980). Cicero and Tacitus in Sixteenth-Century France. The American Historical Review, 85: 307–31.

10. Tunberg, T. O. (1999). Observations on the style and language of Lipsius’s prose: a look at some selected texts. In: Tournoy, G., Landsheer, J. de, and Papy, J. (eds), Iustus Lipsius: Europae lumen et columen. Leuven: Leuven University Press, pp. 169–78.

11. Eder, M. (2013). Mind your corpus: systematic errors in authorship attribution. Literary and Linguistic Computing, 28(4): 603–14.


Conference Info

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO