University of the West of England
Introduction
This paper describes a study of the authorship of the Consolatio Ciceronis, a work written by Cicero on the occasion of the death of his daughter, only fragments of which are known to have survived. In 1583, a text was published in Venice and Bologna purporting to be the rediscovered Consolatio itself. The second edition of this book (September 1583) was edited by Carlo Sigonio, a prominent humanist scholar and skilled imitator of classical Latin at the time.
Against the backdrop of sixteenth-century Italy and the humanist movement, Carlo Sigonio emerged as one of the most prominent scholars on Roman antiquity. During this time, writing in the style of Ciceronian Latin became hugely popular, a skill in which Sigonio could claim mastery [11]. He could also claim expertise in Cicero's writings, having lectured on them at university level as well as having published editions of Cicero's work, including extant fragments of the Consolatio, known from quotations in surviving works by other authors. Cicero wrote the Consolatio as a source of comfort over the death of his daughter Tullia in 45 BC, discussing in this piece the virtues and merits of death. When this work re-appeared in 1583, other humanist scholars, led by Antonio Riccoboni, attacked it as a forgery by Sigonio. While the style appeared to be much like Cicero's, there were points of inconsistency, such as vocabulary which appeared to be post-Ciceronian, including Christianized Latin terms [7]. Most subsequent scholars have accepted this work to be a forgery. However, with the exception of Sage's book [10], little work has been done to investigate the true source of the Consolatio.
The aim of the present work is to weigh the stylometric evidence for and against Sigonio's authorship, using modern stylometric techniques that were unavailable to Sage in 1910.
Materials
We are currently in the process of assembling a collection of writings by Cicero and Sigonio, as well as various Classical and Renaissance control texts. Orations from prominent humanists and imitators of Classical Latin, i.e. Muretus, Riccoboni, and Vettori, have been included as examples of 16th-century Latin. Prose selections from a similar genre (philosophical) by Seneca, Nepos, and Tacitus have been used as Classical Latin controls.
For the present study, 21 text files were used, as detailed in Table 1.
Table 1 -- Text Files Used.
| ID-Code | Size in Words | Description |
|---------|---------------|-------------|
| C1 | 9277 | Cicero's "Amicitia" |
| C2 | 12344 | First half of Cicero's "Brutus" |
| C3 | 12231 | Second half of "Brutus" |
| C4 | 569 | Two short poems by Cicero: "De Consulatu Suo" and "Marius" |
| C5 | 2202 | Cicero's "Somnium Scipionis" |
| X | 8713 | First half of the 1583 "Consolatio" |
| Y | 8758 | Second half of the above |
| M1 | 3736 | Oratio IV by Muretus, Marc-Antoine Muret |
| M2 | 2539 | Oratio II by Muretus |
| M3 | 2651 | Oratio XXVI by Muretus |
| M4 | 3652 | Oratio XXIII by Muretus |
| N | 3480 | "Atticus" by Nepos |
| S1 | 2393 | Oration I by Carlo Sigonio |
| S2 | 2229 | Oration II by Sigonio |
| S3 | 2984 | Oration V by Sigonio |
| S4 | 3054 | Oration VI by Sigonio |
| T | 6742 | The "Agricola" of Tacitus |
| V1 | 2543 | "Oratio Funebris de Laudibus Ioannis Medicis" by Piero Vettori |
| V2 | 2202 | "Oratio Habita ad Iulium III" by Vettori |
| V3 | 4347 | "Liber de Laudibus Ioannae Austriacae" by Vettori |
| V4 | 3652 | "Oratio Petri Victorii in Maximilianum II" by Vettori |
Note that fragments indisputably by Cicero, known to have survived as quotations in other authors' works, were removed from the 1583 text prior to the analyses described below. This process reduced the size of the file by 368 words in total.
Method
A number of studies have appeared recently [2], [1], [3], [5], [12] in which the features used as indicators are not imposed by the prior judgement of the analyst but are found by straightforward procedures from the texts under scrutiny. Such textual features have been used by Burrows [2] as well as Binongo [1], among others, not only in authorship attribution but also to distinguish among genres. This approach involves finding the most frequently used words and treating the rate of usage of each such word in a given text as a feature. The exact number of common words used varies by author and application. Burrows and colleagues [2], [3] discuss examples using anywhere from the 50 to 100 most common words. Binongo [1] uses the commonest 36 words (after excluding pronouns). Greenwood [4] uses the commonest 32 (in New Testament Greek). Most such words are function words, and thus this approach can be said to continue the tradition, pioneered by Mosteller & Wallace [8], of using frequent function words as markers.
In fact, these studies (and some others) can be lumped together as applications of what may be called the "Burrows Method", which is outlined below.
1. Pick the N most common words in the corpus under investigation. N may be from 15 to 100. (Manual preprocessing is sometimes done, e.g. distinguishing "that"-demonstrative from "that"-conj.)
2. Compute the occurrence rate of these N words in each text or text-unit, thus converting each text into an N-dimensional vector of numbers.
3. Apply statistical techniques of multivariate data analysis to reveal patterns, especially: Principal Components Analysis; Clustering; Discriminant Analysis.
4. Interpret the results (carefully!).
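The first two steps of this outline can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the software used in the study: the function names `tokenize` and `burrows_vectors`, the simple ASCII tokenizer, the per-100-token scaling and the two toy strings are all invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic orthographic words.
    Assumption: plain ASCII letters only, no lemmatization."""
    return re.findall(r"[a-z]+", text.lower())


def burrows_vectors(texts, n):
    """Return the n commonest words of the pooled texts and, for each text,
    the occurrence rate (per 100 tokens) of each of those words."""
    tokens = {name: tokenize(body) for name, body in texts.items()}
    pooled = Counter()
    for words in tokens.values():
        pooled.update(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(pooled.items(), key=lambda item: (-item[1], item[0]))
    common = [word for word, _ in ranked[:n]]
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = [100.0 * counts[word] / len(words) for word in common]
    return common, vectors


# Invented toy strings, not drawn from any text in the study.
toy_texts = {
    "t1": "et in urbe et in agro non est",
    "t2": "non est et ut in foro",
}
common_words, rate_vectors = burrows_vectors(toy_texts, n=3)
print(common_words)
print(rate_vectors["t1"])
```

Each text becomes one row of rates over the same N words, which is the form of data matrix that steps 3 and 4 then operate on.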
A striking success of this method is described by Burrows [2] on prose works by the Bronte sisters. But the great majority of applications reported have been on English-language texts.
We have applied this method to the Consolatio and our control samples. We used the commonest 20 orthographic words in the 21 texts aggregated. Twenty words were chosen because as a statistical rule of thumb it is advisable to have fewer variables than observations in analyses of this type. Orthographic words rather than lemmata (lexical entries) were used for simplicity and also because we currently have no lemmatization, parsing or tagging software. Indeed for Latin such software tools appear not to exist. In practice, this choice has little impact since the commonest 20 items are mostly non-inflecting forms, as can be seen from Table 2.
Table 2 -- Words Used in Order of Frequency. N = 102841.
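| Word | Occurrences | Rank | %Freq. | Cumulative %Freq. |
|------|-------------|------|--------|-------------------|
| Et | 3004 | 1 | 2.921 | 2.9210 |
| In | 2147 | 2 | 2.0876 | 5.0087 |
| Ut | 1267 | 3 | 1.232 | 6.2407 |
| Non | 1266 | 4 | 1.231 | 7.4717 |
| Est | 1196 | 5 | 1.1629 | 8.6347 |
| Cum | 1030 | 6 | 1.0015 | 9.6362 |
| Qui | 915 | 7 | 0.8897 | 10.525 |
| Ad | 836 | 8 | 0.8129 | 11.338 |
| Ac | 806 | 9 | 0.7837 | 12.122 |
| Quod | 759 | 10 | 0.738 | 12.860 |
| Sed | 739 | 11 | 0.7185 | 13.579 |
| Quam | 705 | 12 | 0.6855 | 14.264 |
| Quae | 643 | 13 | 0.6252 | 14.890 |
| Si | 596 | 14 | 0.5795 | 15.469 |
| Etiam | 593 | 15 | 0.5766 | 16.046 |
| Esse | 569 | 16 | 0.5532 | 16.599 |
| Enim | 561 | 17 | 0.5455 | 17.144 |
| Aut | 553 | 18 | 0.5377 | 17.682 |
| Atque | 540 | 19 | 0.525 | 18.207 |
| De | 508 | 20 | 0.4939 | 18.701 |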
Principal Components Analysis
A principal components analysis was performed on the 21-by-20 data matrix. This is a data-reduction technique that attempts to account for the variation found with a small number of composite variables (e.g. see: [6]). The proportion of variance accounted for by the first 6 principal components is given below in Table 3.
Table 3 -- Variance Accounted for by First 6 PC's. Eigenanalysis of the Correlation Matrix
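| | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 | PC_6 |
|------------|--------|--------|--------|--------|--------|--------|
| Eigenvalue | 6.0619 | 3.7403 | 1.7484 | 1.5302 | 1.2441 | 1.1937 |
| Proportion | 0.303 | 0.187 | 0.087 | 0.077 | 0.062 | 0.060 |
| Cumulative | 0.303 | 0.490 | 0.578 | 0.654 | 0.716 | 0.776 |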
It will be seen that the first four principal components (PC's) account for 65.4% of the variance between them.
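The proportions in Table 3 follow directly from the eigenvalues. Because the eigenanalysis is of the correlation matrix, each of the 20 word variables has unit variance, so the total variance is 20 and each proportion is the eigenvalue divided by 20. The short check below uses only the eigenvalues printed in Table 3; the variable names are illustrative.

```python
# Eigenvalues of the first six components, copied from Table 3.
eigenvalues = [6.0619, 3.7403, 1.7484, 1.5302, 1.2441, 1.1937]
n_variables = 20  # one standardized variable per high-frequency word

# Each component's share of the total variance (total = number of variables).
proportions = [value / n_variables for value in eigenvalues]

# Running total of the shares, matching the "Cumulative" row.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for k, (share, total) in enumerate(zip(proportions, cumulative), start=1):
    print(f"PC_{k}: proportion {share:.3f}, cumulative {total:.3f}")
```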
Figure 1 shows a plot of each text in the space defined by the first two PC's.
Figure 1 -- Scatter Plot of Texts in First 2 PC's.
To interpret this diagram, it is helpful to have some idea of which words load most heavily on the two dimensions plotted (PC_1 and PC_2). This information is given in Table 4.
Table 4 -- Words with Highest & Lowest Loadings on PC_1 & PC_2.
| | Most Positive Loading | Most Negative Loading |
|------|-----------------------|-----------------------|
| PC_1 | quod (0.326) | ac (-0.264) |
| PC_2 | in (0.355) | ad (-0.395) |
Thus works appearing to the right-hand side of Figure 1 tend to be characterized by a relatively high frequency of "quod" and a relatively low frequency of "ac", with those towards the left having the reverse pattern. Works high up in this figure have a relatively high rate of occurrence of "in" compared to "ad", while those low down have a relatively high rate of "ad" compared to "in". Of course, the other 16 words also contribute to both axes, so this should be taken merely as a guide.
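How loadings translate into a text's position on an axis can be illustrated with a small sketch. It assumes that a loading is the coefficient of a word in the unit-length eigenvector of the correlation matrix, and that a text's score is the loading-weighted sum of its standardized rates. The data are invented rates for three words in four hypothetical texts, the function names are illustrative, and power iteration is used here only to keep the example free of external libraries; it is not claimed to be the procedure behind Table 4.

```python
import math


def standardize(rows):
    """Convert each column to z-scores using the sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]


def correlation_matrix(z_rows):
    """Correlation matrix of rows that are already standardized."""
    n, p = len(z_rows), len(z_rows[0])
    return [
        [sum(row[i] * row[j] for row in z_rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_component(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector.
    The fixed starting vector is a simplification: it would fail if it were
    exactly orthogonal to the leading eigenvector."""
    p = len(matrix)
    vector = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v * w for v, w in zip(vector, product))
    return eigenvalue, vector


# Invented rates (per 100 words) for three words in four hypothetical texts.
toy_rates = [
    [2.9, 2.0, 1.2],
    [3.1, 2.2, 1.0],
    [2.5, 1.8, 1.5],
    [3.4, 2.4, 0.9],
]
z = standardize(toy_rates)
eigenvalue, loadings = leading_component(correlation_matrix(z))
# A text's score on the component is the loading-weighted sum of its z-scores.
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
print(eigenvalue, loadings, scores)
```

In this toy example the first and third words receive loadings of opposite sign, so a text with a high rate of one and a low rate of the other is pushed towards one end of the axis, which is the reading applied to "quod" and "ac" above.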
In this 2D space, works by the same author do tend to fall into groups, except that the sample of verse by Cicero (our smallest text and of a different genre from the other samples) is separated from the Ciceronian prose samples. Also Muretus and Sigonio (two noted exponents of neo-Classical Latin style) overlap. In addition, one Muretus sample (M2) falls very close to Cicero's Somnium. The two divided works (parts 1 and 2 of Cicero's Brutus and parts 1 and 2 of the Consolatio) fall close to each other, as they should.
The nearest neighbour of both Consolatio samples is an oration of Sigonio (S3). Visually the Consolatio does appear closer to both Sigonio and Muretus than to Cicero. However, these two components account for only 49.0% of the total variance, and three- or four-dimensional plots are difficult or impossible to display. So it was decided to perform a cluster analysis using the first four principal components to confirm this impression.
Cluster Analysis
Several hierarchical clustering methods were used (e.g. Average Linkage, Centroid Linkage, Ward's Method) with essentially the same results. Figure 2 presents some SPSS output, a dendrogram, using Average Linkage between groups and Euclidean distance (the SPSS default setting).
Figure 2 -- Dendrogram using Average Linkage (Between Groups), Rescaled Distance.
The first pair of items to be grouped together are the two halves of the Consolatio. Thus they are more homogeneous than the two halves of Cicero's Brutus, which are joined at step 5 of the agglomeration process. The third step joins S3 (Sigonio's Oration V) with the initial cluster formed by both parts of the Consolatio.
In general, works by a single author cluster together: of the first 10 linkages made, 7 involve joining works by the same author, one is a linkage of Nepos with Cicero's Brutus, one is a linkage of Tacitus with Vettori's "Habita" and the other is the linkage of the Consolatio with S3.
Thus in the notional 4-dimensional space derived from usage rates of 20 high-frequency Latin words, the Consolatio is more similar to works by Sigonio than to works by Cicero.
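The agglomeration process described in this section can be illustrated with a minimal average-linkage sketch using Euclidean distance. It is a plain Python stand-in written for illustration, not the SPSS procedure that produced Figure 2. The function names, the labels "a" to "e" and their two-dimensional coordinates are invented and do not correspond to any text in the study.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def average_linkage(points):
    """Agglomerative clustering with between-groups average linkage.
    points maps a label to its coordinates. Returns the merges in order,
    each as (members_left, members_right, distance)."""
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for left, right in combinations(clusters, 2):
            # Mean of all pairwise distances between the two clusters.
            total = sum(euclidean(points[i], points[j]) for i in left for j in right)
            distance = total / (len(left) * len(right))
            if best is None or distance < best[0]:
                best = (distance, left, right)
        distance, left, right = best
        clusters = [c for c in clusters if c != left and c != right]
        clusters.append(left + right)
        merges.append((left, right, distance))
    return merges


# Invented coordinates for five hypothetical items.
toy_points = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (0.5, 0.2),
    "d": (3.0, 3.0),
    "e": (3.2, 3.1),
}
merges = average_linkage(toy_points)
for step, (left, right, distance) in enumerate(merges, start=1):
    print(step, left, right, round(distance, 4))
```

The order of the merges is what a dendrogram displays: items joined early are the most similar, which is the sense in which the two halves of the Consolatio, joined first, are the most homogeneous pair.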
Discussion
The findings from this Burrows-style analysis support received opinion among Latin scholars that the Consolatio resembles Sigonio's style more than it resembles Cicero's. It also resembles two pieces by Muretus more than any by Cicero.
However, the genuine Ciceronian samples are more widely dispersed (variable) than those of the other Latin authors sampled, so this evidence is at present suggestive rather than conclusive. We intend to continue this project by gathering more Latin text samples and by applying other methods of stylometric analysis in order to arrive at a more definitive verdict in this case.
Acknowledgements
This project could not have been carried out without the help of Professor Jane Crawford (Loyola Marymount University), Professor Bernard Frischer (UCLA) and Dr David Holmes (TCNJ), all of whom are responsible for its inception. Heartfelt thanks must also be given to the British Academy and UWE, Bristol, whose financial support has made this research possible.
References
1. Binongo, J.N.G. (1994). Joaquin's Joaquinesquerie, Joaquinesquerie's Joaquin: A Statistical Expression of a Filipino Writer's Style. Literary & Linguistic Computing, 9(4), pp. 267-279.
2. Burrows, J.F. (1992). Not unless you Ask Nicely: the Interpretive Nexus between Analysis and Information. Literary & Linguistic Computing, 7(2), pp. 91-109.
3. Burrows, J.F. & Craig, D.H. (1994). Lyrical Drama and the "Turbid Mountebanks": Styles of Dialogue in Romantic and Renaissance Tragedy. Computers & the Humanities, 28, pp. 63-86.
4. Greenwood, H.H. (1995). Common Word Frequencies and Authorship in Luke's Gospel and Acts. Literary & Linguistic Computing, 10(3), pp. 183-187.
5. Holmes, D.I. & Forsyth, R.S. (1995). The `Federalist' Revisited: New Directions in Authorship Attribution. Literary & Linguistic Computing, 10(2), pp. 111-127.
6. Manly, B.F.J. (1994). Multivariate Statistical Methods: a Primer. Chapman & Hall, London.
7. McCuaig, W. (1989). Carlo Sigonio: The Changing World of the Late Renaissance. Princeton University Press: Princeton.
8. Mosteller, F. & Wallace, D.L. (1984). Applied Bayesian and Classical Inference: the Case of the Federalist Papers. Springer-Verlag, New York.
10. Sage, E.T. (1910). The pseudo-Ciceronian Consolatio. University of Chicago Press, Chicago.
11. Sandys, J.E. (1967). History of Classical Scholarship, volume 2. Hafner, New York.
12. Tweedie, F.J. (1997). A Statistical Investigation into the Provenance of "De Doctrina Christiana", Attributed to John Milton. Ph.D. Thesis, Faculty of Computing & Mathematics, University of the West of England, Bristol.
Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)
Debrecen, Hungary
July 5, 1998 - July 10, 1998
109 works by 129 authors indexed
Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/
References: http://web.archive.org/web/19990225164509/http://lingua.arts.klte.hu/allcach98/abst/jegyzek.htm
Attendance: ~60 (https://web.archive.org/web/19990128030244/http://lingua.arts.klte.hu/allcach98/listpar3.htm)