The Scriptores Historiae Augustae: A Demonstration of Multiple Authorship

  1. Penelope J. Gurney

    Université d'Ottawa (University of Ottawa)

  2. Lyman W. Gurney

    Université d'Ottawa (University of Ottawa)

The Scriptores Historiae Augustae (SHA) is a collection of biographies written, according to the manuscript tradition, by six authors at about the end of the third century CE and the beginning of the fourth [2]. Earlier attempts to prove single authorship have been based upon subjective analyses of writing style: a number of scholars of eminent reputation [3], [7], [1] have applied a close analysis to the 30 biographies in the SHA, and have catalogued the personal characteristics which they believe are attributable to a single person. Other researchers, notably [6], have provided plausible arguments in support of the traditional multiple authorship. Until now, the argumentation has rested upon circumstantial evidence, and hence the true nature of the authorship has remained a matter of conjecture.
The need for certainty of authorship is no trivial matter. The 30 biographies of the SHA deal with the accounts of emperors and of attempted usurpers, and are critical for any understanding of the complexity of the tumult of the years from about 117 to 284 CE. They are complemented, but not replaced, by a number of Greek historical works and minor Latin texts.
The only real certainty about the biographies is that they are replete with forged information and with deliberate interpolations of imagined accounts of events. They must therefore be treated with consummate care, lest the fanciful creations that the authors present as fact be accepted uncritically as history. Hence it is clear that assignment to a single author would make the problem of sifting the texts for historical source material far easier than if the same analysis of personality had to be repeated perhaps six times, each on a much smaller body of writing.
General Discussion of Analysis
Our statistical analysis of the SHA has been based upon the premise that subtle differences in the stylistic usage of key function words across the complete texts of all 30 biographies will permit a separation of authorship, if such differences exist. To permit this analysis, we have fully lemmatized and disambiguated the entire text of over 100,000 words, so that every form used in the text has been replaced by its dictionary head-word. From these disambiguated texts, we have then built a matrix of head-word frequencies of use, to be passed to statistical packages such as SAS and SPSS for analysis.
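The construction of such a matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function name build_frequency_matrix, the toy texts and the short head-word list are all hypothetical, and each text is assumed to have been reduced already to a list of dictionary head-words.

```python
from collections import Counter

def build_frequency_matrix(lemmatized_texts, head_words):
    """Return one row of raw counts per text, one column per head-word."""
    matrix = {}
    for name, lemmas in lemmatized_texts.items():
        counts = Counter(lemmas)
        # Counter returns 0 for head-words that never occur in a text.
        matrix[name] = [counts[word] for word in head_words]
    return matrix

# Hypothetical toy input: not taken from the SHA itself.
toy_texts = {
    "life_a": ["et", "sum", "in", "et", "imperator", "sum"],
    "life_b": ["in", "sum", "atque", "in", "et"],
}
toy_head_words = ["et", "in", "sum", "atque"]
toy_matrix = build_frequency_matrix(toy_texts, toy_head_words)
```

Each row of toy_matrix corresponds to one biography and each column to one head-word, which is the shape expected by most statistical packages.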
This type of stylometric analysis has been forced by the unique character of the problem. Previous analyses of authorship have normally been based upon the need to verify the authorship of a limited number of works, where there is also a body of text of known authorship for literary and statistical comparison. The authors of the manuscript tradition of the SHA, however, come unheralded, and by their very anonymity challenge scholars to deny their existence: the names of so many authors of that age are at least listed in the literature, whereas these six come unannounced and, as close analysis of their many obvious anachronisms shows, appear to claim to have written their works several generations earlier than they could possibly have done. In our initial analyses, we were able to differentiate the groups of texts, in accordance with the manuscript tradition, to a very high degree of confidence. The analyses could not prove, of course, that six authors wrote the works assigned specifically to them; but they did show that the probability was extremely slight that such results could have been arrived at by statistical chance.
This final stage of analysis has been devoted to the addition of an earlier body of biographies, the De Vita Caesarum (Lives of the Twelve Caesars) of Gaius Suetonius Tranquillus, as a control work, in order to ensure that the statistical process that separated the six groups of biographies will also separate them, in a similar fashion, from a group of comparable biographies of similar genre and, so far as is known at present, of undisputed single authorship. This process has now demonstrated, with no significant change in the level of confidence, that the probability of single authorship remains vanishingly small, and that, quite unfortunately for modern historiographers, there is a considerable degree of validity in the assignment of the 30 biographies to the six unknown authors.
Statistics Used
The statistical procedures used in this study are based on cluster analysis, principal component analysis, and canonical discriminant analysis. The statistics packages used are SAS, SPSS, and SPSS Diamond.
Cluster analysis and principal component analysis are rather similar, in that each attempts to identify relatively homogeneous sets of cases based on selected characteristics, using an algorithm which starts with each of the 42 elements in a separate cluster, and combines these clusters recursively on the basis of an analysis of linear combinations of the variables. Statistics are displayed at each stage to assist in selecting the best solution. Objects grouped together in the final clusters tend to be similar in certain characteristics, while objects in different clusters differ in those same characteristics. The purpose of using cluster analysis in this project is to examine the relationships occurring naturally within the data, in order to see whether there are, in fact, groups which can be distinguished by means of the vocabulary.
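The agglomerative procedure described above, in which every element starts in its own cluster and clusters are merged step by step, can be sketched as follows. This is a minimal illustration and not the SAS or SPSS routine used in the study: Euclidean distance and average linkage are assumptions made here for simplicity, and the function names and toy points are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # one cluster per element
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical toy data: five "texts" described by two normalized frequencies.
toy_points = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 1.0]]
toy_clusters = agglomerate(toy_points, 3)
```

In this toy case the two tight pairs are merged first and the isolated fifth point is left on its own, which is the kind of natural grouping the cluster analysis is meant to reveal.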
In canonical discriminant analysis, the examination of relationships begins with an explicit selection of group membership. In the current study, 7 groups have been examined, consisting of the 6 authors of the manuscript tradition and the control works of Suetonius, with each of the 42 individual biographies assigned to the appropriate group for analysis. If the elements within each of the individual groups chosen by the researcher are not strongly related, the final clusters will overlap, or be ‘smeared’ together; but even if apparently clear and decisive final clustering exists, the validity of the result can be verified only by an analysis of four decisive numeric values that accompany the procedure: Wilks' Lambda, Roy's Greatest Root, Pillai's Trace, and the Hotelling-Lawley Trace. Only if the mathematical demands of these results have been met can the final clustering be used as a demonstration of separation of the individual clusters.
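Of the four statistics named above, Wilks' Lambda is the simplest to illustrate: it is the ratio of the determinant of the pooled within-group scatter matrix to the determinant of the total scatter matrix, so that values near 0 indicate well-separated groups and values near 1 indicate none. The sketch below is a pure-Python illustration under stated assumptions, not the SAS or SPSS computation used in the study; the function names and the toy groups are hypothetical, and no significance test is attempted.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, center):
    """Sum of outer products of deviations from the given center."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            det = -det
        det *= m[k][k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= factor * m[k][c]
    return det

def wilks_lambda(groups):
    """det(W) / det(T) for a dict mapping group name to rows of variables."""
    all_rows = [row for rows in groups.values() for row in rows]
    total = scatter_matrix(all_rows, mean_vector(all_rows))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        s = scatter_matrix(rows, mean_vector(rows))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    det_total = determinant(total)
    if det_total == 0.0:
        raise ValueError("total scatter matrix is singular")
    return determinant(within) / det_total

# Hypothetical toy data: two clearly separated groups of three texts each.
separated = {
    "a": [[1.0, 2.0], [2.0, 1.0], [1.5, 1.8]],
    "b": [[8.0, 9.0], [9.0, 8.0], [8.5, 9.2]],
}
toy_lambda = wilks_lambda(separated)
```

For the separated toy groups the statistic is close to 0; if both groups held identical observations it would equal 1, since all of the scatter would then lie within the groups.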
Specific Discussion of Analysis
In an earlier part of this study [5], the finding was made that subsets cannot be used reliably in textual analysis. In other words, the literary works which we have been examining are not individually homogeneous internally, so that blocks of text do not necessarily provide the same results as does the full text. For this reason, the full text of each of the works was used, but the frequencies of use of lemmas in each category were normalized, to permit a standard comparison amongst the works which would not be unduly influenced by the size of the specific work.
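The exact normalization is not specified here; one common choice, assumed in the sketch below, is to express each count as a rate per 1,000 words of the work in which it occurs. The function name and the figures are hypothetical.

```python
def normalize_counts(counts, text_length, per=1000):
    """Convert raw lemma counts into rates per `per` words of text."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return [count * per / text_length for count in counts]

# Hypothetical example: the same lemmas counted in a short and a long work.
short_work = normalize_counts([5, 10], 2000)
long_work = normalize_counts([25, 50], 10000)
```

Both toy works yield the same rates despite their different lengths, which is the property that allows works of unequal size to be compared directly.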
Several sets of vocabulary items have been shown [4] to have good discriminating value for the different authors of lives in the SHA as a whole. Sets of function words, conjunctions, nouns and pronouns, and lemmas which are common to all samples, were among those data sets which provided statistically significant results for Suetonius and for the SHA corpus in its entirety.
Function Words
Function words have been said to be the best kind of discriminating words to use in authorship analyses [8], since they should reflect style rather than content. That is, such words should appear in similar fashion in all works by an author, regardless of the subject matter of the text. In the analysis of this concept, lemmatized function words have been used, rather than non-lemmatized forms. The reason for this is readily apparent in examining the verb 'esse' ('to be'), which may in many of its tenses be a verb in its own right, or be solely an auxiliary to another verb. However, an individual author may vary his usage of such words differently from other authors, a fact which would not be apparent in an analysis of forms alone.
Nouns and Pronouns
The set of nouns and pronouns consists of a number of the 20 most frequently occurring lemmas of these types in the texts being examined. In Latin, as in other highly inflected languages, these vocabulary items are not usually examined; in this study, however, it has become possible, and really imperative, to examine nouns and pronouns for three interrelated reasons. First, all the works are of the same genre, and follow the same formula; they are biographies of emperors in which the early life is first described, then honours and the like are discussed, followed by a description of the death of the individual. Hence the analysis has examined the expectation of a common vocabulary. The second reason is that, after lemmatization, the actual vocabulary can be distinguished more readily in any language, be it English, French or any other. For example, in Latin, 'modo' represents both an adverb ('merely') and the ablative case of a noun (‘size’ or ‘length’); and 'mora' refers to a set of nouns which may mean ‘a delay or pause on the march’, ‘a type of fish’, or ‘a division of the Spartan army’. In non-lemmatized vocabulary, these and many other examples would be counted together; and since the text of the SHA contains over 3000 different examples of ambiguous forms, each of which has two or more meanings, the result would be a loss of the clarity and integrity of information that might otherwise lead to a separation of authorship. And lastly, after lemmatization, the frequencies of occurrence of individual nouns and pronouns become large enough to permit a statistically valid analysis of the vocabulary used.
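The effect of disambiguation on the counts can be shown with a small example. The annotated tokens below are hypothetical and serve only to illustrate the 'modo' case mentioned above; they are not drawn from the SHA, and the tuple layout is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical annotated tokens: (surface form, dictionary head-word, part of speech).
toy_tokens = [
    ("modo", "modo", "adverb"),
    ("modo", "modus", "noun"),
    ("modum", "modus", "noun"),
    ("modo", "modo", "adverb"),
]

# Counting surface forms merges the adverb with the ablative of the noun.
form_counts = Counter(form for form, _, _ in toy_tokens)

# Counting disambiguated head-words keeps the two words apart and also
# gathers the different inflected forms of the noun under one entry.
lemma_counts = Counter((lemma, pos) for _, lemma, pos in toy_tokens)
```

Here the form 'modo' is counted three times, whereas the lemmatized counts give two occurrences of the adverb and two of the noun, one of which comes from a different inflected form.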
The results of the statistical procedures used point very strongly to multiple authorship of the Scriptores Historiae Augustae, with virtually no probability that the attributions to the six authors in the manuscript tradition could have been assigned by sheer chance, and hence with a correspondingly low degree of confidence in the attribution of all 30 lives to a single author. The decisive clustering that we obtained in the original analysis of the 30 lives in the SHA was repeated after the addition of a major control work, the De Vita Caesarum of Suetonius Tranquillus; and the values of Wilks' Lambda, Roy's Greatest Root, Pillai's Trace, and the Hotelling-Lawley Trace continue to affirm the existence of a high degree of confidence in multiple authorship.
Although this study has demonstrated, with a high degree of confidence, the probability of multiple authorship of the Scriptores Historiae Augustae, it has not, of course, proved the specific attributions of the 30 biographies to the 6 authors of the manuscript tradition. Such specific attribution could be developed only if there were comparable biographies of undisputed authorship by these anonymous authors. Hence the attributions of the manuscripts are probably the most accurate that we can at present develop, and are the best that can be given to historians.
