Università degli Studi di Bari Aldo Moro
This paper examines a linguistic corpus in order to detect, by statistical methods and computational tools, correspondences or contrasts between the hypotheses made by the linguist and the procedures applied by the software experts.
The corpus is drawn from nineteenth-century Victorian speeches, in particular the Parliamentary and political speeches delivered by Benjamin Disraeli, both as a member of Parliament and as Prime Minister.
Analogous research has recently been carried out [see Labbé, Bolasco, Lebart, Salem, 1995] with reference to contemporary politicians such as de Gaulle, Mitterrand and, in Italy, Berlusconi. Computational research on political discourse is a highly significant new brand of criticism, which contributes to the modern notion of political contest by revealing some of its more hidden truths. Nonetheless, where contemporary politicians are concerned, the use of "ghost writers" for public speeches in most cases undermines the relationship between language structure and personality that justifies work in textual analysis.
Dealing with 19th century England, we have considered Victorian politics as being the first true "arena" for party politics and as constituting fundamental principles for political dialectics.
The main source of reference was the collection of Selected Speeches by T. E. Kebbel in two volumes, borrowed from the Parliament Library in Rome. Disraeli's personal letters to Lady Bradford and Lady Chesterfield are also a valuable source, in that they unveil psychological traits and personal idiosyncrasies of the statesman. Moreover, in the classical biographies we have found revealing hints of modernity and wisdom that we think it useful to compare with our own times. In the longer term, we aim to set up a series of parameters which might characterise English political discourse, and to define its specific contexts by comparing corpora (for example Tory vs Whig discourse, or Conservative vs Radical).
Some intriguing questions have led us to pursue this goal, bearing in mind the old lessons of the rhetoricians on oratory and trying to see them in the light of modern standards. How great is the analogy between oratorical qualities, original style, and the words, phrases and discourse markers more or less unconsciously uttered by an individual? By which standards can this analogy be assessed? What is the "benchmark" for moral evaluative judgement when we use a meta-linguistic code to filter the quintessence of discourse? Should, for instance, the most frequent items adhere in meaning and collocation to their speaker's actual socio-political presuppositions? To what degree should their collocation accord with given parameters? Moreover, what can computational analysis discover beyond, or in contrast to, the evaluative criticism to which so many scholars have contributed on the same subject over the years?
We have tried to answer these questions by examining, through data processing, which linguistic markers might define or confirm the features of linguistic excellence.
Our stylometric study is concerned with the Victorian age, a period rich in syntactical perfection and lexical complexity.
We have selected thirty-four speeches given by Disraeli between 1830 and 1870 and processed them with the software programme "Lexico 2" and with SPSS. The corpus has 205,800 occurrences (see note 1), with a type/token ratio of 9.83% (also known as a measure of vocabulary richness) and a Guiraud coefficient of G = 44.61.
Table 1 shows, for each decade's sub-corpus, the number of occurrences (N), forms (V) and hapax forms (V1), together with the most frequent word in each decade.
Hapax forms (words used only once) make up nearly 50% of all distinct words, a ratio that marks a highly varied though integrated vocabulary.
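The measures quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes a plain list of already-tokenised word forms, whereas the study itself used Lexico 2 and SPSS, and the function name `lexical_richness` and the toy sentence are hypothetical. Guiraud's coefficient is computed in its usual form, G = V / sqrt(N).

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Return basic vocabulary measures for a list of word forms.

    N  = number of occurrences (tokens)
    V  = number of distinct forms (types)
    V1 = number of hapax forms (forms occurring exactly once)
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    n = len(tokens)
    v = len(counts)
    v1 = sum(1 for c in counts.values() if c == 1)
    return {
        "N": n,
        "V": v,
        "V1": v1,
        "ttr": v / n,                 # type/token ratio V/N
        "guiraud": v / math.sqrt(n),  # Guiraud's coefficient V / sqrt(N)
        "hapax_share": v1 / v,        # hapax forms as a share of all forms
    }


# Toy input (hypothetical, not taken from the corpus).
sample = "the state of the nation and the empire".split()
print(lexical_richness(sample))
```

As a consistency check on the reported figures, since G = (V/N) * sqrt(N), a type/token ratio of 9.83% on 205,800 occurrences gives G of roughly 44.6, in line with the stated 44.61 once rounding of the ratio is allowed for.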
Table 2 lists the fifty most important words, considered truly connotative of Disraeli's mind and world, in decreasing order of total frequency; for each decade it gives the absolute frequency (F) and the normalized occurrences (per 1,000 words).
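The normalisation used for such a table can be illustrated as follows. This is a minimal sketch, assuming each decade's sub-corpus is available as a list of tokens; the function name `per_thousand`, the decade labels and the toy data are hypothetical and do not reproduce the authors' figures.

```python
from collections import Counter


def per_thousand(tokens_by_decade, word):
    """For each decade return (F, occurrences per 1,000 words) for `word`."""
    rows = {}
    for decade, tokens in tokens_by_decade.items():
        f = Counter(tokens)[word]  # absolute frequency F in this sub-corpus
        rate = 1000 * f / len(tokens) if tokens else 0.0
        rows[decade] = (f, rate)
    return rows


# Toy sub-corpora (hypothetical): 10 and 20 tokens respectively.
toy = {
    "1830s": ["power"] * 2 + ["other"] * 8,
    "1840s": ["power"] * 1 + ["other"] * 19,
}
for decade, (f, rate) in per_thousand(toy, "power").items():
    print(decade, f, rate)
```

Normalising by sub-corpus size makes decades of different length comparable, which absolute frequencies alone do not allow.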
Moreover, we have disambiguated words (textual forms) in order to obtain the correct rank, distinguishing for some forms between pronouns, nouns, verbs, adjectives, adverbs and conjunctions. The selection was made in order to build up the so-called lexical universe of Disraeli's discourse: the chosen words (e.g. power, principle, democracy, oligarchy, empire) were first treated as collocates, and for each of them we built a lexical universe, so as to define specific contexts and co-texts through relationships of proximity and distance.
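One common way to approximate a word's "lexical universe" is to count the forms that occur within a fixed distance of it. The sketch below assumes a simple symmetric window over a token list; the window size, the function name `collocates` and the example sentence are illustrative assumptions, not the procedure actually implemented in Lexico 2.

```python
from collections import Counter


def collocates(tokens, node, span=5):
    """Count forms occurring within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]   # up to `span` tokens before
            right = tokens[i + 1:i + 1 + span]  # up to `span` tokens after
            counts.update(left + right)
    return counts


# Toy sentence (hypothetical, not from the corpus).
text = "the power of the crown and the power of the people".split()
print(collocates(text, "power", span=2).most_common())
```

A narrow span captures the immediate co-text of a word, while a wider one approaches its broader context; raw counts of this kind are usually dominated by function words, so in practice they would be filtered or weighted before interpretation.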
We have traced some meaningful words from a diachronic point of view, detecting their characteristic trends and so marking the evolution of his political thought throughout his long parliamentary activity. We have also carried out a factor analysis for some groups of words, clustered according to semantic categories (e.g. self, key words, generic words, geography, negative words).
The statistical analysis clearly shows the following:
High occurrence of a limited number of forms (mainly unspecific in meaning);
High occurrence of hapax forms and of many different forms used with very low frequency;
High percentage of polysyllabic adjectives of Latin origin, used in alliterative and symmetric structures, creating an appealing effect of harmony and balance. The same effect is obtained by collocating meaningful words in pairs, and likewise with words plus adjectives, often used in pairs;
High percentage of adjectives in relation to the corpus as a whole, most of them belonging to the semantic categories of "greatness" and "positive feeling".
We think that Disraeli's discourse is powerful, confident and determined, endowed with energy and a brand of sincere English imperialism. He certainly cared for England and for himself, and Ireland was also a genuine interest.
What might appear strange is the zero occurrence of the word "Judaism", even considering the very low presence of all vocabulary concerned with race and religion. He presumably chose with care among language genres (novels, essays and speeches) according to well-defined goals.
The high occurrence of generic words, i.e. words with a neutral connotative value, seems to confirm the judgement of his opponents, according to whom Disraeli's political commitment was in most cases supported only by generic principles drawn from a well-reconstructed political heritage, and only to a lesser degree by detailed policies.
One possible consequence of this analysis, which supports the original hypothesis, is that Disraeli's political success and his prolonged prominence as a first-rate politician (even when in opposition) were mainly due to his gifts of linguistic excellence and flamboyant oratory. From a different perspective, his Italian-Jewish ancestry adds an odd and mysterious flavour to his success in the purely aristocratic context of Victorian England.
Notes
1. According to an empirical criterion, a corpus can be considered large enough if it exceeds 200,000 words; moreover, the ratio V/N (vocabulary, i.e. the number of different words, divided by total occurrences) is 9.8% (when this ratio is over 20%, the corpus is not to be considered large enough). See Bolasco, Analisi Multidimensionale dei dati. Carocci Editore. Roma. 1999. p. 203.
References
Blake, R. (1978). Disraeli. Methuen & Co. Ltd. London.
Bolasco, S. (1999). Analisi Multidimensionale dei dati. Carocci Editore. Roma.
Bolasco, S. (1994). L'individuazione di forme testuali per lo studio statistico dei testi con tecniche di analisi multidimensionale, SIS Atti della XXXVII Riunione Scientifica 1994, Vol.2, pp.95-103.
Bolasco, S. (1995). Criteri di lemmatizzazione per l'individuazione di coordinate semantiche. In S. Bolasco and R. Cipriani (eds) Ricerca qualitativa e computer: teorie metodi e applicazioni. Franco Angeli, Milano. pp.87-111.
Bolasco, S. (1997). L'analisi informatica dei testi, in L. Ricolfi, La ricerca qualitativa, NIS, pp.165-202.
Holmes, D. I. (1985). The Analysis of Literary Style: A Review. Journal of the Royal Statistical Society, part 4, pp. 328-41.
Kebbel, T. E. (1882). Selected speeches of the Late Earl of Beaconsfield. 2 vols.
Labbé, D. (1990). Le vocabulaire de François Mitterrand. Paris, Presses de la FNSP.
Labbé, D., Hubert, P. (1995). La structure du vocabulaire de Général De Gaulle. JADT 1995. Atti Analisi Statistica dei Dati Testuali, vol. II. CISU, Roma.
Lebart, L., Memmi, D. (1984). Analisi dei dati testuali: applicazione al discorso politico. SIS Atti della XXXII Riunione Scientifica 1984, Vol. I, pp. 27-41.
Lebart, L., Salem, A. (1994). Statistique Textuelle. Paris, Dunod.
Lebart, L., Salem, A., Berry, L. (1998). Exploring textual data. Kluwer Academic Publishers.
Levinson, S. C. (1995). Pragmatics. Cambridge Textbooks in Linguistics. Cambridge University Press.
Monypenny, W. F. and Buckle, G. E. (1910-20). The life of Benjamin Disraeli Earl of Beaconsfield. 6 vols.
Oakes, M. P. (1998). Statistics for Corpus Linguistics. Edinburgh University Press.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford University Press.
Smith, P. (1996). Disraeli. A brief life. Cambridge University Press.
Tweedie, F. J., Baayen, R. H. (1998). How variable may a constant be? Measures of Lexical Richness in Perspective. Computers and the Humanities, 32 (5): 323-352.
Zetland, Marquis of (ed.) (1929). The Letters of Disraeli to Lady Bradford and Lady Chesterfield. 2 vols.
Hosted at University of Glasgow
Glasgow, Scotland, United Kingdom
July 21, 2000 - July 25, 2000
Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/