Detecting and Characterizing National Style in the 19th Century Novel

paper
Authorship
  1. Matthew Jockers

    Stanford University

Jockers, Matthew, Stanford University, mjockers@stanford.edu
In Representative Irish Tales, Yeats identified two basic categories of Irish fiction characterized by what he called “two different accents, the accent of the gentry and the less polished accent of the peasantry” (Yeats 1979). Writing of this distinction, John Cronin notes in The Anglo-Irish Novel how “Maria Edgeworth and William Carleton fit obviously enough the two extremes Yeats has defined but a middle-class figure like Gerald Griffin belongs a little uneasily somewhere in between” (Cronin 1980). Other critics, including Thomas MacDonagh (MacDonagh 1916), Thomas Flanagan (Flanagan 1959), and most recently Charles Fanning (Fanning 2000), have all focused attention on the specific use of language in Irish narrative and the extent to which linguistic style and choice of theme and form reflect, or do not reflect, the unique position of these Irish and Anglo-Irish writers in a country where the use of English was to evolve in a rather dramatic fashion.

Though Mark Hawthorne has written that the “Irish were not accustomed to the English language and were unaware of its subtleties and detonations” (Hawthorne 1975), Charles Fanning has argued that the Irish in fact became masters of the English language and employed a mode of “linguistic subversion” that allowed them to comment upon and even satirize the British, who all the while seem to miss the point that the joke is on them (Fanning 2000). Cronin argues along similar lines to Fanning when he writes that the “. . . idiomatic unease in their novels is not caused by any lack of ability on their part in the writing of a standard English idiom. It derives, rather, from the tangled situation in which they find themselves as novelists, directing their efforts towards an English-speaking public but trying to give that public a creative insight into a linguistically piebald area . . . they turned their very difficulties in regard to idiom to constructive account by confronting head-on the blending of the two idioms and two cultures. . . they turn this linguistic ragout to splendid account, making use in the process of English, Irish, and Anglo-Irish” (Cronin 1980).

The subject of this research paper, then, is the matter of exactly how 19th-century Irish novelists uniquely employ style, setting, and theme. The critics seem to agree that something specific is going on in terms of language, form, and setting, and yet none gets to the heart of the matter, to the details of the prose and to the specific uses of language. Leveraging tools and techniques from the authorship attribution and computational text analysis literature (specifically natural language processing, machine learning, and topic modeling), this paper compares and contrasts both linguistic style and narrative theme in a corpus of over 500 British and Irish novels from the 19th century. The results of this work show the precise extent to which Irish prose is stylistically different from English prose, and I identify and explore those linguistic and thematic features that mark the Irish novel as distinctly different from the British.

Specifically, my research examines style through an analysis of sentence- and word-level features. The results show, among other things, that Irish writers tend toward expressions that are both longer and more indeterminate than those of their British counterparts. Favoring the long sentence and greater use of the comma, the Irish write in comparatively complex, flowing sentences that favor (as measured by relative frequency) words denoting indeterminacy, words such as “most,” “some,” “may,” and “yet.” British writers, on the other hand, show a preference for shorter, more determinate sentences featuring words such as “know,” “never,” “no,” “nothing,” “must,” “not,” “only,” “all,” “should,” “last,” “first,” and “great.” This result tends to confirm anecdotal observations made by scholars, including Cronin (Cronin 1980), who suggest that, though the Irish may have sought to imitate and appeal to the stylistic preferences of a British-dominated industry, they ultimately invented their own style of prose, which captured both the rhythms of the local language and the anxieties of a country struggling with its position vis-à-vis the colonizing presence of the British.
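The sentence- and word-level measures described above (sentence length, comma use, and the relative frequency of marker words) can be illustrated with a minimal Python sketch. This is not the procedure used in the study: the function name `style_features`, the naive punctuation-based sentence split, and the simple word pattern are all assumptions made for illustration, and the two marker lists merely repeat the words quoted in the paragraph above.

```python
import re
from collections import Counter

# Marker words as quoted in the abstract; grouping them into two lists is
# an illustrative choice, not part of the original analysis.
IRISH_MARKERS = ["most", "some", "may", "yet"]
BRITISH_MARKERS = ["know", "never", "no", "nothing", "must", "not",
                   "only", "all", "should", "last", "first", "great"]


def style_features(text, markers):
    """Compute three simple style measures for one text.

    Returns the mean sentence length in words, the number of commas per
    1000 words, and the relative frequency of each marker word.
    """
    # Assumption: sentences end at ., ! or ?. A real study would use a
    # proper sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = len(words)
    return {
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "commas_per_1000_words": 1000 * text.count(",") / n_words if n_words else 0.0,
        "relative_freq": {w: (counts[w] / n_words if n_words else 0.0)
                          for w in markers},
    }
```

Applied to every novel in a corpus, measures of this kind could then be averaged by nationality and compared; which statistical comparison is appropriate is left open here.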

In addition to probing and comparing the stylistic habits of the two nations, this work further analyzes the prose at the level of theme and argues that there is an important link to be made between style and theme in Irish prose. To harvest latent themes, I employed the unsupervised topic modeling tools of MALLET, the UMass machine learning toolkit (McCallum 2002). A run of the model, which sought to identify the 25 most prominent topics in the corpus, resulted in one particular topic appearing with greater frequency in the Irish novels of the corpus. This topic, which was labeled “the big house theme,” is composed of words clearly relating to tenant-landlord relations and the familial issues that are so often explored by Irish writers attempting to characterize these troubled relationships. The big house theme was found to be the most prominent topic in 35% of the Irish novels analyzed in this corpus, and it is present to a lesser degree in many of the others.
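A figure such as "the most prominent topic in 35% of the Irish novels" can be derived from the per-document topic proportions that a topic model produces. The Python sketch below shows one way to do that calculation. It does not call MALLET and is not the author's code: the function name `dominant_topic_share`, the input layout (one list of proportions per novel, with a parallel list of nationality labels), and the toy numbers are all hypothetical.

```python
def dominant_topic_share(doc_topics, labels, topic, group):
    """Fraction of documents in `group` whose largest topic proportion is `topic`.

    doc_topics: one list of topic proportions per document.
    labels:     one group label per document (for example "Irish").
    topic:      index of the topic of interest.
    group:      label of the group to examine.
    """
    in_group = [props for props, lab in zip(doc_topics, labels) if lab == group]
    if not in_group:
        return 0.0
    # A document counts as a hit when the index of its largest
    # proportion equals the topic of interest.
    hits = sum(
        1 for props in in_group
        if max(range(len(props)), key=props.__getitem__) == topic
    )
    return hits / len(in_group)


# Invented proportions for three topics in four documents; these are
# placeholders, not data from the study.
doc_topics = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
]
labels = ["Irish", "Irish", "British", "British"]
```

With these toy values, `dominant_topic_share(doc_topics, labels, 0, "Irish")` counts the Irish documents in which topic 0 has the largest proportion and divides by the number of Irish documents.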

My analysis concludes by tracing the links between distinctly Irish themes and the elements of Irish style identified in the first part of the research. From the macroanalytic data derived at the corpus level, I present a chronological charting of Fanning’s notion of linguistic subversion, and then I move to the micro level and offer a closer reading of several exemplary passages from works in the chronology. I discuss how linguistic subversion is inherent to the tradition of the “Irish Bull” and offer a brief discussion of Richard and Maria Edgeworth’s 1835 essay on the subject in which they write with some humor that: “English is not the mother tongue of the natives of Ireland; to them it is a foreign language, and consequently, it is scarcely within the limits of probability, that they should avoid making blunders both in speaking and writing . . . Indeed, so perfectly persuaded are Englishmen of the truth of this proposition, that the moment an unfortunate Hibernian opens his lips they expect a bull, and listen with that well known look of sober contempt and smug self satisfaction, which sufficiently testifies their sense of safety and superiority.” (Edgeworth 1835)

As early as Castle Rackrent (1800), Edgeworth had demonstrated her own command of linguistic subversion and an acute awareness of how to form her narrative and bend language to provide not simply a distinctly Irish novel but a seminal novel within the larger novelistic tradition. My work provides quantitative evidence of how, where, and why Irish style is different from British.

References:
Cronin, J. 1980 The Anglo-Irish Novel, Barnes & Noble Books Totowa, N.J.

Edgeworth, M. 1835 Tales and Novels, Harper & Brothers New York

Fanning, C. 2000 The Irish Voice in America: 250 Years of Irish-American Fiction, University Press of Kentucky Lexington, KY

Flanagan, T. 1959 The Irish Novelists, 1800-1850, Columbia University Press New York

Hawthorne, M. D. 1975 John and Michael Banim (The "O'Hara Brothers"): A Study in the Early Development of the Anglo-Irish Novel, Romantic Reassessment, Institut für Englische Sprache und Literatur, Universität Salzburg Salzburg

MacDonagh, T. 1916 Literature in Ireland: Studies Irish and Anglo-Irish, T. F. Unwin London

McCallum, A. K. 2002 "Mallet: A Machine Learning for Language Toolkit." (link)

Yeats, W. B. 1979 Representative Irish Tales, Humanities Press Atlantic Highlands, N.J.

Conference Info

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO