University of California - University of California, Santa Barbara
From Bakhtin’s heteroglossia to Genette’s transextuality, theories of the novel describe its form as a mixture of diverse elements or styles. While many of these mixtures are contiguous with the text as a whole (a novel, for example, can operate in both the genres of history and the gothic), in this project, we are interested in overt stylistic changes within the text of a novel that mimic the discourse of non-novel genres like psychology, philosophy, or natural science—what we call here shifts in
microgenre. For example, in Edward Bellamy’s 1888 novel,
Looking Backward, the first-person narration of the time-travel story transforms key moments into third-person voices that explain to the narrator (and the reader) how Boston’s future society operates. In Stoker’s
Dracula, the same kind of shift takes place through the overt inclusion of multiple media forms, such as newspaper articles and excerpts from the
Demeter’s log. More subtle is the sermon embedded in the third chapter of Joyce’s
Portrait of the Artist, delivered in the voice of Father Arnell. In each of these cases, there is a strong shift in the style of the text towards the discourse of other disciplines (history, in the case of Bellamy, journalism in Stoker and theology in Joyce). In this project, we use computational methods to identify the stylistic differences between disciplinary writing that enables this phenomenon and to reveal when and how these microgeneric shifts take place within the framework of a novel.
Corpus and Methods
While the phenomenon that we describe above is present throughout the history of the novel, and in many different novelistic traditions, we restrict our inquiry in this paper to the period between 1880 and 1914. Not only does this enable our investigation to control for the historical change in both disciplinary discourse and fiction, but the late nineteenth century is also a critical period in the history of modern disciplinarity. In both America (with the second Land Grant act for universities in 1890) and in the United Kingdom (with the rise of Civic Universities beginning in 1900), this period saw a massive growth in university education and, with it, in the number of disciplines concretized through their inclusion in the academy. For example, the subjects offered at Mason Science College (later, the University of Birmingham) in 1880 included only Math, Physics, Chemistry and Biology. By 1898, this had grown to 21 subjects, including Literature, Physiology, Philosophy and Engineering. By concentrating our research on this period, we are able not only to find evidence of disciplinary styles within the novels written at the time, but also to recover the stylistic differences that accompanied this growth and division of disciplinary writing.
To explore both areas, we hand-assembled a corpus of 97 texts written between 1880 and 1914. This corpus included 10 texts each from a range of disciplines, including Anthropology, Economics, Philosophy, Psychology, History, Politics and Natural Science. These texts were chosen by the project participants to be representative of the range of possible discursive styles of each discipline and to be of a length similar to the novels of the period. The corpus also includes 27 novels (again written between 1880 and 1914), which both served as an example of the narrative discursive style for our model, and also allowed for further classification of the various disciplinary styles as they are embedded in these texts.
The theory of microgeneric shifts that we propose here is similar to the multi-authorial problem addressed in stylometric work by rolling delta. Like rolling delta, our concern is also with the linear analysis of sequential phenomenon; however rather than intra-textual changes in authorship, our project seeks to identify stylistic shifts within a narrative written by a single author, but which reveal the presence of extra-disciplinary discourses. For this reason, our approach builds on the work of stylometry, but departs in two important ways. First, because we are interested in shifts in discourse rather than content, we have opted to take a non-lexical approach. Rather than a feature set built from a subset of token frequencies (like most frequent words), we use grammatical features in our model, including the frequency of the Penn Treebank POS tags, the average sentence length and the average number of clauses per sentence. In a lexical approach, a model built on the most frequent words, as in the calculation of delta, is too tied to authorial signals, while a model built on distinctive words would be too indicative of the subject matter, rather than the style, of the individual disciplines. Secondly rather than a cluster-based distance approach, we adopt a classification model that builds upon research into literary genres using computational methods. By training a model on 200-sentence subsections of our corpus of disciplinary texts and then classifying similarly sized passages of contemporaneous novels, we can use the posterior probabilities of the classification results to identify the mixture of disciplinary writing in each part of each novel.
The project seeks to answer two related questions. First, are there differences in the discourse of disciplines that are stylistic rather than semantic? And, second, if these differences in style can be identified, is it possible to detect evidence of these styles embedded within novelistic prose?
To answer the first question, we created a classification model using a discriminate function analysis. Our choice of a method spoke to both our hypothesis that the classification of disciplinary style would be found in a linear combination of grammatical features, as well our subsequent use of the posterior probabilities calculated by the model (rather than hard classifications) as proxies for the disciplinary mixtures in each of the passages that we classified. The results of our initial classification (cross validated through a withheld test sample) reveal that our model was able to classify the texts from each discipline at a far greater percentage than chance (Figure 1). Even the most ambiguous discipline, economics, was classified correctly more than 50% of the time, far greater than chance (12.5%—equal for all fields, since we down-sampled during training). The misclassifications in each genre speak both to the complexities of the model and to the same microgeneric phenomena that occur in novels. In a book on natural science, for example, the straightforward discourse of history includes aspects of natural science in its attention to the details of historical events.
Figure 1: Discriminate Function Analysis of corpus using grammatical features
For the second question, we used the model we had developed to classify individual parts of novels. For example, in the resulting graph of Sir Arthur Conan Doyle’s
A Study in Scarlet (Figure 2), each bar represents a 200-sentence slice of the text, and the mixture of colors indicate the posterior probabilities of each disciplinary discourse in each slice as assigned by the model. Although the majority of slices are, as one might expect, dominated by novelistic discourse, the eighth slice is divided between anthropology and natural science. In fact, that portion of the novel is given over to a description of the geography of the southwest United States, followed by details about the appearance and customs of Mormon migrants—in other words, something like natural science and anthropology. This is an intuitive confirmation that disciplines have discursive styles (remember that the classifier has captured this mixture with no semantic information), and that novels can deploy them in strategic ways. Examples like this suggest not just the technical accuracy but also the literary critical use of microgenres, especially for more densely interdisciplinary works, like Bellamy’s
Looking Backward (
Figure 2: Classification of 200 sentence slices of
A Study in Scarlet
Figure 3: Classification of 200 sentence slices of
The results of our project reveal that the disciplinary discourses of the late nineteenth century did indeed have unique formal stylistic markers that extend beyond their subject matters or word choices. Moreover, by classifying parts of novels using a model built on these features we can trace how different disciplinary styles were embedded as microgenres within the narratives of nineteenth century prose writing.
Algee-Hewitt, Mark, Laura Eidem, Ryan Heuser, Anita Law, Tanya Llewellyn (2014). “The Cryptic Novel: A Computational Taxonomy of the Eighteenth-Century Literary Field.”
Abstracts of the Annual ADHO Digital Humanities Conference.
Bakhtin, Mikhail. (1981)
The Dialogic Imagination: Four Essays. Trans. Caryl Emerson and Michael Holquist Austin: The University of Texas Press.
Eder, Maciej. “Rolling stylometry.” (2016)
Digital Scholarship in the Humanities 31.3: 457-469.
Genette, Gérard. (1997)
Palimpsests: Literature in the Second Degree. Trans Channa Newman and Claude Doubinsky. Lincoln: University of Nebraska Press.
Rybicki, Jan, David Hoover and Mike Kestamont. (2014) “Collaborative authorship: Conrad, Ford, and Rolling Delta.”
Digital Scholarship in the Humanities 29.3:422-431.
Taylor, Anne, Mitchell Marcus, and Beatrice Santorini. (2003) “The Penn Treebank: An Overview”
Building and Using Parsed Corpora. ed. Abeille, A. Springer.
Underwood, Ted. (2017) “The Life Cycles of Genres”
The Journal of Cultural Analytics
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Hosted at Utrecht University
July 9, 2019 - July 12, 2019
436 works by 1162 authors indexed
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
Series: ADHO (14)