Willa Cather set out the theoretical foundations of her writing style in her 1922 ars poetica essay “The Novel Démeublé” (“The Novel Unfurnished”). In this work, Cather calls for a realistic writing style rooted in an economy of prose that remains rich in suggestion and emotion. She describes a style capable of creating a bare scene where literalness ceases and where readers can detect the presence of what she calls “the thing not named.” Such statements prompt us to ask what it means to unfurnish a literary piece, and whether the writing style Cather described in this seminal essay is present in her own fiction, either throughout her entire fictional corpus or only after her manifesto.
These questions provide the impetus for our computational study of Cather’s fiction, and the results of our analysis offer insight into how we might evaluate whether an author’s production is congruent with that author’s ars poetica. Starting from Cather’s declarations in “The Novel Démeublé,” we defined the criteria for a lexical and syntactical analysis of her fiction. Our analysis indicates that Cather’s writing style remained consistent across her corpus both before and after her stylistic declaration in “The Novel Démeublé,” providing new answers to the long-standing scholarly debate as to whether Cather subscribed to the ideas in the essay.
In this work, we use high-frequency words and selected parts of speech to explore whether a chronological shift from a more ornate style to a minimalist style can be detected across a career spanning forty-eight years. Our team used over sixty examples of Willa Cather’s fictional work currently housed at the Willa Cather Archive in XML format. The textual data was composed of two “genre sets”: the first included all of Cather’s novelistic fiction and the second, Cather’s short story fiction. Using R (R Core Team, 2013), our team parsed, tokenized, and POS-tagged all of the XML, producing word and part-of-speech frequency tables that could then be studied chronologically and in isolation.
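The parsing and tokenizing step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the R workflow the team actually used, and it omits POS tagging entirely. The XML snippets, the sample names, the function names, and the choice to express frequencies as percentages are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def extract_text(xml_string):
    """Strip the markup and return only the text content of an XML document."""
    return "".join(ET.fromstring(xml_string).itertext())


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_frequencies(tokens):
    """Return each word's share of the text as a percentage of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 100.0 * n / total for word, n in counts.items()}


# Hypothetical miniature corpus: placeholder XML, not taken from the Cather Archive.
documents = {
    "sample_a": "<text><p>The road ran on and the sky was the whole world.</p></text>",
    "sample_b": "<text><p>She looked at the road and she did not speak.</p></text>",
}

# One relative-frequency table per text, keyed by sample name.
freq_table = {
    name: relative_frequencies(tokenize(extract_text(xml)))
    for name, xml in documents.items()
}
```

Keeping one dictionary of relative frequencies per text makes it straightforward to line texts up by date or by genre set later on.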
Using this derivative data, our team specifically studied Cather’s use of determiners, adverbs, adjectives, and personal pronouns as a way of measuring Cather’s idea of unencumbered prose. A more “unencumbered” prose, we hypothesized, would be less ornate and would use fewer of these markers. To facilitate this investigation, we calculated the mean frequency of each selected part of speech and used those means as a baseline for examining outliers within the long form and short form fiction. Among other things, we compared part-of-speech frequencies in Cather’s long form fiction (Fig. 1) and short form fiction (Fig. 2) and then analyzed pronouns (Fig. 3), determiners (Fig. 4), and adverbs (Fig. 5) within the respective corpora.
Fig. 1: Parts of speech in Death Comes for the Archbishop compared with Cather’s long form fiction (LFF)
Fig. 2: Parts of speech in A Tale of the White Pyramid compared with Cather’s short form fiction (SFF)
Fig. 3: Use of the top five personal pronouns in A Tale of the White Pyramid against the mean in Cather’s short form fiction (SFF) and in Death Comes for the Archbishop against the mean in her long form fiction (LFF)
Fig. 4: Use of the top two determiners in A Tale of the White Pyramid against the mean in Cather’s short form fiction (SFF) and in Death Comes for the Archbishop against the mean in her long form fiction (LFF)
Fig. 5: Use of adverbs in A Tale of the White Pyramid against the mean in Cather’s short form fiction (SFF) and in Death Comes for the Archbishop against the mean in her long form fiction (LFF)
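The comparison of individual texts against part-of-speech means might be sketched as below. The tag counts and text names are invented for illustration. Scoring each text by its distance from the mean in standard deviations is an assumption of this sketch: the abstract says only that means served as the basis for examining outliers.

```python
from statistics import mean, stdev

# Hypothetical tag counts per text (each sums to 1000 tokens); not real Cather data.
pos_counts = {
    "text_a": {"DET": 120, "ADV": 60, "ADJ": 80, "PRON": 90, "OTHER": 650},
    "text_b": {"DET": 118, "ADV": 62, "ADJ": 78, "PRON": 92, "OTHER": 650},
    "text_c": {"DET": 122, "ADV": 58, "ADJ": 82, "PRON": 88, "OTHER": 650},
    "text_d": {"DET": 140, "ADV": 35, "ADJ": 60, "PRON": 70, "OTHER": 695},
}


def pos_shares(counts):
    """Convert raw tag counts into each tag's share of all tokens in the text."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def deviation_from_mean(shares_by_text, tag):
    """Score each text by how many standard deviations its share of `tag` lies from the mean."""
    values = {name: shares[tag] for name, shares in shares_by_text.items()}
    mu = mean(values.values())
    sd = stdev(values.values())
    return {name: (value - mu) / sd for name, value in values.items()}


shares_by_text = {name: pos_shares(counts) for name, counts in pos_counts.items()}
adv_scores = deviation_from_mean(shares_by_text, "ADV")

# The text with the most extreme score is the first candidate for closer reading.
most_unusual = max(adv_scores, key=lambda name: abs(adv_scores[name]))
```

Working with shares rather than raw counts keeps a long novel and a short story comparable, which matters when the two genre sets are examined side by side.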
For our analysis of the high-frequency words, we set a minimum mean threshold and examined only those words whose mean relative frequency across the corpus was at least 0.5. This threshold had the effect of excluding context-sensitive words from the analysis. All of this derivative data was merged with metadata from the texts and then explored and organized using the Euclidean metric as the basis for a hierarchical clustering. The clustering allowed us to investigate the similarities between sixty-six samples of Cather’s fictional writing. A dendrogram (Fig. 6) shows the results of this clustering.
Fig. 6: Word Frequency Clustering of Entire Fictional Corpus
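The thresholding and clustering steps can be illustrated with a small standard-library sketch. The frequency values and text names are invented, the threshold is read here as a percentage, and average linkage is an assumption, since the abstract specifies only the Euclidean metric and not the linkage rule.

```python
import math
from itertools import combinations

# Hypothetical relative frequencies (percent of tokens) for three placeholder texts.
freq_table = {
    "novel_a": {"the": 6.0, "and": 3.0, "she": 1.0, "mesa": 0.3},
    "novel_b": {"the": 6.2, "and": 3.1, "she": 1.1, "prairie": 0.4},
    "early_story": {"the": 9.0, "and": 5.0, "she": 0.2, "pyramid": 0.6},
}


def keep_frequent_words(table, threshold=0.5):
    """Keep words whose mean relative frequency across all texts meets the threshold."""
    vocabulary = set().union(*table.values())
    n_texts = len(table)
    return sorted(
        word for word in vocabulary
        if sum(freqs.get(word, 0.0) for freqs in table.values()) / n_texts >= threshold
    )


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(vectors):
    """Naive agglomerative clustering (average linkage); returns merges in order."""
    clusters = {name: [name] for name in vectors}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average pairwise distance between the members of two clusters.
            left, right = pair
            dists = [euclidean(vectors[i], vectors[j])
                     for i in clusters[left] for j in clusters[right]]
            return sum(dists) / len(dists)

        left, right = min(combinations(sorted(clusters), 2), key=linkage)
        merges.append((left, right, linkage((left, right))))
        clusters[left + "+" + right] = clusters.pop(left) + clusters.pop(right)
    return merges


words = keep_frequent_words(freq_table)
vectors = {name: [freqs.get(w, 0.0) for w in words] for name, freqs in freq_table.items()}
merges = cluster(vectors)
```

In this toy data the topic-specific words fall below the threshold and drop out, and the two similar texts merge first while the dissimilar one joins last, which is the pattern a dendrogram makes visible for an outlier.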
Our analysis of Cather’s use of specific parts of speech and high-frequency words indicated no significant change in her usage patterns over time. Our results suggest that the style Cather advocated in “The Novel Démeublé” is the style that she employed throughout her career. Despite a generally stable signal over time, two outliers emerged: Death Comes for the Archbishop and A Tale of the White Pyramid. The former was a distant outlier within the closed set of her novelistic fiction but was found to be consistent with her shorter fiction. The latter was the sole outlier of the entire corpus, which is not surprising given that Cather wrote it as a student.
E. K. Brown (2003), "Homage to Willa Cather," in Willa Cather Critical Assessments, Guy Reynolds, ed. Mountfield: East Sussex. Vol. I.
Maciej Eder, “Stylo R Package,” sites.google.com/site/computationalstylistics/stylo
David Hoover (2003), “Another Perspective on Vocabulary Richness,” Computers and the Humanities, 37: 151-178.
Mary Ann O’Farrell (2005), "Words To Do with Things", in Willa Cather and Material Culture, Janis P. Stout, ed., University of Alabama Press: Tuscaloosa.
David Oshinsky (2007), “No Thanks, Mr. Nabokov,” New York Times, September 9. www.nytimes.com/2007/09/09/books/review/Oshinsky-t.html?_r=0.
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org.
Lionel Trilling (2003), "Willa Cather," in Willa Cather Critical Assessments, Guy Reynolds, ed. Mountfield: East Sussex. Vol. I.
This research began in a class taught by Matthew Jockers and has continued under his direction as a project of the Nebraska Literary Lab.
See: Willa Cather Archive. “The Novel Démeublé” cather.unl.edu/nf012.html. First published in the New Republic, 30 (April 12, 1922): 5-6.
Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne
July 7, 2014 - July 12, 2014
Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/
Attendance: 750 delegates according to Nyhan 2016
Series: ADHO (9)