Digital Animal Studies: Modeling Anthropomorphism in Animal Writing, 1870-1930

paper, specified "long paper"
  1. Victoria Googasian, Stanford University

  2. Ryan James Heuser, Stanford University


Is an animal a person? The question is far from idle; it carries urgent ethical and legal consequences for both animal and environmental rights, and it shapes scientific norms for the study of animal behavior. It has remained a constant theme in Western philosophy from René Descartes through present-day Critical Animal Studies. However, lawyers, philosophers, and ethologists are not the only ones who decide this question: cultural representations of animals also mediate their relation to personhood. Fiction, for instance, excels in the representation of human individuality, interiority, and action; complex, “round” characters populate the long history of prose fiction. How, then, does fiction engage with the personhood of animals? In fiction, is an animal a character? What do animals do in the pages of fiction? Do they make decisions, have feelings, express interiority? Do animals function more like human characters, or more like things, objects, and machines?

In what follows, we approach these questions with computational methods in one of the first attempts to apply digital methods to animal studies.

For an experiment on biodiversity archives and species extinction, see Ursula Heise, Imagining Extinction: The Cultural Meanings of Endangered Species (Chicago: University of Chicago Press, 2016), 55-86.

In a variety of corpora—from popular natural history to scientific writing about animal behavior to animal-driven fictions historically accused of anthropomorphism—we compare the semantic and syntactic footprints left behind by animals and humans. We discover that, from a computational standpoint, animals in fiction are indeed recognizable as characters, albeit characters who register intentionality through physical movement over speech and display a mental paradigm delimited by instinct and associative learning. Natural history writing, on the other hand, narrates animals in ways that seem surprisingly human-like when compared to animal representations in fiction more broadly.

Around the turn of the twentieth century, reading audiences in the United States developed a penchant for fiction about nonhuman animals. These so-called “wild animal stories” appeared in popular magazines and short story collections by writers such as E.T. Seton, Charles G.D. Roberts, William J. Long and Jack London. Despite their popularity, the stories proved to be a flash-point for scientific controversy over the appropriate way to narrate animal behavior. Over a period of several years, prominent public intellectuals—including the naturalist John Burroughs and then-president Teddy Roosevelt—would attack the writers of wild animal stories for attributing inaccurately human-like qualities to their nonhuman characters. The entire debate would come to be known as “The Nature Fakers” episode.

For a detailed literary-historical account, see Ralph Lutts, The Nature Fakers: Wildlife, Science, and Sentiment (Fulcrum Publishing, 1990).

Though it concerned a minor and moderately pulpy literary sub-genre, the controversy marks a moment when science and fiction butted heads over the meaning and appropriateness of anthropomorphism, as natural historians attempted to police the limits of fictional character. This corpus thus offers a high-stakes proving ground for the very concept of anthropomorphism in narrative fiction and science writing.

We have assembled 54 texts from the writers involved in this controversy, comprising short story collections and novels by eight of the most prominent animal story authors, published between the 1870s and the 1930s and concentrated in the first two decades of the twentieth century when the Nature Fakers debate was at its most intense. As a point of comparison for these fictions, we have also assembled 17 of John Burroughs’ natural history monographs, which detail the doings of wild animals in the idiom of popular science. Finally, we selected approximately 400 American novels published between 1870 and 1930 as a control corpus with no particular interest in nonhuman animals.

These 444 novels derive from two sources. For the late nineteenth century, we turned to a collection of about 325 American novels published between 1875 and 1905. Compiled by Marissa Gemma, these texts were selected based on their inclusion in the Annals of American Literature (Richard Ludwig and Clifford Nault, Oxford: Oxford University Press, 1989) and their availability in Project Gutenberg. Twentieth-century novels are those compiled by Mark McGurl and Mark Algee-Hewitt in “Between Canon and Corpus: Six Perspectives on 20th-Century Novels,” Stanford Literary Lab 8 (2016).

Our digital methods collect the words that characterize animals, humans, and objects in fiction. We base our methods on BookNLP, a Java program which clusters fictional character names together (“Elizabeth” and “Elizabeth Bennet”), and then collects the words associated with each character in specific ways: verbs for which each character is either a subject (“Elizabeth wondered”) or an object (“Mrs. Gardiner ... reminded Elizabeth”); as well as the nouns each character possesses (“Elizabeth’s wishes”), the adjectives attributed to each character (“Elizabeth was watchful”), and all words spoken by the character in moments of dialogue.

David Bamman, Ted Underwood, and Noah Smith, “A Bayesian Mixed Effects Model of Literary Character,” ACL 2014. For a recent application of BookNLP to historical fictional practices of gendering human characters, see Ted Underwood, David Bamman, and Sabrina Lee, “The Transformation of Gender in English-Language Fiction,” Cultural Analytics (Feb 2018), DOI: 10.31235/

We applied BookNLP to our corpus of wild animal stories and then, in the 15 cleanest texts, annotated each character it returned as either human or animal. BookNLP was in general accurate at identifying named animal characters. In London’s White Fang, for example, BookNLP failed to recognize only one repeatedly occurring named character, White Fang’s father One Eye. The results of our annotations are recorded below in Table 1.

[Table 1. Columns: # Words in stories; # Chars (Animal); # Chars (Human); # Words (Animal); # Words (Human); # Words (per char); % Effect on Totals. Rows: Ernest Seton; William Long; Charles Roberts; Harriet Miller; Jack London; Clarence Hawkes.]

Table 1. Statistics regarding the small corpus of 15 texts whose BookNLP-identified characters have been annotated for their species (whether human or animal). Sorted by the number of stories they contribute, these authors vary widely in the number of word-to-character associations they generate. For instance, although William Long contributes more stories, words, and characters than Jack London, his effect on the aggregate data is smaller (6% to London’s 35%): this is because he attributes on average only 56 words per character, whereas London attributes on average 199 words to each character. This is likely owing to Long’s heavier usage of non-proper nouns, often referring to characters as “the mother,” “the cub,” etc.

BookNLP has the obvious limitation of overlooking unnamed characters, human and animal alike. To address this limitation, we designed a Python program that extends BookNLP’s logic to all nouns. Our program collected approximately the same information: the verbs of which each noun is a subject or object; the nouns it possesses; and the adjectives that modify it.

Our Python program ran the corpus through spaCy, a syntactic dependency parser for Python, and collected all moments when a noun had a syntactic relation to another word in a sentence of any of the following kinds (shown in parentheses are the syntactic dependency tags): subject (nsubj); passive subject (nsubjpass); direct and indirect objects (dobj and dative); possessives (poss); and modifiers (amod [adjective], compound [noun compound], appos [appositive], and attr [predicate]). We used spaCy's default English-language model (en_core_web_sm), trained on web discourse. That this model was trained on the hyper-modern language of the internet is of course regrettable, given the historical material of our project, but unfortunately this is a common and largely unavoidable problem in literary text mining.

We then produced lists of nouns for animals and for humans, drawn from the corresponding word lists in the Harvard General Inquirer.

Roger Hurwitz, “General Inquirer Home Page,” (accessed July 22, 2018). Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith and Daniel M. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis (Cambridge: MIT Press, 1966).

Uncertain or ambiguous entries were pruned, reducing an original list of 930 words to a refined list of 371 words.

For example, the refined list of 287 human words included words like detective, ambassador, pope, freshmen, executive, commoner, manager, scientist, and human; the refined list of 96 animal words included words like turtle, owl, shark, oxen, grouse, roachback, moth, crow, hare, and jackrabbit.

Whenever a noun from these lists appeared in the parsed sentences of the corpus, we recorded its appearance along with its associated syntactic relation.
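The collection step described above can be sketched as follows. This is a minimal illustration rather than the program used for the paper: the `Arc` tuple is a hypothetical stand-in for one dependency arc of a parsed sentence (in spaCy, the corresponding values would come from `token.lemma_`, `token.dep_`, and `token.head`), the example parse is written by hand, and the `_head` suffix used to mark arcs in which the listed noun is the head is an assumption introduced here.

```python
from collections import Counter
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc; a hypothetical stand-in for a parser's output."""
    dependent: str  # lemma of the dependent word
    relation: str   # dependency tag, e.g. "nsubj"
    head: str       # lemma of the head word


# The dependency tags named in the text.
RELATIONS = {
    "nsubj", "nsubjpass", "dobj", "dative",
    "poss", "amod", "compound", "appos", "attr",
}


def collect_associations(arcs, target_nouns):
    """Count (noun, relation, associated word) triples for listed nouns."""
    counts = Counter()
    for arc in arcs:
        if arc.relation not in RELATIONS:
            continue
        if arc.dependent in target_nouns:
            # The listed noun depends on the other word ("wolf" -nsubj-> "watch").
            counts[(arc.dependent, arc.relation, arc.head)] += 1
        elif arc.head in target_nouns:
            # The other word depends on the listed noun ("old" -amod-> "wolf").
            counts[(arc.head, arc.relation + "_head", arc.dependent)] += 1
    return counts


# Hand-written parse of "The old wolf watched the hunter."
example_arcs = [
    Arc("the", "det", "wolf"),
    Arc("old", "amod", "wolf"),
    Arc("wolf", "nsubj", "watch"),
    Arc("the", "det", "hunter"),
    Arc("hunter", "dobj", "watch"),
]
associations = collect_associations(example_arcs, {"wolf", "hunter"})
```

Keeping the relation tag in each triple allows the later comparisons to be run separately per relation, for instance on subject verbs only.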

Finally, to determine whether a word is statistically significantly more likely to associate with a human or an animal character, we conducted a Fisher’s exact test on a 2x2 contingency table: (# of times a given word appears in the context of an animal / human character) x (# of times an animal / human character appears). This weights our expectation for the number of times words appear by the number of times characters appear. The Fisher’s exact test returns an odds ratio (the odds of a word’s human association divided by its odds of animal association) along with a p-value. Words with a p-value of less than 0.1 are shown in the results below.
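A minimal sketch of such a test, written in pure Python so that it has no dependencies. It assumes the table is laid out literally as described above, with one row per species and the columns holding the word's count and the species' total count; the function name and the example counts are hypothetical and invented for illustration only. The two-sided p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is a common convention but not one the text specifies.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test on the table [[a, b], [c, d]].

    Returns (odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probability of every table no likelier than the observed one.
    p_value = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts, invented for illustration only.
word_with_human, human_mentions = 40, 1000
word_with_animal, animal_mentions = 5, 1000

odds_ratio, p_value = fisher_exact_2x2(
    word_with_human, human_mentions,
    word_with_animal, animal_mentions,
)
is_distinctive = p_value < 0.1  # threshold stated in the text
```

With this layout, an odds ratio above 1 indicates a word that leans toward human characters and a ratio below 1 a word that leans toward animal characters.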

Due to limited space, we present only a brief summary of some of our findings in experiments that draw on BookNLP and named characters (Experiment 1) and on our own program’s collection of words associated with animal and human nouns (Experiment 2). Finally, we apply machine learning techniques to assess the overall distinctiveness of human and animal nouns (Experiment 3).

Experiment 1: Comparing how animal and human characters are narrated in the Wild Animal Stories using BookNLP
As seen in Figure 1, human characters are more likely to be the subject of speaking verbs such as “asked,” “called,” “cried,” “announced,” and, most especially, “answered.” Even in these wild animal stories, animals do not (as a rule) speak. Instead, animal characters in these texts are far more likely than humans to register intentional response through embodied activity, with such verbs as “fought,” “bristled,” and “licked.” Emotionally, these results suggest that animals are characterized more by aggression (“hated”), humans by sociability (“love”).

Think of London’s dog heroes, who only learn to love toward the end of their stories, when paired with the appropriate human master, but who never experience love in the company of other dogs.

More subtly, animals “learn” and “know” things, while a human character is more likely to have “thought,” a result that could suggest something about the prevailing paradigm of animal mental life, characterized by instinct and associative learning, rather than rational reflection.

Figure 1: The verbs for which animal and human characters most often act as subject. Shown are words that are statistically significantly distinctive of animal or human characters (p<0.1). The farther to the left, the more strongly a word is associated with animal characters; the farther to the right, the more to human characters.

Experiment 2: Comparing how animal nouns are narrated differently in Wild Animal Stories and Natural History writing
Our comparison of Burroughs’ natural history writings with the Wild Animal Stories does not, on its face, suggest that animals are any less person-like in Burroughs’ anti-anthropomorphic paradigm than they are in fiction. Indeed, as seen in Figure 2 below, Burroughs’ animals are statistically more likely to “know,” “make,” and “become” than their fictional counterparts. Many of the most distinctive active verbs between these two corpora might be explained by their relative interest in different species. For example, Burroughs writes about birds more frequently, so his animals are more likely to “fly” and “sing.” On the other hand, the strong influence of Jack London in the fictional corpus results in a high incidence of sled dogs, who are statistically more likely to “draw” a sled.

Figure 2: The verbs for which animal nouns most often act as subject. Shown are words that are statistically significantly distinctive of wild animal stories or of natural history writing (p<0.1). The farther to the left, the more strongly a word is associated with the former genre; the farther to the right, the more to the latter.

Experiment 3: Classifying human and animal nouns in wild animal stories, natural history, and novels
The classifier had only a slightly harder time distinguishing between humans and animals in the fictional corpus than in Burroughs’ natural history texts, with median accuracies of 70% and 71% respectively. By comparison, when fiction with no particular interest in non-human animals is added to the mix, the classifier was able to distinguish humans from animals with a median classification accuracy of 82%. This discrepancy suggests that, contrary to the terms of the Nature Faker debate, both wild animal stories and natural history construct a semantic similarity between humans and animals.

Figure 3: Classification accuracy rates to determine whether a noun is a human or an animal. An identical number of instances of human and animal words were sampled from each genre (8,800). All human and animal words with more than 10 instances were used, leaving anywhere from 97 to 145 words depending on the corpus. In one hundred runs per genre, 30 animal words and 30 human words were randomly sampled. These were classified for their species (animal vs. human) by a logistic regression, trained and tested according to a leave-one-out classification model.
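The leave-one-out procedure named in the caption can be sketched as follows. This is an illustration under stated assumptions, not the classifier used for the paper: the text does not specify the features or the software, so the logistic regression here is a small pure-Python gradient-descent stand-in, and the two-dimensional feature vectors (imagined as each noun's relative frequency as subject of two verbs) are hypothetical toy data. The repeated random sampling of 30 words per species is omitted.

```python
import math


def sigmoid(z):
    # Clamp z so that math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit weights and bias by plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def leave_one_out_accuracy(X, y):
    """Hold out each noun in turn, train on the rest, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = train_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(w, b, X[i]) == y[i]
    return correct / len(X)


# Hypothetical toy features: [share of "said", share of "bristled"] per noun.
features = [
    [0.90, 0.10], [0.80, 0.20], [0.70, 0.10], [0.85, 0.05],  # human nouns
    [0.10, 0.90], [0.20, 0.80], [0.10, 0.70], [0.05, 0.85],  # animal nouns
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = human, 0 = animal

accuracy = leave_one_out_accuracy(features, labels)
```

Because every noun is tested by a model that never saw it, the resulting accuracy reflects how well the species distinction generalizes to unseen words rather than how well the model memorizes the sample.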


Heise, U. (2016). Imagining Extinction: The Cultural Meanings of Endangered Species. Chicago: University of Chicago Press, pp. 55-86.

Lutts, R. (1990). The Nature Fakers: Wildlife, Science, and Sentiment. Fulcrum Publishing. Charlottesville, VA: University Press of Virginia.

Bamman, D. et al. (2014). A Bayesian Mixed Effects Model of Literary Character. Association for Computational Linguistics.

Underwood, T. et al. (2018). The Transformation of Gender in English-Language Fiction. Cultural Analytics.


Conference Info

In review

ADHO - 2019

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019


Series: ADHO (14)

Organizers: ADHO