A Computational Approach to Epistemology in Poetry of the Long Eighteenth Century: A Case Study in Objects and Ideas

paper, specified "long paper"
  1. Mark Andrew Algee-Hewitt

    Stanford University


Poetry occupies a unique position in the computational study of literary text. Most recent computational linguistics work on poetry has focused on the analysis of meter or on text generation (e.g. Lau et al. 2018; Hämäläinen 2018). In cultural analytics, studies of poetry have focused primarily on sound studies (MacArthur et al. 2018) or meta-analyses of the context of its performance (Basnet and Lee 2021). In this paper, I leverage the unique textual and linguistic properties of poetry to explore how ideas are communicated poetically and whether this communication changes over time within a historical corpus in ways that respond to the shifting epistemologies of the Enlightenment and post-Enlightenment approaches to knowledge production. Given the interest in empiricism during the early eighteenth century, which shifts to an idealism-based philosophy during the early nineteenth century, I investigate whether the use of object-nouns (nouns representing material artifacts) and concept-nouns (nouns representing concepts) in poetry from this period was influenced by this change in how knowledge is produced and communicated.

For this project, I use the Chadwyck-Healey poetry archive, published by ProQuest. The final filtered corpus includes 55,293 poems written in English between 1700 and 1840.

Feature Selection
Previous work on concrete and abstract features has used word embeddings to find related terms (Heuser 2017), hand-curated lists of associated terms (Heuser and Le-Khac 2012), or words with Latinate and Anglophone roots (Underwood 2012). For this project I also adopt embeddings to identify object-nouns and concept-nouns, but use a supervised learning approach. Beginning with a GloVe word embedding model pre-trained on the Common Crawl, consisting of 840 billion tokens representing a 1.2 million word vocabulary embedded within 300 dimensions (Pennington et al. 2014), I use the embedding dimensions as features for a machine learning model that classifies nouns into object and concept categories. Working from a full NLP dependency parse of the poetry corpus (CoreNLP) to filter the embedding model for words used as nouns in eighteenth- and nineteenth-century poetry, I trained a linear discriminant analysis to identify object and concept words based on a 50-word seed list, retaining only those terms classified with a 75% or higher probability of being either an object (4418 nouns) or a concept (4843 nouns).
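The classification step described above can be illustrated with a minimal two-class linear discriminant analysis written in plain Python. This is a sketch under stated assumptions, not the project's actual pipeline: the three-dimensional vectors stand in for 300-dimensional GloVe embeddings, the seed words and candidate words are invented, and the small ridge term added to the covariance matrix is an assumption made only to keep the toy matrix invertible.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def pooled_covariance(groups, means, ridge=1e-3):
    """Within-class covariance shared by both classes (ridge is an assumption)."""
    dim = len(means[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for rows, mu in zip(groups, means):
        for row in rows:
            diff = [x - m for x, m in zip(row, mu)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += diff[i] * diff[j]
    denom = sum(len(rows) for rows in groups) - len(groups)
    for i in range(dim):
        for j in range(dim):
            cov[i][j] /= denom
        cov[i][i] += ridge
    return cov

def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_lda(object_vecs, concept_vecs):
    """Return the weights and bias of the log-odds of 'object' vs 'concept'."""
    mu_o, mu_c = mean_vector(object_vecs), mean_vector(concept_vecs)
    cov = pooled_covariance([object_vecs, concept_vecs], [mu_o, mu_c])
    w = solve(cov, [a - b for a, b in zip(mu_o, mu_c)])
    midpoint = [(a + b) / 2 for a, b in zip(mu_o, mu_c)]
    prior = math.log(len(object_vecs) / len(concept_vecs))
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint)) + prior
    return w, bias

def prob_object(vec, w, bias):
    """Posterior probability of the 'object' class (numerically stable sigmoid)."""
    score = sum(wi * xi for wi, xi in zip(w, vec)) + bias
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def classify(candidates, w, bias, threshold=0.75):
    """Keep only words whose class probability reaches the threshold."""
    kept = {}
    for word, vec in candidates.items():
        p = prob_object(vec, w, bias)
        if p >= threshold:
            kept[word] = "object"
        elif 1.0 - p >= threshold:
            kept[word] = "concept"
    return kept

# Invented toy vectors standing in for embeddings of seed words.
object_seeds = [[1.0, 0.2, 0.1], [0.9, 0.1, 0.3], [1.1, 0.3, 0.2], [0.8, 0.2, 0.2]]
concept_seeds = [[0.1, 1.0, 0.2], [0.2, 0.9, 0.1], [0.0, 1.1, 0.3], [0.3, 0.8, 0.2]]
candidates = {
    "urn": [0.95, 0.15, 0.2],
    "liberty": [0.1, 0.95, 0.2],
    "shade": [0.55, 0.575, 0.2],  # placed midway between the two class means
}

w, bias = fit_lda(object_seeds, concept_seeds)
print(classify(candidates, w, bias))
```

In this toy setup the ambiguous word sits at the midpoint of the two class means, so it falls below the 75% threshold for either class and is dropped, which mirrors the filtering described in the text.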

To explore the relationship between the use of object-nouns and concept-nouns in poetry across the long eighteenth and early nineteenth centuries, I first assess whether there is a difference between how these two classes of words are distributed within the poems of the period and whether this relationship changes over time. To compare the distribution of each word set, I calculated the dispersion index (variance/mean) for each word set in each poem (figure 1).
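As a rough illustration of the variance/mean calculation, the sketch below computes a dispersion index for one word set in one poem. The text does not say how each poem is partitioned before the variance and mean are taken, so the equal-length segments, the `n_segments` parameter, and the use of population variance are all assumptions, and the toy poems and word list are invented.

```python
from statistics import mean, pvariance

def segment_counts(tokens, word_set, n_segments=10):
    """Count word_set hits in each of n_segments equal slices of a poem."""
    counts = [0] * n_segments
    for i, token in enumerate(tokens):
        if token in word_set:
            counts[i * n_segments // len(tokens)] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio of the counts; None if the word set never occurs."""
    m = mean(counts)
    if m == 0:
        return None
    return pvariance(counts) / m

# Invented examples: the same three object words, placed differently.
objects = {"urn", "vase", "lamp"}
clustered = ["urn", "vase", "lamp", "o", "the", "night", "and", "the", "day"]
spread = ["urn", "of", "night", "vase", "of", "day", "lamp", "of", "dawn"]

print(dispersion_index(segment_counts(clustered, objects, n_segments=3)))
print(dispersion_index(segment_counts(spread, objects, n_segments=3)))
```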
Figure 1: (a) dispersion index of object words and (b) dispersion index of concept words in poems written between 1700 and 1834 (individual poems are jittered on the x-axis for legibility within each period)
As the figures indicate, object words show significantly lower dispersion (pairwise Wilcoxon p-value of 2.2e-16; Cohen’s d of 0.8) in poems written earlier in the eighteenth century than in the later eighteenth or nineteenth centuries. Conversely, concept words are significantly more widely distributed (p-value of 2.2e-16; Cohen’s d of 0.7) throughout earlier poems. In other words, poems written during the Enlightenment distribute concept words throughout the poem, but cluster objects together, while poems written in the early nineteenth century cluster object words less, but cluster concept words to a greater extent. This potentially offers evidence that eighteenth-century poetic epistemology relied on empirical evidence (representing clusters of direct objects) more than nineteenth-century poetry.
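For readers unfamiliar with the two statistics reported above, the following sketch shows how an effect size and a rank-based statistic can be computed for two groups of per-poem values. It assumes the pooled-standard-deviation form of Cohen's d and reports only the Mann-Whitney U statistic that underlies the Wilcoxon rank-sum test, not a p-value; the input numbers are invented and are not the paper's data.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def mann_whitney_u(a, b):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Invented dispersion values for two hypothetical periods.
later_period = [2.0, 4.0, 6.0]
earlier_period = [1.0, 3.0, 5.0]

print(cohens_d(later_period, earlier_period))
print(mann_whitney_u(later_period, earlier_period))
```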
To explore how these clusters play out within the poems themselves, I aggregated the number of object and concept nouns within each 50th (using overlapping windows) of poems written during each of the three periods (figure 2).
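One way to picture this positional aggregation is sketched below. The text does not specify how far the windows overlap, so the `width` parameter (each window spans two consecutive fiftieths by default) is an assumption, as are the function names and the toy poems, which use four windows only to keep the example readable.

```python
def window_counts(tokens, word_set, n_windows=50, width=2):
    """Count word_set hits in overlapping windows laid along one poem.

    Window k starts at the k-th of n_windows equal divisions and spans
    `width` divisions, so neighbouring windows share tokens.
    """
    n = len(tokens)
    counts = []
    for k in range(n_windows):
        start = k * n // n_windows
        end = min(n, (k + width) * n // n_windows)
        counts.append(sum(1 for token in tokens[start:end] if token in word_set))
    return counts

def aggregate(poems, word_set, n_windows=50, width=2):
    """Sum the per-window counts over every poem in a period."""
    totals = [0] * n_windows
    for tokens in poems:
        for k, count in enumerate(window_counts(tokens, word_set, n_windows, width)):
            totals[k] += count
    return totals

# Invented example: concept words at the opening and close of each poem.
concepts = {"truth", "hope"}
poem = ["truth", "a", "b", "c", "d", "e", "f", "hope"]

print(aggregate([poem, poem], concepts, n_windows=4, width=2))
```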

Figure 2: Aggregate position of (a) object nouns and (b) concept nouns within poems written during the early eighteenth century, the later eighteenth century, and the early nineteenth century
Once again, the differences between object and concept nouns, as well as between the time periods, are apparent. Across all three time periods, object nouns fall in frequency across the poems, while in the two later periods, concept nouns are strongly clustered at the beginning and the end of their respective poems. The effect is similar for early eighteenth-century poetry; however, the fall is much more dramatic, and the nadir occurs later in the poem. This suggests that concept words cluster within the frame of the poems from later periods, while they are more evenly used throughout poems from the earlier period.
As a final step, I calculate the most frequent verbs associated with all objects and all ideas (extracted from a dependency parse of the poetry). These associated verbs, I argue, can aid us in understanding how the changing dispersal and frequency of both objects and concepts correlate with their use in the poems. The data reveal that across the eighteenth century objects become less the objects of “doing” and “knowing,” and more the objects of “seeing,” “hearing,” and “feeling.” Concept nouns remain more stable in relation to “thinking” and “knowing,” but become the objects of “being.”
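The verb association step can be sketched as a simple count over dependency triples. The triple layout (governing verb lemma, relation label, dependent noun), the `obj` relation label, and the example data are all assumptions made for illustration; the actual labels depend on the parser version and configuration used.

```python
from collections import Counter

def top_verbs(triples, noun_set, relation="obj", k=3):
    """Most frequent verbs governing any noun in noun_set via `relation`."""
    counts = Counter(
        verb
        for verb, rel, noun in triples
        if rel == relation and noun in noun_set
    )
    return counts.most_common(k)

# Invented triples standing in for the output of a dependency parse.
triples = [
    ("see", "obj", "urn"),
    ("see", "obj", "lamp"),
    ("hear", "obj", "bell"),
    ("know", "obj", "truth"),
    ("see", "nsubj", "urn"),  # ignored: not an object relation
]
object_nouns = {"urn", "lamp", "bell"}

print(top_verbs(triples, object_nouns))
```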

The quantitative changes of the dispersal of nouns representing objects and ideas, as well as their average frequency across poems, refute the standard narrative of poetic evolution that emphasizes the object-oriented nature of Romantic period poetry. Instead, following the science-oriented epistemology of the Enlightenment, eighteenth-century poems used organized clusters of objects in order to inductively develop and support emergent ideas, while in Romantic period poetry, objects are scattered throughout the poems in support of the centralized and organized ideas that animate the poem.

Basnet, Anik and James Jaehoon Lee. “A Network Analysis of Postwar American Poetry in the Age of Digital Humanities.” Journal of Cultural Analytics 4 (2021): 180-217.

Hämäläinen, Mika. “Poem machine - a co-creative NLG web application for poem writing.” The 11th International Conference on Natural Language Generation: Proceedings of the Conference. The Association for Computational Linguistics, 2018.

Heuser, Ryan. “Word Vectors in the Eighteenth Century.” ADHO Conference, 2017.

Heuser, Ryan and Long Le-Khac. “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method.” Pamphlets of the Stanford Literary Lab 4 (2012).

Lau, Jey Han, Trevor Cohn, Timothy Baldwin, Julian Brooke, and Adam Hammond. “Deep-speare: A joint neural model of poetic language, meter and rhyme.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018.

MacArthur, Marit, Georgia Zellou, and Lee M. Miller. “Beyond Poet Voice: Sampling the (Non) Performance Styles of 100 American Poets.” Journal of Cultural Analytics (2018): np.

Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. GloVe: Global Vectors for Word Representation. 2014.

Underwood, Ted. “Etymology and nineteenth-century poetic diction; or, singing the shadow of the bitter old sea.” The Stone and the Shell 2012.


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022


Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO