Machine Learning for Literary Criticism: Analyzing Forms, Genres, and Figurative Language

paper, specified "short paper"
  1. Michael Ullyot

    University of Calgary

  2. Adam James Bradley

    Ontario Tech University

Work text

Introduction / Importance

Since literary critics began using topic models on large text corpora, we have come to perceive literary periods as more fluid (Underwood, 2013; Pressman, 2014) and subgenres as more dynamic (Jockers, 2013; Underwood, 2014). These advances are mostly concentrated in prose fiction. Prose is more straightforward and verbose than poetry (Rhody, 2012), even if the problems of poetics are tractable for forms from Victorian sonnets to free verse (Houston, 2015; Bories et al., n.d.). Poetry is rarely straightforward: it uses words that resonate with other words, that complicate ideas and change meanings, that are there for idiomatic, rhythmic, allusive, formal, tonal, thematic, semantic, or idiosyncratic reasons. In sum, poets use particular words for so many reasons that machines struggle to model their topics statistically.

We use a recurrent neural network (RNN) for classifying sonnets, which are formally defined (14-line rhyming poems) but which also exhibit generic qualities of arguments, subjects/topics, tones, moods, and forms of address. We have built a computational model capable of scoring any text for its formal and generic resemblance to accepted criteria: that is, for scoring its “sonnetness.” Our goal is to find poems that have the generic features of sonnets but do not meet formal criteria such as a Petrarchan or Shakespearean rhyme scheme. These results will address our core question: to what degree sonnets, both individually and as a category, are defined formally or generically.

Methods

The standard distinction between Petrarchan and Shakespearean sonnets is based on rhyme schemes, but we set out to see if machine learning could define features that we couldn’t see. We began with diction, or word choices that constitute both form and genre; the results were so promising that we extended our model to incorporate four other dimensions: sound, rhyming, punctuation, and lineation. This identified a set of poems that we would never have considered.

Results / Discussion

In this presentation we will address why we began with early sonnets, which set conventions to which later English sonnets respond. We moved from a hand-transcribed test set to a corpus of 253,000 English-language poems from 12 centuries. Now we are expanding to two larger corpora: the 70,000 English texts printed before 1700, in the Early English Books Online - Text Creation Partnership (EEBO-TCP) corpus; and the 334,000 volumes of literature in the HathiTrust Digital Library.

In our past work (Ullyot and Bradley, 2018), we concluded that exceptions to the rules make language poetic. Poetry is deliberately irregular. It does not obey rules; it sets and then resets them. By expanding the canon of sonnets, our current project will unsettle critics’ orthodox ideas about them.
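
To make the formal criteria named in the Methods concrete (14 lines, a Shakespearean rhyme scheme, punctuation, lineation), the following is a minimal Python sketch of how such surface features might be extracted from a poem. It is not the authors' RNN and makes no claim about their feature pipeline: the function names, the returned dictionary keys, and the spelling-based stand-in for rhyme (matching the last two letters of each line's final word) are all assumptions made for illustration. A real system would likely use phonetic transcriptions for both sound and rhyme.

```python
import re
import string


def last_word(line):
    """Return the final alphabetic word of a line, lowercased ('' if none)."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1].strip("'") if words else ""


def crude_rhyme_scheme(lines, suffix_len=2):
    """Label lines A, B, C, ... by the last `suffix_len` letters of the final word.

    Assumption: spelling is used as a rough proxy for rhyme. Labels wrap
    around after 26 distinct endings, so very long poems may reuse letters.
    """
    labels = {}
    scheme = []
    for line in lines:
        key = last_word(line)[-suffix_len:]
        if key not in labels:
            labels[key] = string.ascii_uppercase[len(labels) % 26]
        scheme.append(labels[key])
    return "".join(scheme)


def formal_features(poem):
    """Collect simple formal features of a poem given as a multi-line string."""
    # Lineation: keep only non-empty lines.
    lines = [ln.strip() for ln in poem.splitlines() if ln.strip()]
    scheme = crude_rhyme_scheme(lines)
    return {
        "line_count": len(lines),
        "has_14_lines": len(lines) == 14,
        "rhyme_scheme": scheme,
        "punctuation_count": sum(ch in string.punctuation for ch in poem),
        # ABAB CDCD EFEF GG is the conventional Shakespearean pattern.
        "shakespearean_scheme": scheme == "ABABCDCDEFEFGG",
    }
```

Calling `formal_features(text)` on a poem returns a dictionary of these hypothetical features. A poem that fails `has_14_lines` or `shakespearean_scheme` but still scores highly on a separately trained model would be the kind of generically sonnet-like, formally irregular candidate the abstract describes.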


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at

Data for this conference were initially prepared and cleaned by May Ning.

Conference website:


Series: ADHO (15)

Organizers: ADHO