Digital Humanities and the Canon

paper, specified "short paper"
  1. Nathaniel Allen Conroy

    Brown University


Can the digital humanities offer an alternative to traditional modes of canon formation? This paper argues that quantitative methods can both enrich our understanding of the way canons are formed and help us create more flexible and interactive alternatives to traditional canons. Over the last year, I’ve built a tool for generating dynamic literary canons, called Metacanon, and placed it in public beta. Metacanon measures the canonicity of literary works by counting the number of times they are mentioned in scholarly journals and using an algorithm to assign a uniform score to each work based on this data. While this certainly does not amount to a measure of aesthetic value or "greatness," it does offer a concise snapshot of the body of literary works that are most discussed by scholars. Currently, it covers only twentieth-century American fiction, but future versions will be expanded to include other genres, periods, and nationalities. For scholars, this will provide a tool for quickly measuring the relative centrality or obscurity of particular works, as well as for measuring how canons change over time. For students and the general public, it will offer a far more inclusive, flexible, and interactive alternative to the fairly predictable “greatest books” lists that currently act as arbiters of literary value outside of academic circles.

In the wake of Pierre Bourdieu’s Distinction (1979), literary studies has developed a nuanced critical apparatus for rethinking the role played by the canon and canonicity in the perpetuation of cultural capital. Our current scholarly common sense insists that far from reflecting aesthetic value, canons actually create this value socially, often thereby reinforcing hegemonic cultural values and hierarchies. And yet, even as we know this, the actual collection of texts that is consistently taught, written about, and by extension canonized remains relatively stable. By examining the frequency with which particular works are mentioned in various scholarly networks, Metacanon creates an accessible representation of this trend and thereby introduces a greater level of transparency into the dominant allocation of literary values. In this respect, the project resembles work being done by Mark Algee-Hewitt and Mark McGurl at the Stanford Literary Lab, although it uses different means and has different ends in mind. Whereas Algee-Hewitt and McGurl have produced a master list of 350 twentieth-century novels by combining several “found lists” supplemented by a survey of scholars working in the field of postcolonial literature, Metacanon takes advantage of a wider array of harvested data drawn from thousands of journal articles. This reflects the very different goals of this project. Rather than producing a necessarily limited corpus suitable for data mining, Metacanon is primarily intended as an exhaustive but flexible representation of the canon. This allows for a much larger interactive list, but I’ve also found that the Metacanon list is much more diverse in terms of gender (and likely ethnicity) than the McGurl and Algee-Hewitt list, even though they consciously aimed to create a corpus that would be more representative than most standard lists.
What this indicates is that although most publicly available “greatest books lists” tend to over-represent white men in their construction of literary value, scholars themselves tend to work on a much more diverse array of literary texts. In other words, there is already a working canon in existence that is much more diverse and representative than the standard lists and surprisingly more so even than Stanford’s intentionally varied list; it’s just that this working canon isn’t generally available in an accessible, objectified form. Metacanon takes the first steps toward producing this more accessible form, even as it integrates flexibility and transparency into its framework.
The current version of Metacanon (0.6) is limited to twentieth-century American fiction. As such, however, it is the most comprehensive relational database of American fiction from this period. Of course, there are more extensive listings of American literature available (for example, the Chadwyck-Healey Bibliography of American Literature). However, these offer no way to easily distinguish between highly canonical works and more obscure works. In essence, this forces readers looking for a definitive list of American fiction to choose between the unwieldiness of comprehensive bibliographies and the partiality of much shorter “greatest books” lists and standard field lists. What makes Metacanon unique is that it harnesses digital technology to offer the expansiveness of a comprehensive bibliography while also measuring the relative centrality or obscurity of each particular work.
This digital framework also allows users to become active participants in the construction of the canon rather than merely passive recipients. For example, one user might choose to see a list of the most canonical novels published between 1970 and 1979 or, even more interestingly, the most canonical novels of the 1970s according only to data from the 1980s or 1990s. Another user might choose to see a list consisting only of science fiction written by women. A third user might choose to alter the algorithm to calculate canonicity based only on citations in a single journal. Contrary to the widespread fear that digital or quantitative approaches to literature are fundamentally opposed to nuance and flexibility, Metacanon demonstrates that this need not be the case, at least insofar as questions of canon formation are concerned.
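The kinds of user-driven queries described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Work and Mention records, the rank function, the placeholder titles and journals, and above all the score, which is simply a raw count of qualifying mentions rather than Metacanon's actual scoring algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str
    year: int            # year of first publication
    genre: str
    author_gender: str


@dataclass(frozen=True)
class Mention:
    title: str           # title of the work being mentioned
    journal: str
    year: int            # year the mentioning article appeared


def rank(works, mentions, work_filter=lambda w: True, mention_filter=lambda m: True):
    """Return (title, score) pairs, highest score first.

    The score is a raw count of qualifying mentions. That is an assumption
    for illustration, not Metacanon's scoring algorithm.
    """
    counts = {w.title: 0 for w in works if work_filter(w)}
    for m in mentions:
        if m.title in counts and mention_filter(m):
            counts[m.title] += 1
    # Sort by descending score, breaking ties alphabetically by title.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder data, invented purely to exercise the queries.
WORKS = [
    Work("Novel A", 1973, "literary", "M"),
    Work("Novel B", 1977, "science fiction", "F"),
    Work("Novel C", 1952, "literary", "M"),
    Work("Novel D", 1979, "science fiction", "F"),
]
MENTIONS = [
    Mention("Novel A", "Journal X", 1985),
    Mention("Novel A", "Journal Y", 1988),
    Mention("Novel A", "Journal X", 2004),
    Mention("Novel B", "Journal X", 1995),
    Mention("Novel B", "Journal Y", 1996),
    Mention("Novel B", "Journal Y", 1999),
    Mention("Novel C", "Journal X", 1986),
    Mention("Novel D", "Journal Y", 1991),
]


def published_in_1970s(w):
    return 1970 <= w.year <= 1979


# Novels of the 1970s, scored on all available mentions.
seventies = rank(WORKS, MENTIONS, work_filter=published_in_1970s)

# The same novels, scored only on mentions from the 1980s.
seventies_seen_from_80s = rank(
    WORKS, MENTIONS,
    work_filter=published_in_1970s,
    mention_filter=lambda m: 1980 <= m.year <= 1989,
)

# Science fiction written by women.
sf_by_women = rank(
    WORKS, MENTIONS,
    work_filter=lambda w: w.genre == "science fiction" and w.author_gender == "F",
)

# All works, scored only on mentions in a single journal.
single_journal = rank(WORKS, MENTIONS, mention_filter=lambda m: m.journal == "Journal X")
```

Separating the filter on works from the filter on mentions is the design choice that matters here: the first narrows which texts appear in the list, while the second changes the evidence used to score them, which is what lets the same set of novels be ranked differently from the vantage point of different decades or journals.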
While most of what is written above concerns Metacanon’s value as a public humanities initiative and as an aid to students, this project has growing implications for literary scholarship more broadly. One of the most fascinating lines of inquiry in the digital humanities today is the use of quantitative textual analysis to trace the connections between literary form and reception over time. For example, scholars like Richard Jean So, Hoyt Long, Ted Underwood, and many others have used digital text mining to demonstrate relationships between particular formal features of literary texts and the social categories that govern their movement through the world, such as attributed aesthetic value and genre. As the precision of Metacanon’s measurement of canonicity improves, researchers could use this data along similar lines to ascertain connections between literary form and canonicity over time.


Algee-Hewitt, M. and McGurl, M. (2015). Between Canon and Corpus: Six Perspectives on 20th-Century Novels. Stanford Literary Lab Pamphlet, 8. (accessed 1 March 2016).

Long, H. and So, R. J. (2016). Literary Pattern Recognition: Modernism between Close Reading and Machine Learning. Critical Inquiry, 42(2): 235-67.

Sellers, J. and Underwood, T. (2016). The Longue Durée of Literary Prestige. Modern Language Quarterly, 77(3).


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

Series: ADHO (11)

Organizers: ADHO