Abstract Values in the 19th Century British Novel: Decline and Transformation of a Semantic Field

  1. Long Le-Khac

    Stanford University

  2. Ryan James Heuser

    Stanford University

Le-Khac, Long, Stanford University, llekhac@stanford.edu
Heuser, Ryan, Stanford University, heuser@stanford.edu
This paper analyzes the historical behavior of several semantic fields of “abstract values” in a corpus of 2,779 19th-century British novels. The corpus is a composite archive of canonical and non-canonical texts drawn from Project Gutenberg, Internet Archive, and Chadwyck-Healey’s 19th-century Fiction Collection. In his classic study, Culture and Society, Raymond Williams claimed that a group of keywords that arose and/or changed dramatically in the nineteenth century offered “a special kind of map [of the] wider changes in life and thought” of the age (Williams 1958). We develop Williams’s insight by applying quantitative methods to a much larger corpus than was available at the time of his study. Using a tool we built specifically for our research, we aggregated words whose historical frequencies follow similar trends, thus identifying particularly dynamic semantic cohorts. From these, we found dramatic declines and transformations in fields of social restraint, moral valuation, sentiment, and partiality over the nineteenth century, together with an equally dramatic increase in the use of concrete description fields in the same period. We examine the implications of these findings with respect to broader ideological and narrative patterns of the British novel.

In prior applications of semantic fields to quantitative literary studies, researchers have tended to measure the relative presence of certain fields, “themes,” or otherwise-labeled word-groups in individual texts (e.g. Louwerse 2004; Ide 1989; Fortier 1989). We hope to complement such comparative work by tracing the diachronic behavior of particular semantic fields across a corpus of nineteenth-century novels. In addition, we aim to specify our theoretical object of the semantic field more precisely, both by developing our fields through an empirical method of word-cohort correlation, and by grounding them in their original conceptualization by early twentieth-century semantics. In “Bedeutungssysteme,” R. N. Meyer influentially defined a semantic field as “the ordering of a definite number of expressions from a particular point of view,” that is, from a particular “differentiating factor” (Meyer 1910). To borrow Meyer’s example: the sense of purposefulness, present in the transitive verb ersteigen (to climb) but not in the intransitive verb steigen (to rise), could serve as the “differentiating factor” around which a particular Bedeutungssystem derived its identity.

In its period focus and objectives, this project is indebted to Raymond Williams’s Culture and Society, which analyzes the historical semantics of a period of unprecedented change for Britain. He contends that changes in discourse help reveal broader sociocultural changes. These wider changes, he argues, are of no small consequence; indeed, they introduced many of the social elements and ideas central to what we now think of as distinctive to our modern way of life (Williams 1958). Of course, Williams’s ambitious attempt to analyze an entire social discourse, astonishing as it is, lacked the tools and corpora now available to digital humanities scholars. This paper represents some first steps in pursuing Williams’s objectives by applying quantitative methods to a large novelistic corpus in order to explore specific but dramatic changes in language and culture in this volatile period.

Our method of field-creation consists of two stages: discovery and development. First, to discover potential fields, we developed a technique of word-cohort correlation. We input “seed” words considered significant by previous literary historical work and query the corpus for words whose historical frequency-trends most resemble those of the “seed” words. When the words in this automatically generated cohort share a specifiable differentiating factor and their overall trend is significant, we consider the cohort the embryo of a dynamic semantic field ripe for development. We then develop the field further by employing semantic taxonomies from within the humanities and linguistics, such as the OED's historical thesaurus, to identify the semantic content of these word cohorts and subdivide them into specific semantic fields to track. This method, which oscillates between semantic taxonomies and empirical word frequency correlations, ensures that the semantic fields we generate satisfy two requirements: semantic coherence and coherence of historical behavior. We consider this dual requirement a pragmatic move. The first requirement ensures that our results are semantically and culturally interpretable. The second requirement ensures that the aggregate term frequency results of the semantic fields are actually representative of the behaviors of their constituent words.
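The discovery stage described above can be illustrated with a minimal sketch. The abstract does not name the similarity measure used by the authors' tool, so this example assumes Pearson correlation between per-decade frequency trends; the function names and the toy frequencies are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no trend to correlate with
    return cov / sqrt(var_x * var_y)


def correlated_cohort(trends, seed, top_n=10):
    """Return the top_n words whose trends best match the seed word's trend.

    trends maps each word to its relative frequency per time slice
    (here, one value per decade). This is an assumed data layout.
    """
    seed_trend = trends[seed]
    scored = [
        (pearson(seed_trend, trend), word)
        for word, trend in trends.items()
        if word != seed
    ]
    scored.sort(reverse=True)  # highest correlation first
    return [(word, r) for r, word in scored[:top_n]]


# Toy per-decade frequencies (invented values, not data from the study).
trends = {
    "virtue": [9.0, 8.0, 7.0, 6.0, 5.0],
    "modesty": [4.5, 4.1, 3.4, 3.0, 2.6],
    "propriety": [3.0, 2.8, 2.1, 2.0, 1.5],
    "hard": [2.0, 2.5, 3.1, 3.4, 4.0],
}
cohort = correlated_cohort(trends, "virtue", top_n=2)
```

In this toy example the seed "virtue" pulls in the two other declining words, while the rising word "hard" correlates negatively and is left out. A candidate cohort of this kind would still need the second, interpretive stage (checking for a shared differentiating factor) before being treated as a semantic field.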

The Semantic Fields
In particular, this paper reports on the historical behavior of four semantic fields of “abstract values,” identified through the methods described above. We present and analyze the dramatic decline and transformation of these four fields [Fig. 1], each named after what we considered its differentiating factor.

Fig. 1

Field Name                   Example Words
Values of Social Restraint   modesty, sensibility, propriety
Moral Valuation              virtue, sin, conduct
Sentiment                    passion, sentiment, sensibility
Partiality                   partiality, prejudice, disinterested

Fig. 2

Tracing the diachronic behavior of these fields over the nineteenth century, we found that the four abstract values fields exhibited strikingly parallel downward trends [for their individual plots, see Figs. 5-8]. Collectively, the aggregate term frequency for the fields of abstract values decreases step-wise through the nineteenth century [Fig. 2], from ~1% of all words in the period of 1800-1810, to ~0.6% of words (~1 in every 170 words) by the 1860s, a decrease of about 40%.
Given the range and scope of our corpus and the magnitude of this trend, we consider our results reflective of broad changes in the 19th-century British novel. The data indicate a significant decrease in the usage of these fields. Without positing a simple reflective relationship between literary and sociocultural currents, we nevertheless take seriously Raymond Williams’s approach to social changes through changes in discourse. Thus, we take the data to suggest the responsiveness of novelistic language to fundamental shifts in British value systems and social norms in this turbulent and transformative period, specifically shifts away from values of restraint, virtue, objectivity, and sentiment.
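The aggregate term frequency and the percentage decrease reported above can be reproduced in outline as follows. This is a sketch under stated assumptions: the corpus is represented as (publication year, token list) pairs, tokens are matched case-insensitively against a field's word list, and decades are formed by integer division. None of these details is specified in the abstract, and the function names and sample sentences are hypothetical.

```python
from collections import defaultdict


def field_frequency_by_decade(novels, field_words):
    """Share of all tokens per decade that belong to the given field.

    novels is an iterable of (year, tokens) pairs (an assumed layout).
    """
    field = {w.lower() for w in field_words}
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, tokens in novels:
        decade = (year // 10) * 10
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for t in tokens if t.lower() in field)
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}


def relative_change(start, end):
    """Fractional change from start to end, e.g. -0.4 for a 40% decrease."""
    return (end - start) / start


# Invented two-text corpus, for illustration only.
novels = [
    (1803, "her virtue and modesty were known".split()),
    (1865, "he ran down the hard road".split()),
]
freqs = field_frequency_by_decade(novels, {"virtue", "modesty"})

# The figures quoted in the text: ~1% of words falling to ~0.6%.
drop = relative_change(0.010, 0.006)
```

With the quoted figures, relative_change gives roughly -0.4, which matches the reported decrease of about 40%.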

Fig. 3

The historical behavior of an entirely different set of words, discovered under the same method, helps to contextualize and interpret this trend. Instead of a semantic field tightly organized around a specific differentiating factor, this highly correlated word-cohort (named “Hard Seed” after its seed word) comprises a variety of semantic fields and types of words: colors, body parts, numbers, locational and directional adjectives and prepositions, action verbs, and physical adjectives. This word cohort can be collectively characterized as concrete description words. In contrast to the values fields, the aggregate term frequency of this latter group [Fig. 3] increases steadily across the 19th century from 3.5% of all words (~1 in every 30 words) to 6.5% of all words (~1 in every 15 words), an increase in usage of about 85%.
Fig. 4

Plotting the term frequencies of the abstract values field against those of the hard seed field in each of the novels in our corpus reveals a strongly inverse relationship between the two [Fig. 4]. Given the observed tendency for novels with higher frequencies of hard seed words to have lower frequencies of abstract values words and vice versa, we produced two rankings of the novels to see the types of narrative that correspond to the emphasis of one field over the other [see Fig. 9 for a ranking of a subset of the corpus, the Chadwyck-Healey fiction collection]. Strikingly, ranking novels by these two features separates clusters of genres along a spectrum. The resulting distribution of novels allows us to interpret these two major correlated historical trends in novelistic language as deeper shifts in narrative mode. The spectrum shows an overarching movement from narratives with small social spaces organized by highly polarized, evaluative, and uniform fields of social norms to narratives with far larger social spaces where the fields of social norms are more diverse, conflicting, ambiguous, and ultimately, less constraining. Simultaneously, there is a stylistic move away from abstract, explicitly evaluative language to concrete, physical language whose valuation, if any, is more ambiguous, variable, and indirect. That the expansion of narrative social space corresponds to this stylistic shift suggests a systemic tendency in which the representation of wider, more diverse, and less constraining social spaces is made possible by a more physical, concrete language.
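The ranking step described above, with novels placed by how far their field frequency lies above or below the corpus mean, might look like the following sketch. It assumes each novel has already been reduced to a single relative frequency for the field, and it uses the population standard deviation; the abstract does not say which variant the authors used. The novel labels and values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard deviations above or below the mean for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_by_field(novel_freqs):
    """Rank novels from most to least frequent usage of a field.

    novel_freqs maps a novel label to the field's relative frequency
    in that novel. Returns (label, z-score) pairs, highest first.
    """
    titles = list(novel_freqs)
    zs = z_scores([novel_freqs[t] for t in titles])
    return sorted(zip(titles, zs), key=lambda pair: pair[1], reverse=True)


# Invented per-novel frequencies of the abstract values field.
abstract_values = {"novel_a": 0.012, "novel_b": 0.008, "novel_c": 0.004}
ranking = rank_by_field(abstract_values)
```

Grouping such z-scores by genre or author, and repeating the ranking for the hard seed field, would give the two orderings the text compares.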
This study represents initial steps in developing the quantitative analysis of the historical semantics of literary discourse in a corpus robust enough to allow the study of large-scale historical change. The methods developed herein have proven promising in identifying and analyzing robust semantic fields whose dynamics can rigorously be interpreted as reflections of literary and cultural trends. A wide field of potential inquiry remains for future studies in this vein.

Figures 5-9
Fig. 5

Fig. 6

Fig. 7

Fig. 8

Fig. 9: Distribution of genres and authors, from those with the most frequent usage of Abstract Values to those with the least. The X-axis indicates the number of standard deviations above or below the mean. The vertical line in the center of each box indicates the median for that group.

Fortier, P. A. 1989. “Some Statistics of Themes in the French Novel,” Computers and the Humanities, 23 (4/5): 293-299.

Ide, Nancy M. 1989. “A Statistical Measure of Theme and Structure,” Computers and the Humanities, 23 (4/5): 277-283.

Louwerse, Max M. 2004. “Semantic Variation in Idiolect and Sociolect: Corpus Linguistic Evidence from Literary Texts,” Computers and the Humanities, 38 (2): 207-221.

Meyer, R. N. 1910. “Bedeutungssysteme,” KZ, 43 (4): 359.

Williams, R. 1958. Culture and Society: 1780-1950. New York: Columbia University Press.

Conference Info


ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO
