Stanford University
Does the 19th-century American novel narrate, or does it
describe? In 1936, Georg Lukács argued that over the course
of the nineteenth century description replaced narration as the
dominant mode of the European (and especially French) novel.
What happens, we want to know, in the American novel?
We begin with a structuralist distinction between the logics
of narration and description: narration as the articulation of
the events that make up a story; description as the attribution
of features to characters, places, and objects in the story. We
then set out, with the help of computational linguistics, to
identify the computable sentence-level stylistic “signals” of
narration and description from a small training sample (roughly
2,000 sentences), and thus to develop a model that can classify
narration and description in “the wild”—a corpus of over 800
American novels from 1789 to 1875. This paper will discuss
how we have refined our research problem and developed
our classifying model—trial-and-error processes both; our
initial results in “the wild”; and finally how macro-analysis of
this kind leads to new problems for literary history.
Existing scholarship suggests that “realist” description enters
the American novel with the historical novel; thus our initial
training set of samples was taken from 10 American historical
novels from the 1820s, 1830s, and 1840s (by J.F. Cooper and
his rivals). Participants in the Beyond Search workshop have
tagged random selections from these 10 novels. The unit of
selection is, for convenience, the chapter. Selected chapters
are broken (tokenized) into individual sentences and human-tagged
using a custom XML schema that allows for a “type”
attribute for each sentence element. Possible values for the
type attribute include “Description,” “Narration,” “Both,”
“Speech,” and “Other.” Any disagreement about tagging the
training set has been resolved via consensus. (Since the signals
for description may change over time—indeed, no small
problem for this study—we plan to add an additional training
sample from later in the corpus.)
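
As a rough illustration of the tagging scheme described above, the following sketch reads sentence elements and checks their "type" values. The element names `chapter` and `s` and the sample sentences are invented for the example; the description above specifies only a "type" attribute on each sentence element and its five possible values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: the element names <chapter> and <s> are assumptions,
# and the sentences are invented. Only the "type" attribute and its five
# values come from the description of the schema.
SAMPLE = """
<chapter>
  <s type="Description">The lake lay still beneath a ridge of dark pines.</s>
  <s type="Narration">He seized the paddle and pushed off from the shore.</s>
  <s type="Speech">"Which way did they go?" asked the scout.</s>
  <s type="Both">She crossed the low, smoke-stained room and opened the door.</s>
</chapter>
"""

ALLOWED_TYPES = {"Description", "Narration", "Both", "Speech", "Other"}


def read_tagged_sentences(xml_text):
    """Return (type, text) pairs, rejecting any type outside the allowed set."""
    pairs = []
    for s in ET.fromstring(xml_text.strip()).iter("s"):
        label = s.get("type")
        if label not in ALLOWED_TYPES:
            raise ValueError(f"unexpected type attribute: {label!r}")
        pairs.append((label, (s.text or "").strip()))
    return pairs


tagged = read_tagged_sentences(SAMPLE)
label_counts = Counter(label for label, _ in tagged)
```

Validating the attribute on read is one simple way to catch tagging slips before the sentences reach a classifier.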
Using a maximum-entropy classifier we have begun to
investigate the qualities of the evolving training set and to
identify the stylistic “signals” that are unique to, or most
prevalent in, narrative and descriptive sentences. In the case
of description, for example, we find a marked presence of
spatial prepositions, an above-average percentage of nouns
and adjectives, a relatively low percentage of first- and
second-person pronouns, above-average sentence lengths, and a high
percentage of diverse words (greater lexical richness). From
this initial work it has become clear, however, that our final
model will need to include not simply word usage data, but also
grammatical and lexical information, as well as contextualizing
information (i.e., the kinds of sentence that precede and follow
a given sentence, the sentence’s location in a paragraph). We
are in the process of developing a model that makes use of
part-of-speech sequences and syntactic tree structures, as well
as contextualizing information.
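
Some of the signals named above can be computed without any linguistic tooling. The sketch below is a minimal illustration, not the model itself: the word lists, the function names, and the feature names are all assumptions made for the example. The noun and adjective percentages are left out because they would require a part-of-speech tagger.

```python
import re

# Illustrative word lists (assumptions): the abstract does not enumerate
# which prepositions or pronouns are counted.
SPATIAL_PREPOSITIONS = {
    "above", "across", "along", "among", "behind", "below", "beneath",
    "beside", "between", "beyond", "near", "over", "through", "under", "upon",
}
FIRST_SECOND_PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours", "you", "your", "yours",
}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (with apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def surface_features(sentence):
    """Sentence length, two word-list rates, and a type-token ratio."""
    tokens = tokenize(sentence)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "length": len(tokens),
        "spatial_prep_rate": sum(t in SPATIAL_PREPOSITIONS for t in tokens) / n,
        "first_second_pronoun_rate": sum(t in FIRST_SECOND_PRONOUNS for t in tokens) / n,
        "type_token_ratio": len(set(tokens)) / n,
    }


def context_features(paragraph_labels, index):
    """Neighbouring sentence types and relative position within a paragraph."""
    last = len(paragraph_labels) - 1
    return {
        "prev_label": paragraph_labels[index - 1] if index > 0 else "START",
        "next_label": paragraph_labels[index + 1] if index < last else "END",
        "relative_position": index / last if last > 0 else 0.0,
    }


features = surface_features("The cabin stood beneath the pines, near the lake.")
context = context_features(["Narration", "Description", "Description"], 1)
```

Two caveats apply. A type-token ratio measured on a single sentence is sensitive to sentence length, so it is only a crude stand-in for lexical richness. And the neighbouring labels are available directly only in the human-tagged training set; on untagged text they would have to come from an earlier classification pass.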
After a suitable training set has been completed and an
accurate classifying model has been constructed, our intention
is to “auto-tag” the entire corpus at the level of the sentence.
Once the entire corpus has been tagged, a straightforward
quantitative analysis of the relative frequency of sentence
types within the corpus will follow. Here, the emphasis will be
placed on a time-based evaluation of description as a feature of
19th-century American fiction. But then, if a pattern emerges,
we will have to explain it—and strictly quantitative analysis
will need to be supplemented by qualitative analysis, as we
interrogate not just what mode is prevalent when, but what
the modes might mean at any given time and how the modes
themselves undergo mutation.
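
The quantitative step described above, the relative frequency of sentence types over time, could be sketched as follows. Grouping by decade is an assumption (the text says only that the evaluation is time-based), the function name is hypothetical, and the records are invented toy data, not results from the corpus.

```python
from collections import Counter, defaultdict


def relative_frequency_by_decade(records):
    """Map each decade to the share of its sentences carrying each type.

    records: iterable of (publication_year, sentence_type) pairs.
    """
    counts = defaultdict(Counter)
    for year, label in records:
        counts[year // 10 * 10][label] += 1
    shares = {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        shares[decade] = {label: n / total for label, n in counts[decade].items()}
    return shares


# Invented toy records for illustration only.
toy_records = [
    (1823, "Narration"), (1826, "Narration"), (1827, "Description"), (1829, "Speech"),
    (1841, "Description"), (1845, "Description"), (1848, "Narration"), (1849, "Speech"),
]
shares = relative_frequency_by_decade(toy_records)
```

Normalizing within each decade keeps decades with many surviving novels from dominating the comparison.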
Hosted at University of Oulu
Oulu, Finland
June 25, 2008 - June 29, 2008
Conference website: http://www.ekl.oulu.fi/dh2008/
Series: ADHO (3)
Organizers: ADHO