Out of the Slaughterhouse: The Birth of the Modern Detective Story Corpus

  1. Adam Hammond
     University of Toronto, Canada

  2. Simon Stern
     University of Toronto, Canada

We introduce the Birth of the Modern Detective Story (BMDS) Corpus, a new dataset for exploring the evolution of detective fiction during a crucial period of its development. It currently comprises 380 detective stories published between 1890 and 1920, provided in full text with relevant metadata and annotated for 61 categories related to the types of crimes, clues, and evidence represented. The project pursues more systematically some of the work begun in Franco Moretti’s “The Slaughterhouse of Literature” (2000). All data is open-access. We explain our motivations for developing the dataset, describe the dataset, set out our theoretical contributions, and lay out some research trajectories.

Moretti’s “Slaughterhouse of Literature” asks a fascinating question: Why did the Sherlock Holmes stories “survive” when so many contemporaneous “competitors” were forgotten? It comes to an interesting conclusion: that their success cannot be explained by their use of what Moretti calls “decodable” clues, which, he finds, are neither exclusive to the Holmes stories nor consistently employed in them.

However promising, Moretti’s approach is hampered by theoretical and methodological limitations as well as by its limited and opaque corpus. His categories of clues (necessary, visible, decodable) are poorly defined. The corpus of texts he investigates is small, inaccessible, and not described in detail. He performs two experiments: the first on a set of “about twenty” stories he calls “very narrow” and “haphazard” in selection; the second on 108 detective stories published in The Strand in the 1890s, the titles of which are not provided. Given its opaque corpus and imprecise terms, it is impossible to verify the article’s claims.

Description of the BMDS Corpus
The BMDS Corpus offers several methodological advances over “Slaughterhouse,” and makes all texts and annotations available. Whereas Moretti focuses haphazardly on the 1890s, we look at the years 1890–1920, during which the genre’s conventions are agreed to have consolidated (Humphreys 2017). We aim to include all detective stories published in English during this period. At present, we have 380 stories. We began with those available for free in accessible data formats (now complete), moved next to those that can be purchased (underway), and will then ingest those that must be scanned and OCR’ed.

The BMDS Corpus is being assembled as follows. Since May 2021, we have employed eleven student annotators. Working in pairs, students read a story and fill out a form that asks 61 questions, with terms set out in the Annotation Guidelines (Hammond et al. 2021). These include questions about the number, gender, and role of the story’s detectives, assistants, victims, and culprits; the types and motivations of crimes investigated in the story; the types of clues and evidence present in the story; whether the crime is solved; and how subjectively satisfying the story is. This data is recorded in tabular form (see Fig. 1). The Corpus also includes Dublin Core metadata for all stories (380) and authors (20) in the corpus, as well as every story in plain text form.
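
As a rough illustration of how tabular annotation data of this kind might be read for analysis, the sketch below parses a few rows into typed records. The column names (`story_id`, `year`, `detective_gender`, `crime_type`, `crime_solved`) and the sample rows are hypothetical stand-ins, not the actual BMDS schema or data, which covers 61 questions per story.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical columns and invented rows, for illustration only.
# The real BMDS table records answers to 61 annotation questions.
SAMPLE_CSV = """story_id,year,detective_gender,crime_type,crime_solved
s001,1892,male,theft,yes
s002,1905,female,murder,yes
s003,1913,male,murder,no
"""


@dataclass
class StoryAnnotation:
    story_id: str
    year: int
    detective_gender: str
    crime_type: str
    crime_solved: bool


def load_annotations(text: str) -> list[StoryAnnotation]:
    """Parse CSV text into one typed record per story."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        StoryAnnotation(
            story_id=row["story_id"],
            year=int(row["year"]),
            detective_gender=row["detective_gender"],
            crime_type=row["crime_type"],
            crime_solved=row["crime_solved"].strip().lower() == "yes",
        )
        for row in rows
    ]


annotations = load_annotations(SAMPLE_CSV)
solved = sum(a.crime_solved for a in annotations)
print(f"{solved} of {len(annotations)} sample stories end with the crime solved")
```

Typed records of this sort make it straightforward to filter or group stories by any annotated field before turning to the full texts.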

Fig. 1. Screenshot of the tabular data recorded for individual stories.

Theoretical Contributions 
The BMDS Corpus makes several important theoretical advances over “Slaughterhouse”; most notably, it provides several new categories of clues, and it introduces the distinction between the “investigation” and “reveal” phases of detective stories. 
Our Guidelines clearly distinguish between four types of clue: evoked, illegible, legible, and usable. For instance, a legible clue is defined as one that is “presented in sufficient detail to appear to the reader as a clue” whereas a usable clue is “a legible clue that leads an alert reader in the direction of the correct solution to the crime” (this distinction is not present in Moretti).
Our understanding of clues depends not merely on whether clues are present but on how they are used in the story’s plot. We divide detective fiction plots into two parts: the “investigation phase,” during which the crime is actively investigated; and the “reveal phase,” during which the detective presents their solution. Our clue categories depend on how particular clues appear to the reader in each of these two phases; for example, we distinguish between a clue that is “legible but not usable” (one whose legibility does not point an alert reader in the direction of the correct solution) and one that is “usable in real time” (one that does) (see Figs 2 and 3).

Fig. 2. Screenshot of the input form annotators use to enter findings. This is one of the 61 questions; it asks the annotator to input the types of clues employed in the given story.

Fig. 3. Screenshot of the Annotation Guidelines entry corresponding to the choice in Fig. 2.

Research Applications
Our fuller specification of the types and functions of clues makes it possible to refine and revise Moretti’s claims about the counterintuitive result he discerned, concerning the haphazard and inconsistent use of “decodable” clues. Changes over time that reflect our distinction between “legible” and “usable” clues suggest that mystery writers did in fact come to rely more heavily on clues during this period, but not necessarily clues planted for readers to use. Detectives were increasingly required to solve crimes by reasoning on the basis of trace evidence, and this pattern shows a marked change during the period surveyed here.
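
One way to examine such changes over time is to compute, for each decade, the share of stories containing at least one legible clue and the share containing at least one usable clue. The following is a minimal sketch under stated assumptions: the per-story boolean flags and the rows themselves are invented for illustration and are not BMDS results, and grouping by decade is our choice for the example rather than a method described above.

```python
from collections import defaultdict

# Invented rows: (publication year, has a legible clue, has a usable clue).
# Following the Guidelines' definition, a usable clue is also a legible one,
# so no row is marked usable without also being marked legible.
SAMPLE_ROWS = [
    (1891, True, False),
    (1894, False, False),
    (1903, True, False),
    (1908, True, True),
    (1912, True, False),
    (1918, True, True),
]


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1894 -> 1890."""
    return year // 10 * 10


def clue_shares_by_decade(rows):
    """Return {decade: {"legible": share, "usable": share}} in decade order."""
    counts = defaultdict(lambda: {"stories": 0, "legible": 0, "usable": 0})
    for year, legible, usable in rows:
        bucket = counts[decade_of(year)]
        bucket["stories"] += 1
        bucket["legible"] += int(legible)
        bucket["usable"] += int(usable)
    return {
        decade: {
            "legible": c["legible"] / c["stories"],
            "usable": c["usable"] / c["stories"],
        }
        for decade, c in sorted(counts.items())
    }


shares = clue_shares_by_decade(SAMPLE_ROWS)
for decade, share in shares.items():
    print(decade, f"legible={share['legible']:.2f}", f"usable={share['usable']:.2f}")
```

A widening gap between the two shares would correspond to the pattern described above: more clues on the page, but not necessarily more clues that an alert reader could use.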

Another possible application involves the gender dynamics that animate these stories. Although both authors and detectives were usually male, a considerable number of women wrote in this genre, sometimes with female detectives. Exploring the gender of the authors, detectives, victims, suspects, and culprits may allow for insights into styles of detection, types of clues, varying emphasis on different types of crimes, and the like.
The “investigation” / “reveal” distinction may facilitate various kinds of discoveries relating to some of the issues above and other discoveries involving the stories’ structure and effects. Does the ratio of the investigation phase to the reveal phase vary with types of clues, or types of crimes? Over time, do we see that ratio stabilize, as writers come to perceive preferences among readers? Does the language of the reveal phase exhibit distinctive features that appear to enhance reader satisfaction? 
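
The ratio question could be approached along the following lines. This is only a sketch: it assumes that each story's annotation records the word offset at which the reveal phase begins (the parameter `reveal_start_word` is hypothetical), and it measures phase length in whitespace-separated words, which is one of several possible choices.

```python
def phase_ratio(text: str, reveal_start_word: int) -> float:
    """Return investigation-phase length divided by reveal-phase length.

    Lengths are counted in whitespace-separated words. `reveal_start_word`
    is a hypothetical annotation: the index of the first word of the reveal.
    """
    words = text.split()
    if not 0 < reveal_start_word < len(words):
        raise ValueError("reveal must begin strictly inside the story")
    investigation_length = reveal_start_word
    reveal_length = len(words) - reveal_start_word
    return investigation_length / reveal_length


# Toy ten-word "story" whose reveal starts at the ninth word (index 8).
toy_story = "a b c d e f g h i j"
print(phase_ratio(toy_story, 8))
```

Computed across the corpus, this ratio could then be compared across clue types, crime types, or publication years.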


Hammond, A., Stern, S., Colclough, Z., Côté, C., Maharaj, A., Maleshev, M., Michielin, J., Oh, S., Kim, S., Selvaraj, M., Wen, W. (2021). BMDS Annotation Guidelines. https://tinyurl.com/bmdsguidelines (accessed April 28, 2020).

Humphreys, A. (2017). British Detective Fiction in the 19th and Early 20th Centuries. Oxford Research Encyclopedia: Literature.

Moretti, F. (2000). The Slaughterhouse of Literature. Modern Language Quarterly 61(1), pp. 207–227.

Conference Info

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/
