The Tutor's Story: A Case Study of Mixed Authorship

paper
Authorship
  1. David L. Hoover

    New York University

Hoover, David L., New York University, david.hoover@nyu.edu
The Victorian novelist and Christian Socialist Charles Kingsley (1819-1875) is now known mainly for his children’s book, The Water-Babies (Kingsley 1863), though he also wrote political and historical novels. Long after his death, his daughter, Mary St. Leger Kingsley Harrison (1852-1931), discovered an unfinished and unexpected novel manuscript entitled “The Tutor’s Story” among his papers. Mrs. St. Leger Harrison, writing under the name Lucas Malet, was herself one of the most famous novelists at the turn of the twentieth century, one who explored daring themes like incestuous desire, lesbianism, sadism, and prostitution (Schaffer 1996:109). Malet finished her father’s novel and published it in 1916 (Kingsley and Malet 1916).

In her preface, Malet describes the state, size, and nature of the manuscript, and this description gives a fairly solid basis for assigning at least parts of the novel to the two authors. She tells us that the beginning of the manuscript, and so presumably also the novel, was “fairly consecutive,” so that we can expect the early chapters to be Kingsley's. But she also tells us that there were other “chapters and skeletons of chapters” from much later in the story, without further indicating where these occur in the novel. Finally, she reports that the plot was unresolved, and that she doubled the size of the text in completing it (Kingsley and Malet 1916: vi). This suggests that the late chapters of the novel are probably mostly by Malet. This complex and difficult scenario provides a good opportunity for testing the effectiveness and limitations of some old and some new methods of authorship attribution, including t-tests, Burrows's Delta (Burrows 2002, 2003; Hoover 2004a, 2004b), and Craig's version of Burrows's Zeta (Craig and Kinney 2009; Hoover 2010).

As is so often true in the real world, some aspects of this authorship problem are not exactly what one would want. Some of Kingsley’s novels are as much social commentary as fiction, dealing with issues like the plight of the rural poor, poor sanitation, child labor, and the exploitation of workers. Others are historical novels, set in Anglo-Saxon times, in the reign of Elizabeth I, and in fifth-century Alexandria. Two others are children’s books. Given this varied output, it is difficult to assemble enough similar Kingsley texts for testing. Furthermore, Malet tells us that she has tried to match her style to that of her father, and contemporary reviews of the novel comment that the book sounds just like Kingsley (Book Review Digest 1917). Finally, while most of Kingsley’s fiction is third-person, this is a first-person novel. Malet’s fiction is less varied, but it is also mostly third-person.

In spite of these difficulties, initial PCA, Cluster Analysis, and Delta tests on a group of novels by Kingsley and Malet all very successfully distinguish the two authors. Delta results remain quite accurate even for short sections, typically about 90% accurate for Kingsley, and often 100% accurate for Malet, even on large numbers of 500-word sections. Because we can expect some relatively short passages by each writer interspersed with passages by the other, it seems reasonable to test the entire novel divided into sections of 524 words (the novel divides almost exactly into sections of this size). In order to identify changes of authorship in such brief passages, the novel is tested with rolling segments of 524 words. The first section comprises the first 524 words; the next section comprises the 524 words that begin at word number 132, the next the 524 words that begin at word number 263, the next the 524 words that begin at word number 394, and so on through the rest of the novel. Each segment thus begins 131 words, one quarter of a segment, after the previous one. Rolling segments have been put to good use in several authorship attribution and stylistics studies; see Craig (1999), Burrows (2010), and van Dalen-Oskam and van Zundert (2007).
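The rolling segmentation described above can be sketched in a few lines of Python. This is only an illustration, not the software used for the study: the step of 131 words is inferred from the starting points 1, 132, 263, and 394, the function name `rolling_segments` is hypothetical, and the toy word list stands in for the tokenized novel.

```python
def rolling_segments(words, size=524, step=131):
    """Yield (start_word_number, segment) pairs for overlapping segments.

    Start numbers are 1-based, so the first segment starts at word 1.
    The step of 131 words is an assumption inferred from the text.
    """
    for start in range(0, len(words) - size + 1, step):
        yield start + 1, words[start:start + size]


# Hypothetical stand-in for the novel: 1000 placeholder tokens w1..w1000.
words = [f"w{i}" for i in range(1, 1001)]

segments = list(rolling_segments(words))
starts = [start for start, _ in segments]
print(starts)
```

With these settings every word after the opening of the novel falls into up to four overlapping segments, which is what allows a change of authorship to show up gradually across neighboring test points.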

I am testing the rolling sections of the novel in three ways. The first uses a list of 2873 marker words that t-tests identify as being used significantly differently by the two authors (p < .05). The percentage of the word types (really individual spellings) in each section that belong to each author’s set of marker words is graphed in Fig. 1. The upper set of lines shows the percentages of Kingsley marker words and the lower set the percentages of Malet marker words. For example, in the first sections of Chapter 1, about 33% of the types are Kingsley marker words and about 18% are Malet marker words. The separation of the two sets is nicely distinct for the first three chapters, all of which, as expected, are attributed to Kingsley. The beginning of Chapter 4 seems to contain some of Malet’s writing, and about the first two-thirds of Chapter 6 is attributed to Malet.
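The percentage plotted for each section can be illustrated with a minimal sketch. It assumes the two marker-word sets have already been selected by t-tests; the tiny marker sets, the sample sentence, the simple tokenizer, and the function names below are all hypothetical and serve only to show the arithmetic.

```python
import re


def types_in(section_text):
    """Return the set of word types (distinct lowercase spellings)."""
    return set(re.findall(r"[a-z']+", section_text.lower()))


def marker_percentages(section_text, kingsley_markers, malet_markers):
    """Return the percentage of a section's types found in each marker set."""
    types = types_in(section_text)
    if not types:
        return 0.0, 0.0
    kingsley_pct = 100.0 * len(types & kingsley_markers) / len(types)
    malet_pct = 100.0 * len(types & malet_markers) / len(types)
    return kingsley_pct, malet_pct


# Hypothetical marker sets and text; the real lists hold 2873 words in total.
kingsley_markers = {"upon", "thou"}
malet_markers = {"perhaps"}
section = "Upon the hill thou art, perhaps; upon the hill."

k_pct, m_pct = marker_percentages(section, kingsley_markers, malet_markers)
print(round(k_pct, 1), round(m_pct, 1))
```

Because the measure counts types rather than tokens, a marker word repeated many times in a section contributes only once.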


The t-test results for chapters 4-6 are repeated in a slightly different form in Fig. 2 (upper two lines; 20 percentage points have been added to the percentages for the t-test marker words to create a separation between the two sets of lines), along with results from Craig Zeta tests on the same sections (lower two lines). Rather than showing a separate line for each starting point, as in Fig. 1, all the testing points for each set of marker words in Fig. 2 are joined by a single line. The graph for Craig Zeta shows the percentage of types in each section that are among the 500 most distinctively used Kingsley marker words and the 500 most distinctively used Malet marker words. The smaller percentages for Zeta than for the t-tests reflect the fact that only 1000 marker words are used here, compared to the 2873 t-tested marker words. Nevertheless, it is easy to see that Craig Zeta and t-tests give similar results and agree generally on the attribution of various parts of the chapters. Delta tests on similar-sized sections usually agree with these results as well.
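The selection of Zeta marker words is not spelled out here, so the following is only a sketch of one common formulation of Craig's Zeta, offered as an assumption: a word's score is the proportion of one author's training segments that contain it plus the proportion of the other author's segments that lack it, giving a value between 0 and 2. The toy segments and the function names are hypothetical.

```python
def zeta_scores(base_segments, counter_segments):
    """Score each word: share of base segments containing it
    plus share of counter segments that do not contain it (range 0 to 2)."""
    base_sets = [set(seg) for seg in base_segments]
    counter_sets = [set(seg) for seg in counter_segments]
    vocabulary = set().union(*base_sets, *counter_sets)
    scores = {}
    for word in vocabulary:
        in_base = sum(word in s for s in base_sets) / len(base_sets)
        not_in_counter = sum(word not in s for s in counter_sets) / len(counter_sets)
        scores[word] = in_base + not_in_counter
    return scores


def top_markers(scores, n):
    """Return the n highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical training segments for the two authors.
kingsley_segments = [["upon", "the", "hill"], ["upon", "a", "time"]]
malet_segments = [["perhaps", "the", "hill"], ["perhaps", "a", "day"]]

scores = zeta_scores(kingsley_segments, malet_segments)
print(top_markers(scores, 1))
```

Under this formulation the highest-scoring words serve as markers for the first author and the lowest-scoring words as markers for the second; in the study, 500 words are taken for each author.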


Many of the chapters of the novel seem to be largely by one or the other author, but others seem thoroughly mixed. These results fall in line with what Malet’s preface leads us to expect, and overall they seem fairly persuasive. A recent discovery makes them both more compelling and somewhat frustrating. After I had completed the testing described above, the problem seemed fascinating enough to deserve further research, and I began by trying to find out whether Kingsley’s manuscript might still exist. Although I was not able to find any record of the manuscript, I came across a record of a copy of the novel in the Princeton Rare Books collection with Malet’s penciled notes about which parts of the novel were written by Kingsley and which she wrote herself. For some chapters, her notes are quite precise, and they indicate that the attributions in Fig. 1 and Fig. 2 are essentially correct. For other chapters, she notes only that they are “mostly my father.” Most frustrating of all is the fact that all markings cease after chapter 28 (of 41). The fact that the tests described above disagree with her notes for only 5-7 chapters suggests that even texts involving mixed, joint, or collaborative authorship can be usefully investigated using these methods.

References:
“Kingsley, Charles. Tutor's Story.” Book Review Digest, 1917, Volume 12. H.W. Wilson Company, White Plains, NY. Online: Google Books.

Burrows, J. 2002. “‘Delta’: a measure of stylistic difference and a guide to likely authorship.” LLC, 17: 267-287.

Burrows, J. 2003. “Questions of authorship: attribution and beyond.” CHUM, 37: 5-32.

Burrows, J. 2010. “Never say always again: reflections on the numbers game.” In W. McCarty (ed.), Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviours, Products and Institutions. Open Books, Cambridge.

Craig, H., and Kinney, A. (eds.) 2009. Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press, Cambridge.

Craig, H. 1999. “Jonsonian chronology and the styles of A Tale of a Tub.” In M. Butler (ed.), Re-Presenting Ben Jonson: Text, History, Performance. Macmillan, Houndmills, 210-32.

Hoover, D. 2010. “Authorial style.” In D. McIntyre and B. Busse (eds.), Language and Style: Essays in Honour of Mick Short. Palgrave.

Hoover, D. 2004a. “Testing Burrows’s Delta.” LLC, 19(4): 453-475.

Hoover, D. 2004b. “Delta prime?” LLC, 19(4): 477-495.

Kingsley, C. 1863. The Water-Babies: A Fairy Tale for a Land-Baby. Macmillan, London.

Kingsley, C., and Malet, L. 1916. The Tutor’s Story. Smith Elder, London.

Schaffer, T. 1996. “Some chapter of some other story: Henry James, Lucas Malet, and the real past of The Sense of the Past.” The Henry James Review, 17.2: 109-128.

van Dalen-Oskam, K., and van Zundert, J. 2007. “Delta for Middle Dutch—author and copyist distinction in Walewein.” LLC, 22(3).

Conference Info

Complete

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO
