Burrows’s “Delta” is a popular authorship attribution algorithm (Burrows 2002). Suppose that we have an anonymous text which has to be attributed to one of a series of candidate authors for whom we have a number of reference samples as training material. Delta computes a dissimilarity score between the test item and all reference samples and attributes the anonymous text to the author of the sample to which it is most similar. We propose a procedure called “Rolling Delta”, reminiscent of a number of earlier applications (e.g. Van Dalen-Oskam & Van Zundert 2007; Burrows 2010; Van Zundert & Van Dalen-Oskam 2012; Hoover 2012). The general goal is to visualize stylistic shifts in texts, for instance, in order to pinpoint authorial takeovers in the case of collaborative authorship. An implementation of Rolling Delta is freely available (Eder, Kestemont & Rybicki 2012).
First, each reference text is segmented into equal-sized, partially overlapping samples. If we specify a ‘window size’ of 5,000 and a ‘step size’ of 100, for example, the first sample of a text contains words 1-5,000, the second words 101-5,100, etc. The procedure uses the relative frequencies of the n most frequent words in the reference collection. Subsequently, we compute a centroid (C) for each reference text, containing the mean relative frequency of each word across that text’s windows, together with its standard deviation. Then we divide the test text into windows and compute the ‘Delta’ between each test window W and each reference centroid, using the following formula (see Argamon 2008 for more details):
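Assuming the standard form of Burrows’s Delta (Argamon 2008), the score for a test window W and a reference centroid C can be written as follows. The symbols f_i(W), \mu_i(C) and \sigma_i(C) are notation introduced here for illustration only:

```latex
\[
\Delta(W, C) = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| f_i(W) - \mu_i(C) \right|}{\sigma_i(C)}
\]
```

Here f_i(W) is the relative frequency of the i-th most frequent word in the test window, while \mu_i(C) and \sigma_i(C) are the mean and standard deviation of that word’s relative frequency across the windows of the reference text.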
After “rolling” through the test text, we plot the resulting Delta series for each reference text. The lower the Deltas for a reference text, the more similar the style in the test windows — and vice versa. If the curve for a text shows a sudden drop, this may indicate a stylistic change in the test text, caused, for instance, by one author taking over from another. One can use vertical lines in the plot to mark the position of certain events in the test text as an aid in interpretation (e.g. chapter beginnings).
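The steps described above (windowing, centroids, and one Delta series per reference text) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released R implementation: all function names are hypothetical, texts are assumed to be pre-tokenized lists of words at least one window long, the population standard deviation is used, and words whose standard deviation is zero in a reference text are skipped.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def windows(tokens, size, step):
    """Yield overlapping slices: tokens[0:size], tokens[step:step+size], ..."""
    for start in range(0, len(tokens) - size + 1, step):
        yield tokens[start:start + size]


def most_frequent_words(texts, n):
    """The n most frequent words in the pooled reference collection."""
    counts = Counter()
    for tokens in texts:
        counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


def rel_freqs(sample, vocab):
    """Relative frequency of each vocabulary word in one sample."""
    counts = Counter(sample)
    return [counts[word] / len(sample) for word in vocab]


def centroid(tokens, vocab, size, step):
    """Per-word mean and standard deviation over all windows of one text."""
    rows = [rel_freqs(w, vocab) for w in windows(tokens, size, step)]
    columns = list(zip(*rows))
    # Assumption: population standard deviation; the source does not say which.
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def delta(freqs, means, sds):
    """Mean absolute difference in units of the reference standard deviation."""
    # Assumption: words with zero spread in the reference text are skipped.
    terms = [abs(f - m) / s for f, m, s in zip(freqs, means, sds) if s > 0]
    return sum(terms) / len(terms) if terms else math.nan


def rolling_delta(test_tokens, references, n_words, size, step):
    """Return {reference name: [Delta for each successive test window]}."""
    vocab = most_frequent_words(references.values(), n_words)
    centroids = {name: centroid(tokens, vocab, size, step)
                 for name, tokens in references.items()}
    series = {name: [] for name in references}
    for window in windows(test_tokens, size, step):
        freqs = rel_freqs(window, vocab)
        for name, (means, sds) in centroids.items():
            series[name].append(delta(freqs, means, sds))
    return series
```

Each list in the returned dictionary can be plotted against the starting position of its window, giving one curve per reference text. A sustained drop in one curve alongside a rise in the others is the pattern that may signal a change of hand.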
This method seems the perfect tool to study the notable literary collaboration between Joseph Conrad (1857-1924) and Ford Madox Ford (1873-1939): their three joint works, The Inheritors (1901), Romance (1903) and The Nature of a Crime (1909, 1924), and the various authorship claims by Ford, including that concerning a fragment of Nostromo (1904). The Conrad/Ford controversy was aggravated by the two authors’ neurotic behavior, which eventually took its toll on their collaboration and friendship. Physical evidence about authorship is complicated by the fact that Ford (and others, including John Galsworthy) took Conrad’s dictation when he was sick or indisposed or could not make a deadline.
The Inheritors is Ford’s second published novel, coming out nine years after The Shifting of the Fire. He did most of the writing himself, though he discussed it extensively with Conrad, whose role, he said, was “to give each scene a final tap” (Saunders 1996: 135-36). For Romance, based on Ford’s earlier unfinished Seraphina, however, the consensus seems to be that it is about two thirds Conrad and one third Ford. According to Conrad:
We collaborated right through, but it may be said that the middle part of the book is mainly mine with bits by F.M.H. — while the first part is wholly out of “Seraphina”: the second part is almost wholly so. The last part is certainly three quarters MS. F.M.H. with here and there a par. by me.
According to Ford, “parts one, two, three and five are a mosaic of alternately written passages while part four is entirely Conrad’s work” (Karl 1997: 147). Najder further comments that “the change in numbering the parts of Seraphina has caused some trouble for Conrad’s and Ford’s biographers. As late as the summer of 1901, the novel consisted of four parts, but ended, as it does now, with Kemp’s trial. While continuing to write part 3, Conrad expanded it into another, which became part 4, and the last, part 5, written by Ford” (Najder 2007: 317). The third collaborative work, The Nature of a Crime, was written almost exclusively by Ford and heavily edited by Conrad.
Ford’s possible contribution to Nostromo — mostly based on the one large part of the manuscript in his hand — is limited to the novel’s second part. Brice quotes a letter from Ford to Keating (1923 or 1925) saying he wrote 10,000 words of Nostromo that he remembers and that he “could place my finger on fairly substantial passages” (Brice 2004: 79), and another 20,000 that he only faintly remembers and would find difficult to trace. Later, in Return to Yesterday, Ford himself minimizes his contribution, saying that what he “wrote into Conrad’s books was by no means great in bulk” (Brice 2004: 78) and was “so frequently emended out of sight that they could not make as much difference to the completion and glory of his prose as three drops of water poured into a butt of Malmsey” (Brice 2004: 79). This study tries to find the drops of Ford’s water in Conrad’s Malmsey.
Figure 1 shows a bootstrap consensus tree of works and collaborations by Conrad and Ford, produced from multiple cluster analyses of the frequencies of the most frequent words in this collection of texts (Eder & Rybicki 2011). It places Romance decisively among works by Conrad, but the two other collaborative texts among those by Ford.
Figure 2 is a ‘Rolling Delta’ diagram produced for The Inheritors, which is compared to four novels by Conrad (The Nigger of the Narcissus 1897, Lord Jim 1900, Chance 1913, and Victory 1915) and four by Ford (The Fifth Queen 1906, Privy Seal 1907, The Fifth Queen Crowned 1908, and The Good Soldier 1915). The analysis was performed for the 1,000 most frequent words appearing in all the texts with a ‘rolling window’ of 5,000 words, stepping by 1,000. Throughout the collaborative novel, Ford’s style (part that of The Benefactor, part that of Privy Seal) dominates over Conrad’s, except for a short fragment that coincides very closely with Chapters 16 and 17 of The Inheritors (contained within vertical lines a and b); here, the greatest similarity is to Chance and Heart of Darkness. Interestingly, however, Conrad’s earliest work among the four used for comparison, The Nigger of the Narcissus, is almost invariably the outlier in The Inheritors.
Figure 3 presents the same comparison for Romance. This novel, in good agreement with Figure 1, exhibits a domination of Conradian style (mostly that of Lord Jim, Heart of Darkness and Chance). Ford’s idiom (this time, in its The Benefactor variety) makes itself seen in a single long fragment: Part 1, Chapter 5 (between lines a and b) and in a much shorter one (Part 2, Chapter 7: c).
Figure 4 shows the same results for The Nature of a Crime, which strongly diverge from the previous collaboration. Ford’s style (mostly that of The Good Soldier) dominates the final joint effort of the two writers, with two minor and non-decisive interventions of Conrad’s style (esp. Under Western Eyes).
Figure 5 tests the hypothesis that Ford did indeed contribute to Conrad’s Nostromo, but provides little to support it: Conrad’s style dominates Ford’s throughout.
This application of Rolling Delta produces interesting results. Chief among them is a confirmation of the usual (if uncertain) consensus about the proportions of the two writers’ styles in their three collaborations. The decisive domination of Ford’s style over Conrad’s in The Inheritors and The Nature of a Crime is interesting, as it seems to have survived Conrad’s extensive editing, itself confirmed by biographical evidence. A similar stylometric visibility of the underlying authorial personality that persists despite subsequent editing has been reported in a study of an edited translation (Heydel & Rybicki 2012). From a methodological point of view, “Rolling Delta” for R (devised by Kestemont) is a welcome addition to the latest stylometric tools, with its potential to pinpoint the change(s) from author to author in collaborative works. Further experiments are required to explain why, in some cases, the Rolling Delta curves tend to rise and fall in unison. It is possible that these fluctuations simply indicate minor stylistic fluctuations in the reference text, or, when these differences are larger, indeed its “originality” or rather its “deviation” with respect to the rest of the texts.
Argamon, S. (2008). Interpreting Burrows’s Delta: Geometric and probabilistic foundations. Literary and Linguistic Computing. 23. 131-147.
Brice, X. (2004). Ford Madox Ford and the Composition of Nostromo. The Conradian: the Journal of the Joseph Conrad Society (Autumn) 75-95.
Burrows, J. (2002). 'Delta': A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing. 17. 267-287.
Burrows, J. (2010). Never Say Always Again: Reflections on the Numbers Game. In McCarty, W. (ed), Text and Genre in Reconstruction. Effects of Digitalization on Ideas, Behaviours, Products and Institutions. Cambridge: Open Book Publishers. 13-36.
Eder, M. and J. Rybicki (2011). Stylometry with R. In Digital Humanities 2011: Conference abstracts, Stanford University. 308-311.
Eder, M., M. Kestemont, and J. Rybicki (2012). Computational Stylistics. https://sites.google.com/site/computationalstylistics/
Heydel, M., and J. Rybicki (2012). The Stylometry of Collaborative Translation. Woolf’s Night and Day in Polish. Digital Humanities 2012 Conference Abstracts 212-217.
Hoover, D. (2004). Testing Burrows’s Delta. Literary and Linguistic Computing 19. 453-475.
Hoover, D. (2012). The Tutor's Story: A Case Study of Mixed Authorship. English Studies 93. 324-339.
Karl, F. (1997). A Reader's Guide to Joseph Conrad. Syracuse, NY: Syracuse University Press.
Najder, Z. (2007). Joseph Conrad: A Life. Trans. by Najder, H. New York: Camden House.
Saunders, M. (1996). Ford Madox Ford: A Dual Life. 1. New York: Oxford University Press.
Van Dalen-Oskam, K., and J. Van Zundert (2007). Delta for Middle Dutch — Author and copyist distinction in Walewein. Literary and Linguistic Computing. 22. 345-362.
Van Dalen-Oskam, K., and J. Van Zundert (2012). Delta in 3D: Copyist distinction by Scaling Burrows’s Delta. Digital Humanities 2012 Conference Abstracts. 402-404.
Hosted at University of Nebraska–Lincoln
Lincoln, Nebraska, United States
July 16, 2013 - July 19, 2013
Conference website: http://dh2013.unl.edu/
Series: ADHO (8)