The Stylometry of Collaborative Translation

  1. Magda Heydel

    Jagiellonian University

  2. Jan Rybicki

    Jagiellonian University

Heydel, Magda, Jagiellonian University, Krakow, Poland
Rybicki, Jan, Jagiellonian University, Krakow, Poland
The Problem
A translated work of literature is a collaborative effort even if performed by a single translator, who is always haunted by the ghost of the author of the original. The relationship between the two has been at the centre of mainstream translation studies and of the discipline’s corpus-based and stylometric varieties, as evidenced by a growing body of scholarship (Olohan 2004; Oakes & Ji 2012). Stylometric problems multiply when the term ‘collaborative translation’ is taken to signify a joint rendering of a single author into a different language by two (or more) translators, or by translator and editor (Rybicki 2011). In general, stylometry based on multivariate analyses of word frequencies successfully detects the author of the original, rather than the translator, in translations (Rybicki 2010); sometimes, this success varies from translator to translator (Burrows 2002a). It is only when translations of the same author are compared that there is any hope for stylometric machine-learning methods to tell translator from translator (Rybicki 2012).

This is exactly why this study focuses on a problem situated somewhat in the middle of the above, as collaborations between translators on a single literary work are a fact in the publishing industry. In the Polish market, this is perhaps most famously evidenced by Maria and Cezary Frąc, responsible for the third Polish translation of Tolkien’s The Lord of the Rings. It is notoriously difficult to obtain information on the reasons and details of such translatorial collaborations from either the translators or their publishers; usually, looming deadlines for lengthy popular novels are blamed (Kozieł 2011).

But a collaborative translation can also be made for other reasons. When one of Poland’s most eminent translators, Anna Kołyszko, died of cancer during her work on Virginia Woolf’s Night and Day, leaving a finished draft of much of the book and some notes on additional sections, the translation was taken over by Magda Heydel, a particular specialist in Woolf (Jacob’s Room, Between the Acts, A Moment’s Liberty, On Being Ill), who also performed some editing on the entire text. As Heydel stated in a TV interview, it was for the readers to see whether or not there was a rift in the middle of the book where one translator took over from the other; she hoped her editing had made the narration stylistically coherent. She also emphasized the uniqueness of the translator’s experience of confronting her own intuition of her voice in the text with that of another (Heydel 2011). Thanks to her previous work and research on Woolf, she had quite a definite idea of what stylistic shape Night and Day should take in its Polish translation. Her linguistic image of Woolf’s style, being, as it to an extent must be, rooted in her own idiosyncratic ‘feeling’ of the language, was also informed by tangible evidence in Woolf scholarship. The technique of the changing point of view, to be elaborated in Woolf’s mature work into the stream of consciousness, a very important achievement of the writer, is clearly visible already in this early novel. Recreating this aspect of her writing has always been one of Heydel’s central concerns. Also, her particular translation technique is to a large extent based on the idea of the voice of the speaker, with the actual reading aloud for the effect of naturalness as the ultimate test for a successful rendition. In Woolf the recognizable ‘voice’ of the focalizer is central as it produces the point of view in narration (Rait 2010).
Thus the changes Heydel introduced into Anna Kołyszko’s text were not (or very rarely) lexical but mainly syntactical. She worked with the famously long and intricate Woolfian sentences, all the more so because the Polish language, with its extremely flexible sentence structure, locates most of its rhetorical and pragmatic devices in syntax. For this reason too, most-frequent-word analysis was a well-suited approach to this experiment in translatorial attribution, as the most frequent words are largely function words, which are sensitive to syntactic choices.

Indeed, this seems a translatorial counterpart of David Hoover’s study on The Tutor’s Story, a novel begun by Charles Kingsley and completed by his daughter Mary under her pen name Lucas Malet, with some information available on who wrote what (Hoover 2011). In the Polish Night and Day case, this information is exact; in both cases, the early chapters were written by the first author or translator, the final ones by the second. It is also reminiscent of an earlier study on the Middle Dutch epic Walewein (van Dalen-Oskam & van Zundert 2007). The main difference consists in the fact that Heydel is available to confirm or deny the findings of the quantitative analysis.

The Method and the Corpus
This study applies Cluster Analysis to Delta-normalized word frequencies in texts; as shown by (to name but a few) Burrows (2002) and Hoover (2004, 2004a), and despite limitations discussed by Smith & Aldridge (2011), this is one of the most precise methods of ‘stylistic dactyloscopy.’ A script by Maciej Eder, written for the R statistical environment, converts the electronic texts to produce complete most-frequent-word (MFW) frequency lists; calculates their z-scores in each text according to the Burrows Delta procedure (Burrows 2002); selects words for analysis from various frequency ranges; performs additional procedures for better accuracy (Hoover’s culling and pronoun deletion); compares the results for individual texts; produces Cluster Analysis tree diagrams that show the distances between the texts; and, finally, combines the tree diagrams made for various parameters (number of words, degree of culling) in a bootstrap consensus tree (Dunn et al. 2005, quoted in Baayen 2008: 143-47). The script was demonstrated at Digital Humanities 2011 (Eder & Rybicki 2011) and its ever-evolving versions are available online (Eder & Rybicki 2011a). The consensus tree approach, based as it is on numerous iterations of attribution tests at varying parameters, has already shown itself to be a viable alternative to single-iteration analyses (Eder & Rybicki 2011b; Rybicki 2011).
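
The core of the Delta procedure described above (relative frequencies of the most frequent words, z-scores across the corpus, mean absolute difference between two texts) can be illustrated with a minimal Python sketch. This is not Eder's R script: the function names are hypothetical, the texts are assumed to be already tokenized, and the use of the population standard deviation is an assumption made for simplicity.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def most_frequent_words(texts, n):
    """Return the n most frequent words over the whole corpus.

    `texts` maps a text name to its list of tokens.
    """
    corpus = Counter()
    for tokens in texts.values():
        corpus.update(tokens)
    return [word for word, _ in corpus.most_common(n)]


def z_scores(texts, mfw):
    """Standardize each word's relative frequency across the corpus."""
    freqs = {name: relative_frequencies(toks) for name, toks in texts.items()}
    table = {name: {} for name in texts}
    for word in mfw:
        column = [freqs[name].get(word, 0.0) for name in texts]
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        for name in texts:
            value = freqs[name].get(word, 0.0)
            table[name][word] = 0.0 if sigma == 0 else (value - mu) / sigma
    return table


def delta(table, a, b):
    """Mean absolute difference of z-scores between texts a and b."""
    words = table[a].keys()
    return sum(abs(table[a][w] - table[b][w]) for w in words) / len(words)
```

A smaller Delta between two texts indicates more similar usage of the selected words; the resulting pairwise distances are what a cluster analysis would then turn into a tree diagram.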

The Woolf translation was analysed by comparing equal-sized fragments (at various iterations of fragment size) of the translation of Night and Day to determine the chapter where Heydel had taken over from Kołyszko; Heydel was consulted only after the initial determination had been made. At this point, the Kołyszko and the Heydel portions of the book were compared to other translations by Heydel (Woolf’s Jacob’s Room, A Moment’s Liberty and Between the Acts, Graham Swift’s The Light of Day and Conrad’s Heart of Darkness) and by Kołyszko (McCarthy’s Child of God, Miller’s Tropic of Capricorn, Roth’s Portnoy’s Complaint, Rushdie’s Midnight’s Children), and then to an even more extended corpus of author-related translations.
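
The division of the translation into equal-sized fragments might look as follows in Python. This is only a sketch: the function name is hypothetical, and dropping the incomplete final fragment is an assumption, since the abstract does not say how the remainder was handled.

```python
def equal_fragments(tokens, size):
    """Split a token list into consecutive fragments of exactly `size` tokens.

    Assumption: a trailing remainder shorter than `size` is discarded so
    that all fragments are directly comparable.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    full = len(tokens) // size
    return [tokens[i * size:(i + 1) * size] for i in range(full)]
```

Repeating the analysis for several values of `size` corresponds to the "various iterations of fragment size" mentioned above.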

All iterations of different fragment sizes of the Kołyszko/Heydel translation pointed to the beginning of Chapter 27 as the place where Heydel took over the translation from Kołyszko. Figure 1 shows the attribution of medium-sized fragments (approximating, in this case, mean chapter length) of the translation. In fact, Kołyszko had completed 25 full chapters and left scattered notes on Chapter 26; these were collected, organized and edited by Heydel. If we are to believe stylometric evidence, the latter did a very good job of preserving the style of the former, so that her own style is only visible in earnest in Chapter 27.

Figure 1: Consensus tree for equal-sized fragments of Kołyszko and Heydel’s translation of Woolf’s Night and Day, performed for 0-1000 MFWs with culling at 0-100%. The beginning of fragment 27 of the translation roughly coincided with the beginning of Chapter 27
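
The culling parameter in the caption can be illustrated with a short Python sketch. It assumes that culling at X% means keeping only those words that occur in at least X% of the texts, so that 0% keeps every word and 100% keeps only words found in all texts; the abstract itself does not define the setting, and the function name is hypothetical.

```python
def cull(mfw, texts, percent):
    """Keep the words from `mfw` that occur in at least `percent` % of texts.

    `texts` maps a text name to its list of tokens.
    Assumption: this is the meaning of the culling percentage.
    """
    vocabularies = [set(tokens) for tokens in texts.values()]
    needed = percent / 100 * len(vocabularies)
    return [
        word for word in mfw
        if sum(word in vocab for vocab in vocabularies) >= needed
    ]
```

Under this reading, stronger culling removes words tied to the subject matter of individual texts and leaves a word list closer to shared function words.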

Once the authorship of the translation of Night and Day had been confirmed, it was possible to place this shared work in the context of other translations by Heydel and Kołyszko; unfortunately, they never translated different books by the same author. The consensus tree in Figure 2 is divided neatly between Kołyszko and Heydel. In this context, the overall editing by Heydel might be visible in that both parts of Night and Day remain in her section of the tree; this might equally be a result of the original authorship, as the Woolf novels are close neighbours here.

When more translations of the same authors by other translators are added to the corpus, the balance between translatorial and authorial attribution is clearly shifted towards the latter. In Figure 3, five translations of Woolf (including Night and Day) by four different translators occupy the lower branches of the consensus tree, with Bieroń’s Orlando being the only (relative) outsider. Kołyszko’s and Heydel’s translations in the upper half of the graph cluster with translations of the respective authors by other translators.

Figure 2: Consensus tree for translations by Kołyszko and Heydel, performed for 0-1000 MFWs with culling at 0-100%

So far, attempts at finding stylistic traces of the translator or the editor have been only partially successful. Word-frequency-based stylometric methods have proved better at identifying the author of the original than the translator (Rybicki 2009, 2010; Eder 2010) unless, as already stated, translations of a single author are compared. In the latter case, Burrowsian stylometry is quite capable of telling translator from translator.

Translatorial attribution is greatly helped by the adoption of the bootstrap consensus tree approach, which minimizes attributive errors due to unlucky combinations of parameters. Simply speaking, Delta and similar distance measures are more often right than wrong, but the proportion between right and wrong might vary for a variety of reasons, especially language (Eder & Rybicki 2011b). This is particularly significant in a rare translatorial attribution case such as the Kołyszko/Heydel translation of Night and Day, where the results of stylometric analysis can be confirmed or denied by the translator herself. Equally importantly, this study demonstrates that although stylometry seems to find traces of the author as well as of the translator, these traces can be disambiguated by placing the disputed translations in contexts of various corpora.
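
The reasoning behind combining many parameter settings can be illustrated with a simplified Python sketch. It does not build a consensus tree; as a plainly simpler stand-in, it records each text's nearest neighbour in every run and keeps only the pairings that recur in at least a given share of runs. The function names, the nested-dictionary format of the distance tables and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter


def nearest_neighbour(distances, name):
    """Return the text closest to `name` in one distance table."""
    others = {other: d for other, d in distances[name].items() if other != name}
    return min(others, key=others.get)


def consensus_neighbours(runs, threshold=0.5):
    """Keep (text, neighbour) pairs found in at least `threshold` of the runs.

    `runs` is a list of distance tables, one per parameter setting
    (for example, per number of MFWs and culling level). Each table maps
    a text name to a dictionary of distances to the other texts.
    """
    votes = Counter()
    for distances in runs:
        for name in distances:
            votes[(name, nearest_neighbour(distances, name))] += 1
    return {
        pair: count / len(runs)
        for pair, count in votes.items()
        if count / len(runs) >= threshold
    }
```

A pairing that appears in only one unlucky run falls below the threshold and is dropped, which is the intuition behind preferring agreement across iterations to any single analysis.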

Baayen, R. H. (2008). Analyzing Linguistic Data. A Practical Introduction to Statistics using R. Cambridge: Cambridge UP.

Burrows, J. F. (2002). Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship. Literary and Linguistic Computing 17: 267-87.

Burrows, J. F. (2002a). The Englishing of Juvenal: Computational Stylistics and Translated Texts. Style 36: 677-99.

Dalen-Oskam, K. van, and J. van Zundert (2007). Delta for Middle Dutch – Author and Copyist Distinction in Walewein. Literary and Linguistic Computing 22: 345-62.

Dunn, M., A. Terrill, G. Reesink, R. A. Foley, and S. C. Levinson (2005). Structural Phylogenetics and the Reconstruction of Ancient Language History. Science 309: 2072-75.

Eder, M., and J. Rybicki (2011). Stylometry with R. Poster. Stanford: Digital Humanities 2011.

Eder, M., and J. Rybicki (2011a). Computational Stylistics

Eder, M., and J. Rybicki (2011b). Do Birds of a Feather Really Flock Together, or How to Choose Test Samples for Authorship Attribution. Stanford: Digital Humanities 2011.

Heydel, M. (2011). Interview in ‘Czytelnia.’ TVP Kultura, 12 Feb.

Hoover, D. L. (2004). Testing Burrows’s Delta. Literary and Linguistic Computing 19: 453-75.

Hoover, D. L. (2004a). Delta Prime? Literary and Linguistic Computing 19: 477-95.

Kozieł, M. (2011). The Translator’s Stylistic Traces. A Quantitative Analysis of Polish Translations of The Lord of the Rings. MA thesis, Uniwersytet Pedagogiczny, Kraków.

Oakes, M., and M. Ji (2012). Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: Benjamins.

Olohan, M. (2004). Introducing Corpora in Translation Studies. London: Routledge.

Rait, S. (2010). Virginia Woolf’s Early Novels: Finding a Voice. In Sellers, S. (ed.), The Cambridge Companion to Virginia Woolf. Cambridge: Cambridge UP.

Rybicki, J. (2010). Translation and Delta Revisited: When We Read Translations, Is It the Author or the Translator that We Really Read? London: Digital Humanities 2010.

Rybicki, J. (2011). Alma Cardell Curtin and Jeremiah Curtin: The Translator’s Wife’s Stylistic Fingerprint. Stanford: Digital Humanities 2011.

Rybicki, J. (2012). The Great Mystery of the (Almost) Invisible Translator: Stylometry in Translation. In M. Oakes and M. Ji (eds), Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: Benjamins.

Smith, P., and W. Aldridge (2011). Improving Authorship Attribution: Optimizing Burrows’s Delta Method. Journal of Quantitative Linguistics 18(1): 63-88.


Conference Info


ADHO - 2012
"Digital Diversity: Cultures, languages and methods"

Hosted at Universität Hamburg (University of Hamburg)

Hamburg, Germany

July 16, 2012 - July 22, 2012
Series: ADHO (7)

Organizers: ADHO