The Invisible Translator Revisited

paper, specified "long paper"
Authorship
  1. David L. Hoover

    New York University

Work text

A translator normally replaces almost all the original author's vocabulary except proper nouns. Most authorship attribution methods are based on the frequencies of the most frequent words or n-grams, the latter themselves derived from the sequence of words. Given these facts, one might expect attributions of translations to identify them by translator rather than author. Yet that is not the case. Rather, despite the replacement of the original author's language by that of the translator, translations are normally attributable to their original authors, rendering the translators virtually invisible. Jan Rybicki, himself an accomplished translator, has presented some important discussions of this peculiar state of affairs (Rybicki 2006, 2012), but a further investigation of this curious phenomenon seems worthwhile (see also Burrows 2002, Rybicki and Heydel 2013).
As a first step, consider a test of twenty texts by Chekhov translated by five translators (or pairs of translators). Figure 1 shows a Stylo bootstrap consensus tree (Eder, Rybicki, and Kestemont 2016), based on cluster analyses of the 200-2,000 most frequent words (pronouns deleted) in increments of 100 words and with culling from 0% to 100% in increments of 20% (0% retains all words; 100% retains only words occurring in all texts), consensus .5 (at least 50% agreement is required to group texts).
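The word-list settings described above can be illustrated with a minimal Python sketch. It is not Stylo's own code (Stylo is an R package); the function name `culled_mfw`, the toy corpus, and the choice to rank words by pooled raw counts are assumptions made for illustration only.

```python
from collections import Counter

def culled_mfw(texts, n_words, culling_pct, stop_words=frozenset()):
    """Return up to n_words most frequent words that survive culling.

    texts: dict mapping a text label to its list of word tokens.
    culling_pct: a word is kept only if it occurs in at least this
    percentage of the texts (0 keeps every word, 100 keeps only
    words that occur in all texts).
    stop_words: words removed beforehand (for example, pronouns).
    """
    totals = Counter()    # pooled frequency of each word
    doc_freq = Counter()  # number of texts in which each word occurs
    for tokens in texts.values():
        totals.update(tokens)
        doc_freq.update(set(tokens))
    n_texts = len(texts)
    kept = [
        word for word, _ in totals.most_common()
        if word not in stop_words
        # integer comparison avoids floating-point surprises
        and doc_freq[word] * 100 >= culling_pct * n_texts
    ]
    return kept[:n_words]

# Hypothetical toy corpus, for illustration only.
toy = {
    "a": "the cat sat on the mat".split(),
    "b": "the dog sat on the log".split(),
    "c": "the bird flew".split(),
}

# The parameter grid described in the text.
mfw_settings = list(range(200, 2001, 100))   # 200, 300, ..., 2000
culling_settings = list(range(0, 101, 20))   # 0, 20, ..., 100
grid = [(m, c) for m in mfw_settings for c in culling_settings]
```

If every combination is run, the grid holds 19 word-list sizes and 6 culling levels, so 114 cluster analyses would contribute to the consensus tree.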

Fig. 1 Chekhov Translations by Multiple Translators
Here multiple translations of the same text rather than multiple translations by the same translator cluster consistently, suggesting that text identity is a stronger signal than translator (on the strength of various signals, see Jockers 2013: 79-81). Note, however, that three of the four Garnett translations of texts not translated by any of the other translators group together.
Next, consider the bootstrap consensus tree of multiple translations of five Russian authors by Constance Garnett seen in Fig. 2 (same stipulations as above), which does an excellent job of grouping authors even without the effect of multiple translations of the same text.

Fig. 2 Garnett Translations of Multiple Authors
The strength of the original author's signal in translations can be tested more thoroughly using Stylo's Classify function. For the first test, 30 texts form the training set: 5 Chekhov texts by 4 translators, 9 Dostoevsky texts by 7 translators, 5 Gogol texts by 4 translators, 7 Tolstoy texts by 3 translators, and 4 Turgenev texts by 2 translators. The test set contains 47 texts by the same authors: 10 Chekhov texts by 4 translators, 13 Dostoevsky texts by 6 translators, 8 Gogol texts by 3 translators, 9 Tolstoy texts by 4 translators, and 7 Turgenev texts by 4 translators. No translations of the same text appear in both groups, eliminating the signals of individual texts. Thus the task is to attribute a set of test texts (sometimes including multiple translations of a single text by different translators) to the original authors of a different set of training texts. Based on the 100-2,000 most frequent words (in increments of 100), with 40% culling and pronouns deleted, NSC (nearest shrunken centroid) classification is 94.5% accurate (888 of 940 correct attributions) and SVM (support vector machine) classification 96% (902 of 940 correct attributions). These results would be strong even on texts that had not been translated.
A second much stricter test involves 34 training texts: 7 Chekhov texts by 2 translators, 7 Dostoevsky texts by 2 translators, 10 Gogol texts by 2 translators, 6 Tolstoy texts by 2 translators, and 4 Turgenev texts by 1 translator. The test set contains 44 texts by the same authors: 5 Chekhov texts by 4 translators, 14 Dostoevsky texts by 6 translators, 10 Gogol texts by 4 translators, 8 Tolstoy texts by 2 translators, and 7 Turgenev texts by 4 translators. These texts were chosen so that no translations by the same translator for the same author appear in both training and test sets. Thus the task is to attribute a set of test texts to the original author when the translators of the training texts by that author are different from the translators of the test texts by that author. The results on this test (same settings as the previous test) are naturally less accurate, but NSC classification is still 85.8% accurate (755 of 880 correct attributions) and SVM 87.6% (771 of 880 correct attributions). This seems almost incredible: the original author of a set of English translations by one group of translators is usually correctly identified as the author of a different set of that author's texts translated by a different set of translators.
In spite of the strength of the author's signal, however, further analysis shows that the translator can be made visible again by filtering out the author's signal. Consider a different kind of test. The training set contains 6 translations of Tolstoy by Garnett and 5 translations of Dostoevsky by Pevear and Volokhonsky (Garnett is treated as the author of Tolstoy and Pevear and Volokhonsky as the author of Dostoevsky). The test set contains 33 texts: 10 translations of Chekhov, 1 of Goncharov, and 9 of Turgenev by Garnett, and 13 translations of Gogol by Pevear and Volokhonsky. With authorship neutralized, the translator becomes startlingly visible again. On these tests (same stipulations as above), NSC is 81.2% accurate (536 of 660 correct attributions) and SVM 93.9% (620 of 660 correct attributions). Clearly Garnett's translations of Tolstoy are similar enough to her translations of Chekhov, Goncharov, and Turgenev that she can readily be identified as their "author." The same is true of the translations of Dostoevsky and Gogol by Pevear and Volokhonsky.
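The attribution totals in the three classification tests appear to follow from multiplying the number of test texts by the 20 word-list sizes (100 to 2,000 in steps of 100). That reading is an inference from the figures, not something the text states outright. A short Python check, with the hypothetical helper `accuracy`, reproduces the reported totals and percentages under that assumption:

```python
# Word-list sizes: 100, 200, ..., 2000 (20 settings).
mfw_settings = list(range(100, 2001, 100))

# (test texts, correct NSC attributions, correct SVM attributions),
# copied from the three tests reported in the text.
tests = {
    "same translators allowed": (47, 888, 902),
    "different translators": (44, 755, 771),
    "translator treated as author": (33, 536, 620),
}

def accuracy(correct, n_texts, n_settings):
    """Return (total attributions, percentage correct to one decimal)."""
    total = n_texts * n_settings
    return total, round(100 * correct / total, 1)

for name, (n_texts, nsc, svm) in tests.items():
    total, nsc_pct = accuracy(nsc, n_texts, len(mfw_settings))
    _, svm_pct = accuracy(svm, n_texts, len(mfw_settings))
    print(f"{name}: {total} attributions, NSC {nsc_pct}%, SVM {svm_pct}%")
```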
A final test can begin to show how this is possible. Zeta analysis identifies the characteristic vocabulary of these two translators: words consistently used by each and avoided by the other (Burrows 2002). It contrasts two groups of texts by measuring the consistency of inclusion and exclusion of a large set of words in large groups of sections of text of the same size by the two translators. For this test, Garnett's translations of Chekhov and Turgenev are treated as her "authorial" set and the Pevear and Volokhonsky translations of Dostoevsky and Tolstoy as their "authorial" set. An initial analysis showed that many proper names appeared in the characteristic vocabulary, and that British vs. American spellings and Garnett's use of hyphenated forms of words like to-day, to-morrow, to-night, etc. had a significant effect, so I manually culled out more than 4,000 such words and retested, with the result shown in Fig. 3. Given the proven strength of the author's signal, Fig. 3 makes an important point. None of the Garnett Ind. Sections or P and V Ind. Sections influenced the distinction between the two translators, and all these texts are by Gogol. Many of them (in bold) are translations of the same work. Nevertheless, they are easily placed near the texts by their translator and separate from each other.
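The idea behind Zeta can be sketched in a few lines of Python. This is one common formulation (the share of one group's sections that contain a word plus the share of the other group's sections that lack it) and is not necessarily the exact computation behind Fig. 3. The function names and the toy sections are assumptions for illustration.

```python
def segment(tokens, size):
    """Split tokens into consecutive sections of equal size,
    dropping any shorter remainder at the end."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

def zeta_scores(sections_a, sections_b):
    """Score each word from 0 to 2.

    A score near 2 means the word occurs in most sections of group A
    and is missing from most sections of group B; a score near 0
    means the reverse. Both groups must contain at least one section.
    """
    sets_a = [set(s) for s in sections_a]
    sets_b = [set(s) for s in sections_b]
    vocabulary = set().union(*sets_a, *sets_b)
    scores = {}
    for word in vocabulary:
        share_a = sum(word in s for s in sets_a) / len(sets_a)
        share_b = sum(word in s for s in sets_b) / len(sets_b)
        scores[word] = share_a + (1 - share_b)
    return scores

# Hypothetical toy sections, for illustration only.
group_a = [["till", "she", "came"], ["till", "he", "went"]]
group_b = [["until", "she", "came"], ["until", "he", "went"]]
scores = zeta_scores(group_a, group_b)
```

Sorting the scores from high to low would list group A's markers first and group B's markers last. Words used equally by both groups sit near 1.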

Fig. 3 Zeta Analysis of Garnett vs. Pevear and Volokhonsky
The 50 most distinctive words for each translator show some interesting patterns:
Consistently used by Garnett and avoided by Pevear and Volokhonsky:
till, fancy, passed, drawing-room, upon, air, flung, answered, muttered, walked, scarcely, cap, sound, slowly, hair, expression, hardly, every, fellow, near, silence, instant, distance, white, low, soft, bent, walking, deal, sky, grew, poor, shoulders, lips, fond, rather, dark, ought, haste, country, black, faint, beside, suppose, window, observed, continually, clever, creature, sank
Consistently used by Pevear and Volokhonsky and avoided by Garnett:
therefore, everyone, precisely, also, finally, I'll, despite, maybe, became, anyone, especially, decided, terribly, you're, having, start, impossible, I'm, unable, obviously, main, I'd, someone, contrary, he's, moment, until, started, order, situation, I've, didn't, because, terrible, firmly, front, silently, purpose, earlier, otherwise, immediately, certain, understood, let's, barely, they're, lit, you'll, former, you've
The words till for Garnett and until for Pevear and Volokhonsky are a classic authorship pair. Pevear and Volokhonsky clearly use a less formal style, as indicated by the large number of contractions among their markers. They also use more -ly adverbs, with nine in the list above compared to only four for Garnett, and only they have indefinite pronouns in their list (a trend that continues far beyond the 50 most distinctive words). By contrast, Garnett's list contains many concrete nouns, while Pevear and Volokhonsky's list contains none. Garnett's list also contains many more full verbs and adjectives than Pevear and Volokhonsky's. There is no space here to investigate these differences fully, but this analysis suggests new ways to study the elusive signal of the translator.
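The adverb and contraction counts can be checked mechanically against the two lists as printed. The Python sketch below uses deliberately crude heuristics (a word ending in "ly" counts as an -ly adverb, a word containing an apostrophe counts as a contraction) that happen to work for these lists but would need refinement on running text. The variable and function names are illustrative.

```python
garnett = """till, fancy, passed, drawing-room, upon, air, flung, answered,
muttered, walked, scarcely, cap, sound, slowly, hair, expression, hardly,
every, fellow, near, silence, instant, distance, white, low, soft, bent,
walking, deal, sky, grew, poor, shoulders, lips, fond, rather, dark, ought,
haste, country, black, faint, beside, suppose, window, observed,
continually, clever, creature, sank"""

pevear_volokhonsky = """therefore, everyone, precisely, also, finally, I'll,
despite, maybe, became, anyone, especially, decided, terribly, you're,
having, start, impossible, I'm, unable, obviously, main, I'd, someone,
contrary, he's, moment, until, started, order, situation, I've, didn't,
because, terrible, firmly, front, silently, purpose, earlier, otherwise,
immediately, certain, understood, let's, barely, they're, lit, you'll,
former, you've"""

def parse(word_list):
    """Turn a comma-separated list into a list of words."""
    return [w.strip() for w in word_list.split(",")]

def ly_adverbs(words):
    """Crude heuristic: any word ending in 'ly'."""
    return [w for w in words if w.endswith("ly")]

def contractions(words):
    """Crude heuristic: any word containing an apostrophe."""
    return [w for w in words if "'" in w]

INDEFINITE = {"everyone", "anyone", "someone"}

g_words = parse(garnett)
pv_words = parse(pevear_volokhonsky)

for name, words in (("Garnett", g_words), ("Pevear and Volokhonsky", pv_words)):
    print(name, len(words), "words,",
          len(ly_adverbs(words)), "-ly adverbs,",
          len(contractions(words)), "contractions,",
          len(INDEFINITE & set(words)), "indefinite pronouns")
```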

The seeming paradox of the invisible translator can be resolved: although the strength of the author's signal normally renders the translator's individual style invisible, the translator's own signal is quite strong enough to allow the attribution of translations to their translators once the author's signal is eliminated.

Bibliography
Burrows, J. (2002). The Englishing of Juvenal: computational stylistics and translated texts. Style, 36(4): 677-99.

Eder, M., Rybicki, J., and Kestemont, M. (2016). Stylometry with R: A package for computational text analysis. R Journal, 8(1): 107-121.

Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. Urbana-Champaign: University of Illinois Press.

Rybicki, J. (2006). Burrowing into translation: Character idiolects in Henryk Sienkiewicz’s trilogy and its two English translations. Literary and Linguistic Computing, 21(1): 91-103.

Rybicki, J. (2012). The great mystery of the (almost) invisible translator: Stylometry in translation. In Quantitative Methods in Corpus-Based Translation Studies: A practical guide to descriptive translation research, edited by M. P. Oakes and M. Ji. John Benjamins, pp. 231-48.

Rybicki, J., and Heydel, M. (2013). The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish. Literary and Linguistic Computing, 28(4): 708-17.

Conference Info

In review

ADHO - 2019
"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Series: ADHO (14)

Organizers: ADHO