1 Introduction
The stylistic properties of translations have received some
attention in computational stylistics. Rybicki (2006) used
Burrows's Delta method to distinguish character idiolects in
the work of the Polish author Henryk Sienkiewicz and in two
corresponding English translations. That study concluded that
idiolectal differences could be observed in the source texts
and that this variation was preserved to a large degree in
both translations. It also found that the two translations
were highly distinguishable from one another.
Burrows (2002) examined English translations of Juvenal,
also using the Delta method. The results suggest that some
translators are more adept at concealing their own style
when translating the works of another author, whereas
others tend to imprint their own style to a greater extent
on the work they translate.
Our work examines the writing of a single author, the
Norwegian playwright Henrik Ibsen, together with translations
of these writings from Norwegian into both German and English,
in an attempt to investigate the preservation of
characterization, defined here as the distinctiveness of
the textual contributions of characters.
Many studies in computational stylistics have focused on
tasks which are related to authorship attribution but are
not concerned with attributing authorship to texts of
unknown provenance. A related area of study is pastiche,
an intended imitation of an author's style in the same
language, which contrasts with translation as an intended
imitation of an author's style in a different language.
Somers and Tweedie (2003) conducted experiments involving
pastiche. The author in question was Lewis Carroll and the
pastiche was a modern children's fable by Gilbert Adair,
Alice Through the Needle's Eye, in which the author
attempted to imitate the style of Carroll in such works
as Through the Looking Glass and Alice's Adventures
in Wonderland. Various techniques from authorship
attribution were applied to the task, including measures of
lexical richness, principal component analysis, the cusum
technique,1 and others. Some methods distinguished
the pastiche from the original and some did not. Somers
and Tweedie (2003) conclude as follows: If a pastiche
is indistinguishable from the original by an authorship
attribution method, can it be said that the pastiche is in
fact a perfect imitation of the original, or is it the case
that the method is flawed? In the case of translation, which
is of relevance to our current work, the question can be
formulated in a different way: If a translation is highly
similar stylistically to other works by the same translator,
is the translation a faithful one?
The current study builds on previous work on detecting
character voices in the poetry of the Irish poet Brendan
Kennelly by Vogel and Brisset (2007) and on a study of
characterization in the work of playwrights by Vogel and
Lynch (2008). These studies were concerned with the language
used by authors in the creation of character. The tools used
here are the same as those used in these previous studies.
3 Experimental Setup
For these experiments, three works by Henrik Ibsen were
used: A Doll's House (1879), Ghosts (1881), and The
Master Builder (1892).2 The electronic versions of these
plays were obtained from Ibsen.net3 and Project Gutenberg.
The contributions of each character are extracted
using PlayParser.4 All stage instructions are discarded in
this step, leaving only the character dialogue.
The method decomposes all texts associated with a category
(here, persona or play) into chunks of equal size.
Pairwise similarity metrics are computed for all chunks.
The metric is the average chi-square computation of
the difference in distribution between pairs of files for
each token appearing in either file. Different sorts of
tokenization capture different linguistic features for which
one might consider distributions within and across text
categories. If the pairwise similarity scores are rank
ordered, then one can exploit the intuitions that a
homogeneous category will have a smaller rank-sum than a
heterogeneous one, and that arbitrary samples from a
homogeneous category should be more like the rest of
that category than like alternative categories. The method
also provides a way to measure the degree of homogeneity:
the number of samples that are more like the rest of
their category can be measured against a baseline created
by random sampling. See Vogel and Lynch (2008) for a
more detailed account of the method.
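As an illustration only, the pairwise metric and the rank ordering
described above might be sketched as follows in Python. This is a
minimal sketch under stated assumptions, not the tool used in the
study: the function names are hypothetical, expected counts are
taken from the pooled distribution of the two chunks, and the mean
rank of within-category pairs stands in for the rank-sum so that
categories with different numbers of chunks remain comparable.

```python
from collections import Counter
from itertools import combinations


def chi_square_distance(tokens_a, tokens_b):
    """Average chi-square contribution per token type for two chunks.

    Assumption: expected counts come from pooling both chunks.
    """
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both chunks must contain at least one token")
    grand_total = total_a + total_b
    vocabulary = set(count_a) | set(count_b)
    score = 0.0
    for token in vocabulary:
        pooled = count_a[token] + count_b[token]
        expected_a = pooled * total_a / grand_total
        expected_b = pooled * total_b / grand_total
        score += (count_a[token] - expected_a) ** 2 / expected_a
        score += (count_b[token] - expected_b) ** 2 / expected_b
    return score / len(vocabulary)


def within_category_mean_ranks(chunks, labels):
    """Rank all chunk pairs by distance (rank 1 = most similar) and
    return, for each category, the mean rank of its internal pairs.

    A lower value suggests a more homogeneous category.
    """
    pairs = list(combinations(range(len(chunks)), 2))
    distances = [chi_square_distance(chunks[i], chunks[j]) for i, j in pairs]
    order = sorted(range(len(pairs)), key=distances.__getitem__)
    ranks_by_label = {}
    for rank, pair_index in enumerate(order, start=1):
        i, j = pairs[pair_index]
        if labels[i] == labels[j]:
            ranks_by_label.setdefault(labels[i], []).append(rank)
    return {label: sum(r) / len(r) for label, r in ranks_by_label.items()}
```

Here `chunks` is a list of token lists and `labels` gives the
category (persona or play) of each chunk. Comparing the resulting
values against those obtained from randomly relabelled chunks would
be one way to approximate the random-sampling baseline mentioned
above.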
4.1 First Experiment
The first experiment seeks to compare character homogeneity
over different languages. The second experiment
compares two different translations of the same
play in order to quantify similarity between parallel
translations. Table 1 shows the plays and their respective
translators. The first 10k of text per character was
examined, split into 5 sections. Thus, the criterion for
inclusion in the study was that a character should have at
least 10k of text; 11 characters were examined, as detailed
in Table 2. Only the version of Ghosts translated by Archer
is used in the first experiment. The results reported
in the next section are statistically significant.
The results for the first experiment showed that character
homogeneity varies to some extent over the translations:
the character idiolects are not necessarily preserved to
the same degree as in the originals. When letter frequencies
are examined, the characters in the original Norwegian
prove to be more homogeneous than those in the translations.
Examples include the character of Engstrand, who is
homogeneous in English and Norwegian but not in German.
However, one character whose language remains distinct
across all of the translations is Nora, the heroine of
A Doll's House and one of the typical strong female
characters found in Ibsen's drama.5 When the play
is taken as the category, we find that the chunks of personas
from each play are more similar to the personas from
the same play than to those from different plays, and this
is consistent across languages. So while within-character
homogeneity is not always preserved, the homogeneity of
the plays remains relatively consistent across languages.
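The preparation of the samples for this experiment (the first 10k
of text per character, five sections, letter frequencies) could be
sketched as below. This is an illustrative sketch only: the text
does not say whether "10k" counts characters, bytes, or words, so
characters are assumed here, and both function names are
hypothetical rather than part of PlayParser or the authors' tools.

```python
def equal_chunks(text, limit=10_000, parts=5):
    """Keep the first `limit` characters of a persona's dialogue and
    split them into `parts` chunks of equal size.

    Assumption: the inclusion threshold is measured in characters.
    """
    if len(text) < limit:
        raise ValueError("persona has too little text for inclusion")
    size = limit // parts
    return [text[i * size:(i + 1) * size] for i in range(parts)]


def letter_tokens(text, n=1):
    """Tokenize a chunk into letter n-grams (n=1 gives single letters).

    Case is folded and non-alphabetic characters are dropped, so
    letters such as å, ø, ä and ü are kept as tokens.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
```

Each chunk returned by `equal_chunks` can be passed through
`letter_tokens` before the pairwise comparison, and other
tokenizations (for example words) could be substituted at the same
point.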
5 The Second Experiment
The second experiment sought to examine whether two
translations of the same original text into the same language
are distinguishable by translator, as in the work
by Rybicki (2006), which separated the work of each
translator while observing the preservation
of idiolect in each. The experimental
setup was similar to the first experiment, with
the character contributions separated and split into five
files each. This time, however, the characters from the
two translations of Ghosts, by William Archer and Robert
Farquharson Sharp, were compared with each other.
Our findings were that the characters from Archer's translation
were more homogeneous in general than those of
Sharp's translation. For the characters which were not
homogeneous, the text segments were more similar to
the segments of the same character in the other translator's
version than to any other text by the same translator.
Sharp's characters were similar to the corresponding
Archer character more often than vice versa.
This suggests that both translators have managed to produce
faithful translations which are not highly influenced by
their own writing style. It also suggests that Sharp may
have used Archer's translation as a reference when crafting
his own.
Character        Play
Pastor Manders   Ghosts
Mrs Alving       Ghosts
Helmer           A Doll's House
Krogstad         A Doll's House
Mrs Linde        A Doll's House
Nora             A Doll's House
Aline            The Master Builder
Hilde            The Master Builder
Solness          The Master Builder
Table 2: Characters and their plays
This result contrasts with Rybicki (2006), who found that
the two translations of Sienkiewicz separated cleanly
from one another while preserving individual character
idiolects. However, Rybicki makes clear that the
two English translations were done almost one hundred
years apart, with the second translator taking specific
steps to bring the language of Sienkiewicz into the 20th
century. Also, we are aware that results from
studies of two different authors are not directly comparable,
and we do not seek to draw definite parallels, merely to
reflect on related work in the same sphere.6
In this research, character idiolects in translation have
been examined. Future work will involve using different
metrics for comparison, comparing different selections of
text from the characters considered, and comparing
translations of different authors by the same translator.
1 See Farringdon (1996) for a detailed explanation of the
origins of this technique, including detailed examples of
the method's use in a legal setting.
2 For the English versions of the plays, the print versions
are collected in Ibsen, Archer, Aveling, Archer, and Archer
(1890). Sharp's translations can be found in Sharp
(1911), the collected works of Ibsen in German in
Ibsen (1898), and the Norwegian collected
works in Ibsen and Bull (1957).
3 http://www.ibsen.net, last verified March 12, 2009, contains
comprehensive information about Ibsen's life and
work, together with links to his plays in the original form
and in translation.
4 A Java-based tool designed for this purpose. Lynch and
Vogel (2007) describe the creation and benchmarking
of this particular program.
5 Hedda Gabler is the other one who springs to mind.
Further studies may incorporate a wider range of plays.
6 It is not fully clear from the forewords to the e-texts
when exactly the translations themselves were first
published. However, it is stated that the first performance
in English was in 1890, using Archer's translation. Sharp's
translations were first published in 1911, according to
htm, last veriﬁed March 12, 2009
Burrows, J. (2002). The Englishing of Juvenal: Computational
Stylistics and Translated Texts. Style, 36 (4),
Farringdon, J. (1996). Analysing for Authorship: A guide
to the Cusum technique. University of Wales Press.
Ibsen, H. (1898). Henrik Ibsens sämtliche Werke in
deutscher Sprache. S. Fischer.
Ibsen, H., Archer, W., Aveling, E., Archer, F., & Archer, C. (1890). Ibsen’s Prose Dramas. W. Scott.
Ibsen, H. & Bull, F. (1957). Samlede verker: hundreårsutgave.
Lynch, G. & Vogel, C. (2007). Automatic Character Assignation.
In Proceedings of AI-2007 Twenty-seventh
SGAI International Conference on Innovative Techniques
and Applications of Artiﬁcial Intelligence, pp.
Rybicki, J. (2006). Burrowing into Translation: Character
Idiolects in Henryk Sienkiewicz's Trilogy and its
Two English Translations. Literary and Linguistic
Computing, 21 (1), 91–103.
Sharp, R. (1911). Henrik Ibsen, Ghosts and Two Other
Plays. J.M Dent.
Somers, H. & Tweedie, F. (2003). Authorship Attribution
and Pastiche. Computers and the Humanities, 37
Vogel, C. & Brisset, S. (2007). Hearing Voices in the
Poetry of Brendan Kennelly. Belgian Journal of English
Language & Literature, 1–16.
Vogel, C. & Lynch, G. (2008). Computational Stylometry:
Who’s in a Play?. In Verbal and Nonverbal Features
of Human-Human and Human-Machine Interaction., pp.
Hosted at University of Maryland, College Park
College Park, Maryland, United States
June 20, 2009 - June 25, 2009
Conference website: http://web.archive.org/web/20130307234434/http://mith.umd.edu/dh09/
Series: ADHO (4)