1 Introduction
The stylistic properties of translations have received
some attention in computational stylistics. Rybicki (2006)
used Burrows’s Delta method to distinguish character
idiolects in the work of Polish author Henryk Sienkiewicz
and in two corresponding English translations. The study
concluded that idiolectal differences could be observed
in the source texts and that this variation was preserved
to a large degree in both translations. It also found
that the two translations were highly distinguishable
from one another.
Burrows (2002) examined English translations of Juvenal,
also using the Delta method. The results of this work
suggest that some translators are more adept at concealing
their own style when translating the works of another
author, whereas other translators tend to imprint their
own style to a greater extent on the work they translate.
Our work examines the writing of a single author, the
Norwegian playwright Henrik Ibsen, together with
translations of these writings from Norwegian into both
German and English, in an attempt to investigate the
preservation of characterization, defined here as the
distinctiveness of the textual contributions of characters.
2 Background
Many studies in computational stylistics have focused on
tasks which are related to authorship attribution but are
not concerned with attributing authorship to texts of
unknown provenance. A related area of study is pastiche,
an intended imitation of an author’s style in the same
language, which contrasts with translation as an intended
imitation of an author’s style in a different language.
Somers and Tweedie (2003) conducted experiments involving
pastiche. The author in question was Lewis Carroll and the
pastiche was a modern children’s fable by Gilbert Adair
called Alice through the Needle’s Eye, in which Adair
attempted to imitate the style of Carroll in such works
as Through the Looking Glass and Alice’s Adventures
in Wonderland. Various techniques from authorship
attribution were applied to the task, including measures
of lexical richness, principal component analysis, the
cusum technique, and others. Some methods distinguished
the pastiche from the original and some did not. Somers
and Tweedie (2003) conclude as follows: If a pastiche
is indistinguishable from the original by an authorship
attribution method, can it be said that the pastiche is in
fact a perfect imitation of the original, or is it the case
that the authorship attribution method is flawed? In the
case of translation, which is of relevance to our current
work, the question can be formulated in a different way:
If a translation is highly similar stylistically to other
works by the same translator, is the translation
a faithful one?
The current study builds on previous work by Vogel and
Brisset (2007) on detecting character voices in the poetry
of Irish poet Brendan Kennelly, and on a study of
characterization by playwrights by Vogel and Lynch (2008).
These studies were concerned with the language used by
authors in the creation of character. The tools used in
the present study are the same as those used in these
previous studies.
3 Experimental Setup
For these experiments, three works by Henrik Ibsen were
used: A Doll’s House (1879), Ghosts (1881), and The
Master Builder (1892). The electronic versions of these
plays were obtained from Ibsen.net and Project Gutenberg.
The contributions of each character were extracted
using PlayParser. All stage directions were discarded in
this step, leaving only the character dialogue.
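The extraction step can be illustrated with a minimal sketch. This is
not PlayParser itself, whose input format and interface are not
described here. The sketch assumes a hypothetical plain-text layout in
which each speech begins with the speaker's name in capitals followed
by a period, and in which stage directions appear in square brackets
or parentheses. The function name extract_dialogue is likewise an
illustrative choice.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) layout: "NORA. [Stage direction.] Speech text".
# A speaker label is a run of capitals, spaces and periods ending in a capital.
SPEAKER_RE = re.compile(r"^([A-Z][A-Z .]*[A-Z])\.\s*(.*)$")
# Stage directions are assumed to sit in [...] or (...) on a single line.
DIRECTION_RE = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def extract_dialogue(play_text):
    """Map each speaker to their speech text with stage directions removed."""
    speeches = defaultdict(list)
    current = None
    for raw in play_text.splitlines():
        line = DIRECTION_RE.sub(" ", raw).strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            # A new speech starts; the rest of the line belongs to it.
            current, line = match.group(1), match.group(2)
        if current is not None and line:
            speeches[current].append(line)
    # Join the lines of each speaker and normalise whitespace.
    return {name: " ".join(" ".join(parts).split())
            for name, parts in speeches.items()}
```

A real edition would need more care, for example with stage
directions that span several lines or with all-capital words inside
a speech, which this sketch would misread as speaker labels.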
The method decomposes all texts associated with a category
(here, persona or play) into chunks of equal size.
Pairwise similarity metrics are computed for all chunks.
The metric is the average, over every token appearing in
either file of a pair, of the chi-square computation of
the difference in that token’s distribution between the
two files. Different sorts of tokenization
capture different linguistic features for which
one might consider distributions within and across text
categories. If the pairwise similarity scores are rank
ordered, then one can exploit the intuitions that a
homogeneous category will have a smaller rank-sum than a
heterogeneous one, and that arbitrary samples from a
homogeneous category should be more like the rest of
that category than like alternative categories. The method
also provides a way to measure the degree of homogeneity:
the number of samples that are more like the rest of
their category can be measured against a baseline created
by random sampling. See Vogel and Lynch (2008) for a
more detailed account of the method.
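The steps described above can be sketched roughly as follows. This is
one possible reading of the description, not the implementation of
Vogel and Lynch (2008): the function names are hypothetical, single
letters are used as just one example of a tokenization, the chi-square
term compares observed counts with counts expected from the pooled
chunks, ties in the ranking are broken arbitrarily, and the random
sampling baseline is left out.

```python
from collections import Counter
from itertools import combinations


def letter_tokens(text):
    """Tokenize into single letters (one of several possible tokenizations)."""
    return [ch for ch in text.lower() if ch.isalpha()]


def split_chunks(tokens, n_chunks):
    """Split a token list into n_chunks equal-size chunks, dropping any remainder."""
    size = len(tokens) // n_chunks
    return [tokens[i * size:(i + 1) * size] for i in range(n_chunks)]


def avg_chi_square(tokens_a, tokens_b):
    """Average per-token chi-square between two chunks (lower = more similar)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = sum(a.values()), sum(b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both chunks must contain at least one token")
    vocab = set(a) | set(b)
    total = 0.0
    for tok in vocab:
        joint = a[tok] + b[tok]
        # Expected counts if the token were spread in proportion to chunk size.
        exp_a = joint * n_a / (n_a + n_b)
        exp_b = joint * n_b / (n_a + n_b)
        total += (a[tok] - exp_a) ** 2 / exp_a + (b[tok] - exp_b) ** 2 / exp_b
    return total / len(vocab)


def within_category_rank_sum(chunks_by_category, category):
    """Sum of the ranks held by within-category pairs among all chunk pairs."""
    labelled = [(cat, chunk)
                for cat, chunks in chunks_by_category.items()
                for chunk in chunks]
    pairs = [(avg_chi_square(ca, cb), cat_a, cat_b)
             for (cat_a, ca), (cat_b, cb) in combinations(labelled, 2)]
    pairs.sort(key=lambda p: p[0])  # rank 1 = most similar pair
    return sum(rank for rank, (_, cat_a, cat_b) in enumerate(pairs, start=1)
               if cat_a == cat_b == category)


def count_self_similar(chunks_by_category, category):
    """Count chunks closer on average to the rest of their own category
    than to any alternative category."""
    count = 0
    for i, chunk in enumerate(chunks_by_category[category]):
        means = {}
        for cat, chunks in chunks_by_category.items():
            # Leave the chunk itself out of its own category.
            others = [c for j, c in enumerate(chunks)
                      if cat != category or j != i]
            if others:
                means[cat] = sum(avg_chi_square(chunk, c)
                                 for c in others) / len(others)
        if category in means and min(means, key=means.get) == category:
            count += 1
    return count
```

In use, each persona (or play) would be a key of chunks_by_category,
with its value the list produced by split_chunks. A lower rank-sum for
a category, or a higher count from count_self_similar, would then be
read as greater homogeneity, subject to comparison with a baseline.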
4 Experiments
4.1 First Experiment
The first experiment seeks to compare character homogeneity
across languages.
The second experiment
compares two different translations of the same
play in order to quantify the similarity between parallel
translations. Table 1 shows the plays and their respective
translators. As mentioned, the first 10k of text per
character was examined, and this was split into 5 sections.
Thus, the criterion for inclusion in the study
was that a character should have at least 10k of
text; 11 characters were examined, as detailed in
Table 2. Only the version of Ghosts translated by Archer
is used in the first experiment. The results reported
in the next section are statistically significant.
The results for the first experiment showed that character
homogeneity varies to some extent over the translations;
the character idiolects are not necessarily preserved to
the same degree as in the originals. When letter
frequencies are measured, the characters in the original
Norwegian prove to be more homogeneous than those in the
translations. Examples include the character of Engstrand,
who is homogeneous in English and Norwegian but not in
German. However, one character whose language remains
distinct across all of the translations is Nora, the
heroine of A Doll’s House and one of the typical strong
female characters found in Ibsen’s drama. When the play
is taken as the category, however, we find that the chunks
of personas from each play are more similar to the personas
from the same play than to those from different plays, and
this is consistent across languages. So while
within-character homogeneity
is not always preserved, the homogeneity of
the plays remains relatively consistent across languages.
5 The Second Experiment
The second experiment sought to examine whether two
translations of the same original text into the same
language are distinguishable by translator, as in the work
by Rybicki (2006), which separated the work of each
translator, while observing the preservation
of idiolect in each. The experimental
setup was similar to the first experiment, with
the character contributions separated and split into five
files each. This time, however, the characters from the
two translations of Ghosts by William Archer and Robert
Farquharson Sharp were compared with each other.
Our findings were that the characters from Archer’s
translation were more homogeneous in general than those of
Sharp’s translation. For the characters which were not
homogeneous, the text segments were more similar to
the segments of the same character by the other
translator than to any other writings by the same
translator. Sharp’s characters were similar to the
corresponding Archer characters more often than vice versa.
This suggests that both translators have managed to perform
faithful translations which are not highly influenced by
their own writing style. It also suggests that Sharp may
have used Archer’s translation as a reference when crafting
his own.
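The comparison behind this observation can be sketched as follows.
This is an illustrative reconstruction under stated assumptions, not
the tool used in the study: chunks are keyed by hypothetical
(translator, character) pairs, letters serve as tokens, and the
distance is the same averaged chi-square reading as described in the
experimental setup. Both comparison groups are assumed to be non-empty.

```python
from collections import Counter


def chi_square_distance(text_a, text_b):
    """Average per-letter chi-square between two texts (lower = more similar)."""
    a = Counter(ch for ch in text_a.lower() if ch.isalpha())
    b = Counter(ch for ch in text_b.lower() if ch.isalpha())
    n_a, n_b = sum(a.values()), sum(b.values())
    vocab = set(a) | set(b)
    total = 0.0
    for tok in vocab:
        joint = a[tok] + b[tok]
        exp_a = joint * n_a / (n_a + n_b)
        exp_b = joint * n_b / (n_a + n_b)
        total += (a[tok] - exp_a) ** 2 / exp_a + (b[tok] - exp_b) ** 2 / exp_b
    return total / len(vocab)


def mean_distance(chunk, others):
    """Mean distance from one chunk to a non-empty list of other chunks."""
    return sum(chi_square_distance(chunk, o) for o in others) / len(others)


def closer_to_counterpart(chunks, translator, character):
    """For each chunk of (translator, character), report whether it is closer
    to the same character in the other translation(s) than to the other
    characters rendered by the same translator."""
    counterpart = [c for (t, ch), cs in chunks.items()
                   if t != translator and ch == character for c in cs]
    same_translator = [c for (t, ch), cs in chunks.items()
                       if t == translator and ch != character for c in cs]
    return [mean_distance(chunk, counterpart)
            < mean_distance(chunk, same_translator)
            for chunk in chunks[(translator, character)]]
```

A list of mostly True values for a character would correspond to the
pattern reported above, in which segments resemble the same character
in the other translation more than the translator's other characters.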
Character         Play
Engstrand         Ghosts
Pastor Manders    Ghosts
Oswald            Ghosts
Mrs Alving        Ghosts
Helmer            A Doll’s House
Krogstad          A Doll’s House
Mrs Linde         A Doll’s House
Nora              A Doll’s House
Aline             The Master Builder
Hilde             The Master Builder
Solness           The Master Builder
Table 2: Characters and their plays
This result contrasts with Rybicki (2006), who found that
the two translations of Sienkiewicz separated cleanly
from one another while preserving individual character
idiolects. However, Rybicki makes clear that the
two English translations were done almost one hundred
years apart, with the second translator taking specific
steps to bring the language of Sienkiewicz into the 20th
century. We are also aware that results from
studies of two different authors are not directly
comparable; we do not seek to draw definite parallels,
merely to reflect on related work in the same sphere.
6 Conclusion
In this research, character idiolects in translation have
been examined. Future work will involve using different
metrics for comparison, comparing different
selections of text from the characters considered, and
comparing translations of different authors
by the same translator.
