Chasing the Ghosts of Ibsen: A computational stylistic analysis of drama in translation

  1. 1. Gerard Lynch

    Trinity College Dublin

  2. 2. Carl Vogel

    Trinity College Dublin

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

1 Introduction Research into the stylistic properties of translations is
an issue which has received some attention in computational
stylistics. Previous work by Rybicki (2006)
on the distinguishing of character idiolects in the work of
Polish author Henryk Sienkiewicz and two corresponding
English translations using Burrow’s Delta method
concluded that idiolectal differences could be observed
in the source texts and this variation was preserved to a
large degree in both translations. This study also found
that the two translations were also highly distinguishable
from one another.
Burrows (2002) examined English translations of Juvenal
also using the Delta method, results of this work suggest
that some translators are more adept at concealing
their own style when translating the works of another
author whereas other authors tend to imprint their own
style to a greater extent on the work they translate.
Our work examines the writing of a single author, Norwegian
playwright Henrik
Ibsen, and these writings
translated into both German and English from Norwegian,
in an attempt to investigate the preservation of
characterization, defined here as the distinctiveness of
textual contributions of characters.
2 Background
Many studies in computational stylistics have focused on
tasks which are related to those of authorship attribution
but are not concerned with the notion of attributing
to texts of unknown provenance. A related area
of study is the idea of pastiche, an intended imitation of
an author’s style in the same language, which contrasts
with translation as an intended imitation of an authors
style but in a different language. Somers and Tweedie
(2003) conducted experiments involving pastiche, the
author in question was Lewis Carroll and the pastiche was a modern children’s fable written by Gilbert Adair
called Alice
through the Needle’s Eye in which the author
attempted to imitate the style of Carroll in such works
as Through the Looking Glass and Alice’s Adventures
in Wonderland. Various techniques used in authorship
attribution were used in the task, including methods of
lexical richness, principal component analysis, the cusum
technique1, and others. Some methods distinguished
the pastiche from the original and some did not. Somers
and Tweedie (2003) conclude as follows: If a pastiche
is indistinguishable from the original by an authorship
attribution method, can it be said that the pastiche is in
fact a perfect imitation of the original,
or is it the case
flawed? In the case of translation which is of relevance to
our current work, the question can be formulated in a different
way: If a translation
is highly similar stylistically
to other works by the same translator, is the translation
a faithful one?
This current study builds on previous work detecting
character voices in the poetry of Irish poet Brendan Kennelly
by Vogel and Brisset (2007) and a study on characterization
in playwrights by Vogel and Lynch (2008).
These studies were concerned with the language used by
authors in the creation of character. The tools used in this
study were used in these previous studies.
3 Experimental Setup
For these experiments, three works by Henrik Ibsen were
used, A Doll’s House (1879) Ghosts (1881), and The
Master Builder (1892)2 . The electronic versions of these
plays were obtained from Ibsen.net3 and Project Gutenberg.
The contributions
of each character are extracted
using PlayParser4 . All stage instructions are discarded in
this step, leaving only the remaining character dialogue.
The method decomposes all texts associated with a category
(here, persona or play) into chunks of equal size.
Pairwise similarity metrics are computed for all chunks.
The metric is just the average chi-square computation of
the difference in distribution
between pairs of fi les for
each token appearing in either fi le. Different sorts of tokenization
capture different linguistic features for which
one might consider distributions within and across text
categories. If the pairwise similarity scores are rank ordered,
then one can exploit the intuitions that a homogeneous
will have a smaller rank-sum than a
heterogeneous one, and that arbitrary samples from a
homogeneous category should be more like the rest of
that category
than alternative categories. The method
also provides a way to measure degree of homogeneity,
the number of samples who are more like the rest of
the category can be measured against a baseline creating
by random sampling. See Vogel and Lynch (2008) for a
more detailed account of the method.
4 Experiments
4.1 First Experiment
The fi rst experiment seeks to compare character homogeneity
over different languages.
The second experiment
compares two different translations of the same
play in order to quantify similarity between parallel
translations. Table 1 shows the plays and their respective
translators. As mentioned, the fi rst 10k of text per
character was examined and this was split into 5 sections.
Thus, the criteria for inclusion in the study
was that the character should contain at least 10k of
text and 11 characters were examined, as detailed in
Table 2. Only the version of Ghosts translated by Archer
is used in the fi rst experiment. The results named
in the next section have statistical significance. The results for the fi rst experiment showed that character
homogeneity varies to some extent over the translations,
the character idiolects are not necessarily preserved to
the same degree as the originals. When letter frequencies
are measured,
the Norwegian original language characters
prove to be more homogeneous than the translations,
examples include the character of Engstrand who is homogeneous
in English and Norwegian but not German,
however, one character whose language remains distinct
across all of the translations is Nora, the heroine
from A
Doll’s House and one of the typical strong female characters
found in Ibsen’s drama.5 However, when the play
is taken as the category, we fi nd that the chunks of personas
from each play are more similar to the personas from
the same play than from different plays, and this is consistent
across languages. So while within character homogeneity
is not always preserved, the homogeneity of
the plays remains relatively consistent across languages.
5 The Second Experiment
The second experiment sought to examine whether two translations of the same original text into the same language
are distinguishable by translator as in the work
by Rybicki which delineated the work by each, while
observing the preservation
of idiolect in each. The experimental
setup was similar to the fi rst experiment with
the character contributions separated and split into fi ve
files each. This time, however, the characters from the
two translations of Ghosts by William Archer and Robert
Farquharson Sharp were compared with each other.
Our fi ndings were that the characters from Archer’s translation
were more homogeneous in general than those of
Sharp’s translation. Of the characters which were not
homogeneous, the text segments were more similar to
the segments of the same character by the corresponding
author than any other writings by the same author.
Sharp’s characters tended to be more similar to the corresponding
Archer character more often than vice versa.
This suggests that both authors have managed to perform
faithful translations which are not highly influenced by
their own writing style. It also suggests that Sharp may
have used Ibsen’s translation as a reference when crafting
his own.
Character Play
Engstrand Ghosts
Pastor Manders Ghosts
Oswald Ghosts
Mrs Alving Ghosts
Helmer A Dolls House
Krogstad A Dolls House
MrsLinde A Dolls House
Nora A Dolls House
Aline The Master Builder
Hilde The Master Builder
Solness The Master Builder
Table 2: Characters and their plays
This result contrasts with Rybicki (2006) who found that
the two translations
of Sienkiewicz separated cleanly
from one another with a preservation of individual character
idiolects. However, Rybicki makes clear that the
two English translations were done almost one hundred
years apart with the second translator taking specific
steps to bring the language of Sienkiewicz into the 20th
century. Also, we are aware that results between the
studies of two different authors are not directly comparable
and do not seek to draw definite parallels, merely to
reflect on related work in the same sphere.6
6 Conclusion
In this research, character idiolects in translation have
been examined. Future work will involve using different
metrics for comparison along with comparing different
selections of text from the characters considered, along
with the comparisons
of translations of different authors
by the same translator.
1See Farringdon (1996) for a detailed explanation of the
origins of this technique, including detailed examples of
the method’s use in a legal setting.
2For the English versions of the plays, the print versions
are collected in Ibsen, Archer, Aveling, Archer, and Archer
(1890), Sharp’s translations can be found in Sharp
(1911), the collected works of Ibsen in German are to
be found in Ibsen (1898) and the Norwegian collected
works are found in Ibsen and Bull (1957)
3, last verified March 12, 2009, contains
comprehensive information about Ibsen’s life and
work together with links to his plays in the original form
and in translation.
4A Java based tool designed for this purpose, Lynch and
Vogel (2007), describes the creation and benchmarking
of this particular program.
5Hedda Gabler being the other one who springs to mind,
further studies may incorporate a wider range of plays
and characters.
6It is not fully clear from any forewords to the e-texts
when exactly the translations themselves
were fi rst published,
however it does state that the fi rst performance in
English was in 1890, using Archers translation, Sharp’s
translations were fi rst published in 1911, according to
htm, last verified March 12, 2009
Burrows, J. (2002). The Englishing of Juvenal: Computational
Stylistics and Translated Texts. Style, 36 (4),
Farringdon, J. (1996). Analysing for Authorship: A guide
to the Cusum technique. University of Wales Press.
Ibsen, H. (1898). Henrik Ibsens sämtliche Werke in
deutscher Sprache. S. Fischer.
Ibsen, H., Archer, W., Aveling, E., Archer, F., & Archer, C. (1890). Ibsen’s Prose Dramas. W. Scott.
Ibsen, H. & Bull, F. (1957). Samlede verker: hundreårsutgave.
Lynch, G. & Vogel, C. (2007). Automatic Character Assignation.
In Proceedings of AI-2007 Twenty-seventh
SGAI International Conference on Innovative Techniques
and Applications of Artificial Intelligence, pp.
Rybicki, J. (2006). Burrowing into Translation: Character
Idiolects in Henryk Sienkiewicz’s Trilogy and its
Two English Translations. Literary and Linguistic
21 (1), 91–103.
Sharp, R. (1911). Henrik Ibsen, Ghosts and Two Other
Plays. J.M Dent.
Somers, H. & Tweedie, F. (2003). Authorship Attribution
and Pastiche. Computers
and the Humanities, 37
(4), 407–429.
Vogel, C. & Brisset, S. (2007). Hearing Voices in the
Poetry of Brendan Kennelly. Belgian Journal of English
Language & Literature, 1–16.
Vogel, C. & Lynch, G. (2008). Computational Stylometry:
Who’s in a Play?. In Verbal and Nonverbal Features
of Human-Human and Human-Machine Interaction., pp.
189–194. Springer.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None