Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)
In this work the appearance of the newly introduced words
of The Da Vinci Code and its different translations has been
analyzed and compared. The concept “newly introduced words”
refers to words whose first appearance is detected at a
certain point of the text. In general, the number of newly
introduced words follows a monotonic decay; however, there
are segments of the texts where this decline is reversed
and a sudden increase is detectable (Csernoch, 2006a,
2006b, 2007). These text slices are referred to as vocabulary
rich text slices. The question arises whether the detectable,
unexpectedly high number of newly introduced words of the
original work is traceable in the different translations of the
text or not.
Before advancing on the project let us define the concept of
translation. In this context the definition of Hatim and Mason
(1993) is accepted, that is, any adaptation of a literary work
is considered as translation. Beyond the foreign language
translations of The Da Vinci Code two more adaptations, the
lemmatized and the condensed versions of the original work
were analyzed.
The comparison of the newly introduced word types and
lemmas of the different translations allows us to trace how
precisely translation(s) follow(s) the changes in the vocabulary
of the original text: whether the vocabulary rich sections of
the original text appear similarly rich in the translations, whether the
translations swallow them up, or whether the translations are richer
at certain sections than the original work. In addition, could
the changes in the newly introduced words, as an additional
parameter to those already applied, be used to give a hint of
the quality of a translation?
To carry out the analysis a previously introduced method
was applied (Csernoch, 2006a, 2006b). The text was divided
into constant-length intervals, blocks. The number of the
newly introduced words was mapped to each block. Those
text segments in which the number of newly introduced words
significantly exceeds that predicted by a first-order statistical
model were considered vocabulary rich.
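The block-wise counting step described above can be illustrated with a minimal Python sketch. It is not the implementation used in the cited studies. The block size of 100 tokens is inferred from the later remark that two hundred blocks correspond to 20,000 tokens, while the whitespace tokenization of the sample, the lowercasing, and the function name `new_types_per_block` are assumptions made for illustration only.

```python
from typing import List


def new_types_per_block(tokens: List[str], block_size: int = 100) -> List[int]:
    """Count, for each constant-length block, the word types seen for the first time."""
    seen = set()          # word types introduced so far in the text
    counts = []           # one entry per block
    for start in range(0, len(tokens), block_size):
        new_in_block = 0
        for token in tokens[start:start + block_size]:
            word = token.lower()      # assumption: case-insensitive word types
            if word not in seen:
                seen.add(word)
                new_in_block += 1
        counts.append(new_in_block)
    return counts


# Toy example with a block size of 5 tokens (hypothetical sample text).
sample = "the code was hidden in the painting and the painting was hidden in the vault".split()
print(new_types_per_block(sample, block_size=5))  # [5, 2, 1]
```

The toy output shows the decay discussed in the text: each block introduces fewer new word types than the one before it.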
The original The Da Vinci Code
In the original The Da Vinci Code eleven significant, vocabulary
rich text slices were detected. This number is similar to what
was found in other, previously analyzed works (Csernoch,
2007). A further analysis of these text slices was
carried out to reveal their nature. Four parameters were determined:
– the distribution of these text slices within the text,
– their length (the number of blocks in the significant text
slice),
– their intensity (the local maximum of the relative number
of the newly introduced words), and
– their content.
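The first three parameters can be read off the block counts once a model prediction is available, as the following Python sketch suggests. The first-order statistical model itself is not specified here, so the expected counts are simply passed in. Two further assumptions are made for illustration: the relative number of newly introduced words is taken to be the ratio of observed to expected counts, and a fixed ratio threshold stands in for the significance test. The names `Slice` and `rich_slices` are hypothetical.

```python
from typing import List, NamedTuple


class Slice(NamedTuple):
    start: int        # index of the first block of the slice (distribution)
    length: int       # number of consecutive significant blocks
    intensity: float  # local maximum of observed/expected within the slice


def rich_slices(observed: List[int], expected: List[float],
                threshold: float = 1.5) -> List[Slice]:
    """Group consecutive blocks whose observed/expected ratio exceeds the threshold."""
    slices = []
    start = None
    peak = 0.0
    n = min(len(observed), len(expected))
    for i in range(n):
        # Assumption: a simple ratio threshold replaces the significance test.
        ratio = observed[i] / expected[i] if expected[i] > 0 else 0.0
        if ratio > threshold:
            if start is None:
                start, peak = i, ratio
            else:
                peak = max(peak, ratio)
        elif start is not None:
            slices.append(Slice(start, i - start, peak))
            start = None
    if start is not None:             # a slice running to the end of the text
        slices.append(Slice(start, n - start, peak))
    return slices


# Hypothetical counts for eight blocks.
observed = [40, 30, 45, 50, 20, 15, 30, 10]
expected = [40.0, 30.0, 25.0, 20.0, 18.0, 15.0, 12.0, 10.0]
for s in rich_slices(observed, expected):
    print(s)
```

With these invented counts the sketch reports one slice of two blocks starting at block 2 and one single-block slice at block 6, both with an intensity of 2.5.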
The majority of the significant text slices of The Da Vinci Code
turned out to be unusually small both in length and in intensity,
and they contain descriptions of events, characters and places. None
of them reflects stylistic changes, which usually (Baayen,
2001) trigger extremely vocabulary rich text slices. Their
distribution is uneven: they mainly appear in the first
half of the text. This means that the second half of the novel
hardly introduces any more vocabulary items than predicted
by a first-order statistical model.
The analysis of the lemmatized texts
To see whether the word types, with all their suffixes, are
responsible for any loss of vocabulary rich text slices,
lemmatization, as an adaptation of the text, was carried out.
As was found earlier (Csernoch, 2006b), lemmatization
produced minor, if any, differences in the analysis
of the word types in an English text. The English non-lemmatized
and lemmatized texts of The Da Vinci Code were compared with their
Hungarian counterparts. Unlike in the English texts, at
the beginning of the Hungarian texts the number of newly
introduced word types was so high that, up to the first two
hundred blocks (20,000 tokens), some of the text slices with a
significantly high number of newly introduced word types
might be swallowed up.
The foreign language translations of The Da Vinci Code
While the absolute numbers of the newly introduced words of
texts in different languages cannot be compared, their relative numbers can,
using the method outlined in Csernoch (2006a).
Relying on this advantage of the method, three
translations were compared to the original text: the Hungarian,
the German, and the French. If the vocabulary rich text slices
are found at the same positions both in the original and in
the translated text, the translation can then be considered as
exact in respect of vocabulary richness.
In the Hungarian translation the lengths and the intensities
of the vocabulary rich text slices were not altered; however,
their distribution was more even than in the English text, and
their number increased to sixteen. While the vocabulary rich
text slices of the English text were all found in the Hungarian
text, further such text slices were identified in the second
half of the Hungarian text. The comparison revealed that the
Hungarian text is richer in vocabulary than the English text.
The German translation replicated the vocabulary rich text
slices of the original English text and provided five more.
As with the Hungarian translation, this means that these
additional text slices are richer in vocabulary than the
corresponding text slices in the English text.
In the French translation only nine significant text slices were
found. All of these text slices were only one block long, and
their intensities were also surprisingly small. This means that
the distribution of the newly introduced words in the French
translation hardly differs from that predicted by the model.
Furthermore, they are quite different in content from the
vocabulary rich text slices in the other languages. Thus, there
are hardly any concentrated, vocabulary rich text slices in the
French text.
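Checking whether vocabulary rich text slices occur at the same positions in the original and in a translation can be sketched as follows. This is only an illustration under stated assumptions, not the comparison procedure of Csernoch (2006a): slice positions are expressed as fractions of the total number of blocks so that texts of different lengths become comparable, and two slices are treated as matching when their relative positions differ by no more than a hypothetical tolerance. The function names and the sample numbers are invented.

```python
from typing import List, Tuple


def relative_positions(slice_starts: List[int], n_blocks: int) -> List[float]:
    """Express slice start blocks as fractions of the text length."""
    return [start / n_blocks for start in slice_starts]


def match_slices(original: List[float], translation: List[float],
                 tolerance: float = 0.02
                 ) -> Tuple[List[Tuple[float, float]], List[float], List[float]]:
    """Pair each original slice with the nearest unused translation slice.

    Returns (matched pairs, original slices lost, extra translation slices).
    """
    unused = list(translation)
    matched, lost = [], []
    for pos in original:
        candidates = [t for t in unused if abs(t - pos) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - pos))
            matched.append((pos, best))
            unused.remove(best)
        else:
            lost.append(pos)
    return matched, lost, unused


# Hypothetical slice starts: 3 slices in a 400-block original,
# 4 slices in a 420-block translation.
orig_pos = relative_positions([10, 50, 120], 400)
trans_pos = relative_positions([11, 54, 130, 300], 420)
matched, lost, extra = match_slices(orig_pos, trans_pos)
print(len(matched), len(lost), len(extra))  # 3 0 1
```

In this invented case all slices of the original are replicated and the translation adds one more in the later part of the text, which is the pattern reported above for the Hungarian and German translations.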
The condensed versions of The Da Vinci Code
Finally, the condensed versions of the English and Hungarian
texts were compared to the corresponding full-length text
and to each other. Several questions are to be answered in
this context. By the nature of this adaptation it is obvious that
the length of the text is curtailed to some extent. However,
this parameter does not tell much about the nature of the
condensation. We do not know from this parameter whether
the condensation is only a cropping of the original text, with
certain text segments left out while others are untouched,
or whether the whole text is modified to some extent. If the text
is modified, the percentages of the remaining tokens, word
types, lemmas, and hapax legomena are parameters which tell
us more about the condensation. To gain further insight it is
worth considering how the vocabulary rich text segments of
the original text are transferred into the condensed text. This
last parameter might be a great help in deciding from which
first-order adaptation a second-order adaptation of a text (in
this case the condensed Hungarian text) is derived.
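The retention percentages mentioned above can be computed along the following lines. This is a minimal Python sketch: it covers tokens, word types, and hapax legomena only, because lemma counts would require a lemmatizer that is outside its scope. The lowercasing, the toy sentences, and the function names `vocabulary_profile` and `retention` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def vocabulary_profile(tokens: List[str]) -> Dict[str, int]:
    """Count tokens, word types, and hapax legomena (types occurring once)."""
    freq = Counter(token.lower() for token in tokens)  # assumption: case-insensitive
    return {
        "tokens": len(tokens),
        "types": len(freq),
        "hapax": sum(1 for count in freq.values() if count == 1),
    }


def retention(full: List[str], condensed: List[str]) -> Dict[str, float]:
    """Percentage of each count in the condensed text relative to the full text."""
    full_profile = vocabulary_profile(full)
    condensed_profile = vocabulary_profile(condensed)
    return {
        key: 100.0 * condensed_profile[key] / full_profile[key]
        for key in full_profile
        if full_profile[key] > 0
    }


# Hypothetical toy texts standing in for a full-length and a condensed version.
full_text = "the old priest opened the old door and the knight left".split()
condensed_text = "the priest opened the door".split()
print(retention(full_text, condensed_text))
```

For the toy texts the condensed version keeps about 45% of the tokens but 50% of the word types and of the hapax legomena, so the vocabulary shrinks less than the length does.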
Both the English and the Hungarian condensed texts are 45%
of the original texts in length. The number of word types is
64% and 55%, the number of lemmas is 64% and 61%, and
the number of hapax legomena is 70% and 56% of those in the English
and Hungarian full-length texts, respectively. These parameters
indicate that the Hungarian condensed text suffered more serious
damage than the English one. The number of vocabulary rich
text segments dropped to six (somewhat more than half
of the original number) in the English text. In contrast, the
number of these text segments in the Hungarian text dropped
to one-third, which is a notable difference compared to the
full-length Hungarian text. In both the condensed English
and Hungarian texts the vocabulary rich segments were
concentrated in the first half of the texts, representing the
same events; none of the segments unique to the full-length
Hungarian text appeared in the condensed Hungarian text. The
cumulative length of the vocabulary rich text slices dropped
to 51% in the English and to 43% in the Hungarian text. Again,
the Hungarian text seemed to be damaged to a greater extent.
All the analyzed parameters thus clearly indicate that the
condensed English text was the direct source of the condensed
Hungarian version.
Bibliography
Baayen, R. H. (2001) Word Frequency Distributions. Kluwer
Academic Publishers, Dordrecht, Netherlands
Csernoch, M. (2006a) The introduction of word types and
lemmas in novels, short stories and their translations.
http://www.allc-ach2006.colloques.paris-sorbone.fr/DHs.pdf. Digital
Humanities 2006. The First International Conference of the
Alliance of Digital Humanities Organisations. (5-9 July 2006,
Paris)
Csernoch, M. (2006b) Frequency-based Dynamic Models for
the Analysis of English and Hungarian Literary Works and
Coursebooks for English as a Second Language. Teaching
Mathematics and Computer Science. Debrecen, Hungary
Csernoch, M. (2007) Seasonalities in the Introduction of
Word-types in Literary Works. Publicationes Universitatis
Miskolcinensis, Sectio Philosophica, Tomus XI. – Fasciculus 3.
Miskolc 2006-2007.
Hatim, B. and Mason, I. (1993) Discourse and the Translator.
Longman Inc., New York.
Hosted at University of Oulu
Oulu, Finland
June 25, 2008 - June 29, 2008
135 works by 231 authors indexed
Conference website: http://www.ekl.oulu.fi/dh2008/
Series: ADHO (3)
Organizers: ADHO