Creatures of Habit? What collocation can tell us about translation

Dorothy Kenny
Dublin City University

Keywords: collocation, translation, conventionalization

Corpora in Translation Studies
Baker (1995) describes various types of electronic corpora that are of specific interest to translation scholars. In Baker's terminology, a parallel corpus consists of texts originally written in a language A alongside their translations into a language B. Parallel corpora exist for several language pairs, including English-French (Salkie 1995; see also Church and Gale's (1991) work using the Canadian Hansard), English-Italian (Marinai et al. 1992), and English-Norwegian (Johansson and Hofland 1994; Johansson et al. 1996). Parallel corpora can be used to provide information on language-pair specific translational behaviour, or to posit certain equivalence relationships between lexical items or structures in source and target languages (Marinai et al. 1992). Typical applications of parallel corpora include translator training, bilingual lexicography and machine translation.
Baker (1995) uses the term comparable corpus to describe a collection of texts originally written in a language, say English, alongside a collection of texts translated (from one or more languages) into English. Baker suggests that comparable corpora have the potential to reveal most about features specific to translated text, i.e., those features that occur exclusively, or with unusually low or high frequency, in translated text as opposed to other types of text production, and that cannot be traced back to the influence of any one particular source text or language. Translation theorists such as Shlesinger (1991), Toury (1980), Vanderauwera (1985) and Baker (1993) have posited that translated texts tend to be more explicit, less ambiguous, and grammatically and lexically more conventional than source texts or other texts produced in the target language.

Using collocation as an indicator of conservatism
The idea that translations are more conventional than their source texts or other target language texts can also be tested by investigating collocational patterns. Should familiar collocational patterns be somehow flouted in a source text, then the point at which this happens will also have special textemic status in that source text. This could occur, for example, at points where the preference of a word under investigation for collocates of a particular semantic set is not respected in a text, or, more specifically, where there are "departures ... from the expected profiles of semantic prosodies" (Louw 1993: 157). Corpus linguistics provides interesting techniques for spotting recurring and, by contrast, unconventional patterns of co-occurrence in vast quantities of text (Clear 1993; Louw 1993) and such techniques are being extended to bilingual corpora (Peters and Picchi 1996; Smadja et al. 1996). If translators really are under pressure to conform to target-language norms, one could expect unconventional co-occurrences in source texts to be replaced by more conventional collocations in the target text.
The current doctoral research represents an attempt to use collocation as an indicator of conservative tendencies amongst translators. It involves the building of a parallel corpus of contemporary German fiction translated into English. Unconventional lexical co-occurrences are to be identified in the German source texts, by comparing the source texts with a large reference corpus of German, and using the tools of collocation analysis (Clear 1993; Barnbrook 1996). The translation into English of such unusual lexical combinations will then be investigated to see whether these are conventionalized in any way. Such conventionalization can, of course, only be established with reference to a large corpus of fiction originally written in English, in other words, using a comparable corpus.
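The comparison of source texts against a reference corpus can be illustrated with a minimal sketch. It is not the procedure used in the research: the function names, the span of one word either side of the node, the `max_ref_freq` cut-off and the toy German sentences are all assumptions made for illustration. The sketch simply lists word forms that co-occur with a node in a source text but are rare or unattested near that node in the reference corpus.

```python
from collections import Counter


def cooccurrences(tokens, node, span=4):
    """Count word forms found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def unusual_collocates(source_tokens, reference_tokens, node, span=4, max_ref_freq=0):
    """Return source-text collocates of `node` that occur near `node` at most
    `max_ref_freq` times in the reference corpus (hypothetical cut-off)."""
    src = cooccurrences(source_tokens, node, span)
    ref = cooccurrences(reference_tokens, node, span)
    # Counter returns 0 for word forms never seen near the node.
    return sorted(w for w in src if ref[w] <= max_ref_freq)


# Toy data, invented for illustration only.
reference = "der Hund bellt laut und der Hund beißt nicht".split()
source = "der Hund singt leise".split()

candidates = unusual_collocates(source, reference, "Hund", span=1)
```

A real study would need lemmatization, a far larger reference corpus and a principled significance measure rather than a raw frequency cut-off; the sketch only shows the shape of the comparison.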

A pilot investigation
This poster sets out specifically to report on a pilot test designed to investigate collocational patterns in a small number of German source and English target texts. The principal issues at stake are: how to choose node words worth investigating in the original German texts; and how to identify statistically significant collocations and, by contrast, unusual co-occurrences in the source and target texts. Various approaches are taken in the literature: Stubbs (1996), for example, investigates the collocates of culturally significant nodes; other researchers (Clear 1993; Smadja 1993) report on approaches that compute collocation patterns for every word form in a corpus from the outset, only to later jettison those combinations that fall below an arbitrary threshold of significance. It is also well known that different measures of statistical significance yield different results in automatic collocation recognition (Clear 1993; Smadja 1993). By comparing approaches, it is hoped that this pilot test will indicate how the research should proceed when it is scaled up to include the full set of German source texts. It is also intended to reveal problems that may be specific to the identification of collocations in two different languages and, in particular, whether unconventional (free) lexical combinations can be fruitfully used as a springboard for investigating conservative linguistic tendencies among literary translators.
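To illustrate how the choice of measure can change which collocates look significant, the following sketch computes two scores widely used in corpus linguistics, mutual information and the t-score, in their common simplified forms. Whether these exact formulations are the ones compared in the pilot test is an assumption, and all frequencies below are invented.

```python
import math


def mutual_information(f_nc, f_n, f_c, n_tokens):
    """log2 of observed over expected co-occurrence frequency."""
    return math.log2(f_nc * n_tokens / (f_n * f_c))


def t_score(f_nc, f_n, f_c, n_tokens):
    """(observed - expected) / sqrt(observed), a common approximation."""
    expected = f_n * f_c / n_tokens
    return (f_nc - expected) / math.sqrt(f_nc)


# Hypothetical counts: a node occurring 1,000 times in a 1,000,000-token corpus.
N = 1_000_000
F_NODE = 1_000
rare = {"f_nc": 5, "f_c": 10}            # rare collocate, almost always near the node
frequent = {"f_nc": 200, "f_c": 50_000}  # frequent collocate, often near the node

mi_rare = mutual_information(rare["f_nc"], F_NODE, rare["f_c"], N)
mi_frequent = mutual_information(frequent["f_nc"], F_NODE, frequent["f_c"], N)
t_rare = t_score(rare["f_nc"], F_NODE, rare["f_c"], N)
t_frequent = t_score(frequent["f_nc"], F_NODE, frequent["f_c"], N)
```

With these invented counts, mutual information ranks the rare collocate above the frequent one, while the t-score reverses the ranking. A threshold applied to one measure would therefore retain a different set of combinations from a threshold applied to the other.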
