Predicting new words from newer words: Lexical borrowings in French

  1. Paula Horwath Chelsey

    University of Minnesota

  2. R. Harald Baayen

    University of Alberta

This study models the integration of new lexical
borrowings into French, a language in which such
borrowings are common. Our goal is to predict
whether a new lexical borrowing will “survive”
the onslaught of time and be integrated into French.
In linguistics, most work on word formation has been
conducted in the generative tradition, for example by
Aronoff (1976), Selkirk (1982), Halle & Marantz
(1993), and Ussishkin (2005). These approaches work
well for new words formed by affixation; they explain, for
example, how the neologism hateable is formed from hate
by the same rule that yields love → loveable. Yet these
theories have not addressed the productivity
of borrowings. Although borrowings may have
internal morphological structure in the donor language,
their adoption in French is not governed by structural
rules as studied in theoretical morphology. The goal of
the present study is to address the non-structural factors
that codetermine whether a borrowing will find its way
into the vocabulary of the recipient language.
Although many words from other languages enjoy
ephemeral use, the borrowings that become entrenched
in the language are a highly constrained subset of the
possible borrowings: new words do not occur indiscriminately.
Several factors may promote entrenchment in
the recipient language’s lexicon.
First, the DONOR LANGUAGE of a borrowing may
play a role in lexical integration. For example, borrowings
from a prestigious language like English could be
more likely to be integrated into the French lexicon than
borrowings from a less prestigious language like Polish.
Second, a borrowing’s FREQUENCY at a given moment
in time could be an influential predictor of the borrowing’s
integration into the language at a later point in time.
Third, a borrowing’s DISPERSION (the number of different
text chunks a word occurs in when a text is divided into
chunks) may also be a relevant predictor. The more writers
and speakers use a borrowing, the more likely it is to
become entrenched in the language community. Fourth, since shorter borrowings
require less processing effort, we hypothesize that
the LENGTH of the borrowing will be inversely related
to the degree of integration of a borrowing. Fifth, the
SENSE PATTERN (monosemy or polysemy) of a borrowing
in the recipient language may also be at issue. A semantically
rich borrowing might have a greater chance
of surviving than a specialized borrowing that is semantically
and contextually highly restricted. Finally, we consider
a cultural context factor as well: whether or not the borrowing
refers to the culture typically associated with its donor
language. It is possible that a culturally
unrestricted borrowing, for example, a Russian borrowing
used to describe China, indicates a greater
degree of integration than a borrowing in a restricted cultural
context, such as a Russian borrowing describing Russia.
This study gathered initial new borrowings from the Le
Monde corpus (Abeillé et al. 2003) from 1989–1992.
We also queried the online archives of Le Figaro for the
borrowings from 1996–2006, taking occurrence in this
second corpus as a proxy for integration into the French
lexicon. Given the frequency, the dispersion, the length,
the donor language of the borrowing, the borrowing’s
sense pattern and its cultural context in Le Monde, we
developed a multiple regression model predicting the
frequency of occurrence of the borrowing in the later
Le Figaro corpus. Our model succeeded in explaining a
high proportion of the variance in the Le Figaro frequencies
(R² = 0.673, with minimal overfitting as evidenced
by bootstrap validation). Table 1 summarizes this model.
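The validation step can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' code, and the abstract does not specify which bootstrap variant was used; the sketch assumes Efron's optimism bootstrap, one common form of bootstrap validation for regression: refit the model on each bootstrap sample, and subtract that fit's R² on the original data from its R² on the bootstrap sample. The average difference estimates how much the apparent R² is inflated by overfitting. The function names, the number of replicates, and the simulated predictors are hypothetical stand-ins, not the Le Monde or Le Figaro data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def r_squared(beta, X, y):
    """Proportion of variance in y explained by the fitted coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    ss_res = np.sum((y - X1 @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def bootstrap_validate_r2(X, y, n_boot=200, seed=0):
    """Return (apparent R^2, optimism-corrected R^2) via Efron's bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = r_squared(fit_ols(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        beta_b = fit_ols(X[idx], y[idx])   # refit on the bootstrap sample
        # optimism: fit quality on its own sample minus on the original data
        optimism.append(r_squared(beta_b, X[idx], y[idx]) - r_squared(beta_b, X, y))
    return apparent, apparent - float(np.mean(optimism))


# Simulated stand-in data (NOT the study's data): three hypothetical predictors.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, 1.2, -0.4]) + rng.normal(scale=0.8, size=n)

apparent, corrected = bootstrap_validate_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

A small gap between the apparent and the corrected R² is what "minimal overfitting" amounts to under this procedure.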
Table 1 shows a highly significant main effect for dispersion.
The effect of dispersion is modulated by an interaction
with frequency, indicating that the role of frequency is
restricted to words that have a broad dispersion. Comparing
frequency and dispersion, dispersion emerges as
the pre-eminent predictor for integration into the lexicon.
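Dispersion counts the text chunks in which a word occurs at least once, so it separates words used across many texts from words concentrated in one place. The following is a minimal sketch, assuming a pre-tokenized text split into `n_chunks` contiguous chunks of near-equal size. The chunk count, the whitespace-style tokenization, and the toy French sentence are hypothetical illustrations, not the settings used for Le Monde.

```python
def dispersion(tokens, word, n_chunks=10):
    """Count the chunks, out of n_chunks contiguous near-equal chunks of tokens,
    that contain `word` at least once."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    n = len(tokens)
    count = 0
    for k in range(n_chunks):
        # integer boundaries give chunk sizes differing by at most one token
        start = k * n // n_chunks
        end = (k + 1) * n // n_chunks
        if word in tokens[start:end]:
            count += 1
    return count


# Hypothetical toy text: both borrowings occur twice, but "parking"
# is concentrated in a single chunk while "week-end" is spread out.
tokens = ["le", "week-end", "parking", "parking", "est",
          "plein", "le", "soir", "un", "week-end"]
for w in ("week-end", "parking"):
    print(w, tokens.count(w), dispersion(tokens, w, n_chunks=5))
```

With five chunks of two tokens each, this prints a frequency of 2 for both words but a dispersion of 2 for week-end and 1 for parking, which is why frequency and dispersion can make separate contributions to a model.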
Length, operationalized in terms of number of syllables,
emerged with a negative slope, as expected. The effect
of length depended on the cultural context. In culturally
unrestricted contexts, longer borrowings are less likely
to be integrated into the lexicon. Polysemous borrowings,
that is, borrowings with another sense already existing in
the language, are also more likely to be integrated into
the lexicon than borrowings that do not have another
sense. Finally, the cultural context variable turned out to
modulate the effect of most other predictors (frequency,
length, sense, and donor language).
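Interactions such as these are usually encoded as product terms in the regression's design matrix. The following is a minimal sketch with simulated data, assuming a binary coding of cultural context (1 = culturally unrestricted) and numeric coding of frequency, dispersion, length in syllables, and polysemy. Donor language is categorical and would need dummy coding, so it is omitted here for brevity. The function name and all coefficient values are hypothetical, chosen only to mirror the direction of the effects described, and are not estimates from this study.

```python
import numpy as np


def design_with_context(freq, disp, length, polysemous, unrestricted):
    """Design matrix with main effects, frequency x dispersion, and
    cultural context (1 = unrestricted) x {frequency, length, polysemy}."""
    c = np.asarray(unrestricted, dtype=float)
    return np.column_stack([
        np.ones_like(freq),  # 0 intercept
        freq,                # 1 frequency
        disp,                # 2 dispersion
        length,              # 3 length (syllables)
        polysemous,          # 4 polysemy (0/1)
        c,                   # 5 cultural context
        freq * disp,         # 6 frequency modulated by dispersion
        c * freq,            # 7 context x frequency
        c * length,          # 8 context x length
        c * polysemous,      # 9 context x polysemy
    ])


# Simulated, hypothetical data; coefficients are invented for illustration.
rng = np.random.default_rng(2)
n = 400
freq = rng.normal(size=n)
disp = rng.normal(size=n)
length = rng.integers(1, 5, size=n).astype(float)  # 1 to 4 syllables
poly = rng.integers(0, 2, size=n).astype(float)
unrestricted = rng.integers(0, 2, size=n)
true_beta = np.array([0.5, 0.2, 1.0, -0.1, 0.3, 0.0, 0.4, 0.1, -0.3, 0.2])

X = design_with_context(freq, disp, length, poly, unrestricted)
y = X @ true_beta + rng.normal(scale=0.5, size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Context-specific length slopes: main effect plus the interaction term.
slope_restricted = beta[3]
slope_unrestricted = beta[3] + beta[8]
print(f"length slope, restricted: {slope_restricted:.2f}, "
      f"unrestricted: {slope_unrestricted:.2f}")
```

Under this coding, the length effect in culturally unrestricted contexts is read off as the sum of the main-effect and interaction coefficients, which is how a statement like "longer borrowings are less likely to be integrated in unrestricted contexts" maps onto the model.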
We have documented a range of factors that codetermine
the acceptance of borrowings in a new language. These
factors may play a role not only for the entrenchment
of borrowings, but also for the entrenchment of regular
morphologically complex neologisms, complementing
the structure-directed investigations of theoretical
morphology. An important direction for future research
is to investigate whether, and if so how, the weights of
the factors documented here are modulated by the internal
structure of complex words (across affixed words,
blends, and acronyms).
Unlike other types of word formation, borrowings allow
us to gauge the degree of interaction between cultures.
The cultural context factor in the present study, for instance,
suggests that borrowings can be used to trace
how concepts from dominant cultures establish themselves
in the language community and spread to those
contexts where subordinate cultures are in focus. This
information is not only of use to linguists, but also to
sociologists and anthropologists.
The methodology outlined in the present study may also
be of use for lexicography, as it makes it possible to predict
which borrowings (and other neologisms) are in the
process of becoming entrenched in the language community,
and therefore merit inclusion in dictionaries. References Abeillé Clément, & F. Toussenel. 2003. Building a
treebank for french. In Treebanks: Building and Using
Parsed Corpora, 165–188. Kluwer Academic Publishers.
Aronoff, Mark. 1976. Word Formation in Generative
Grammar. Cambridge, Mass.: MIT Press.
Halle, M., & A. Marantz. 1993. Distributed morphology
and the pieces of inflection. In The View from Building
20: Essays in Linguistics in Honor of Sylvain Bromberger,
ed. by K. Hale & S. J. Keyser, volume 24 of Current
Studies in Linguistics, 111–176. Cambridge, Mass.: MIT Press.
Selkirk, E. 1982. The Syntax of Words. Cambridge, Mass.:
MIT Press.
Ussishkin, A. 2005. A Fixed Prosodic Theory of Nonconcatenative
Templatic Morphology. Natural Language &
Linguistic Theory 23.169–218.

