Building a Diachronic and Contrastive Parallel Corpus - and an Intended Application in the Form of a Study of Germanic Complex Verb Constructions

poster / demo / art installation
Authorship
  1. Gerlof Bouma

    Göteborg University (Gothenburg)

  2. Evie Coussé

    Göteborg University (Gothenburg)

  3. Dirk-Jan de Kooter

    Meertens Instituut - Royal Netherlands Academy of Arts and Sciences (KNAW)

  4. Nicoline van der Sijs

    Radboud University, Meertens Instituut - Royal Netherlands Academy of Arts and Sciences (KNAW)

Work text

Introduction
Our project The rise of complex verb constructions in Germanic started in the autumn of 2018 and aims to investigate how the possibilities for combining several auxiliary verbs unfold over time in four Germanic languages. To aid in this, we are compiling a parallel corpus of Bible texts, with translations in different languages from around the same time, as well as translations from different stages of a language.

Intended linguistic application
Complex verb constructions combine two or more auxiliary verbs with a lexical verb. As an example of the kind of combinations involved, consider the following four sentences from contemporary Dutch, English, German and Swedish, respectively:

a) Ik moet kunnen komen.
b) I must be able to come.
c) Ich muss kommen können.
d) Jag måste kunna komma.

Note that the English example stands out by its use of the periphrastic be able to instead of a form of can. The earliest attestations of complex verb constructions in our languages of interest are from the 13th century. Double modal auxiliaries are initially only headed by a form of ‘shall’. The following two examples are from Middle English and Middle Dutch.

e) þatt mannkinn shollde    muȝhenn wel  Upp cumenn   inntill heoffne
   that mankind  should.3sg may.inf well up  come.inf into    heaven
   ‘that mankind should be able to come into heaven.’ (Ormulum, ca. 1200)

f) dat  deen    sonder  den andren niet daer towe en  sal       moghen  gaen
   that the one without the other  not  to there  neg shall.3sg may.inf go.inf
   ‘[so] that the one shall not be able/allowed to go there without the other.’ (Charter Brussels, 1277)

Complex verb constructions in contemporary Germanic are well studied (see, for instance, Den Besten & Edmondson, 1983; Wurmbrand, 2001; Broekhuis et al., 2015/2016, on Dutch and German), and there are dedicated studies on the earliest stages of the double modal construction in English and Dutch (Nagle, 1993; Ogura, 1993; Coupé, 2015; Coussé, 2015). Nevertheless, our knowledge of how these constructions developed historically into their present-day distributions in the Germanic languages is limited. A parallel corpus, with contrastively as well as diachronically parallel material, would allow us to track a construction through time and across languages.

The corpus
As the source for a parallel corpus, the Bible is presumably unique as a relatively stable collection of texts, available in many translations across languages and ages. Its division into books, chapters and verses is a well-established paratext, which greatly aids alignment of the parallel material. The sizes of the translations, typically ½–1 million words, also speak in favour of their use as the source of a linguistic corpus. Pilot explorations of a 1-million-word corpus of contemporary, (professionally) written Dutch yield around 3000 attestations of three-verb complex verb constructions. We therefore need texts of this size to find attestations even for languages that are more restrictive with the construction. A methodological advantage of parallel material is that it provides negative evidence: a passage that does not contain the construction even though aligned passages do.
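The role of verse references in alignment, and the notion of negative evidence, can be illustrated with a minimal sketch. It is not the project's actual pipeline: the translation identifiers, the book label "X", the `has_construction` flag and all values in the toy data are invented for illustration, and the sketch assumes that the presence of a complex verb construction has already been annotated per verse.

```python
from collections import defaultdict
from typing import NamedTuple


class Verse(NamedTuple):
    translation: str        # hypothetical translation identifier
    book: str
    chapter: int
    verse: int
    has_construction: bool  # hypothetical annotation, assumed to exist


def align(verses):
    """Group all translations of a verse under its (book, chapter, verse) key."""
    aligned = defaultdict(dict)
    for v in verses:
        aligned[(v.book, v.chapter, v.verse)][v.translation] = v.has_construction
    return dict(aligned)


def negative_evidence(aligned):
    """For each verse where at least one translation has the construction,
    list the aligned translations that lack it."""
    result = {}
    for key, flags in aligned.items():
        if any(flags.values()):
            lacking = sorted(t for t, has in flags.items() if not has)
            if lacking:
                result[key] = lacking
    return result


# Invented toy data; the flags say nothing about any real Bible translation.
toy = [
    Verse("nl_A", "X", 1, 1, True),
    Verse("en_A", "X", 1, 1, False),
    Verse("sv_A", "X", 1, 1, True),
    Verse("nl_A", "X", 1, 2, False),
    Verse("en_A", "X", 1, 2, False),
]

print(negative_evidence(align(toy)))
# → {('X', 1, 1): ['en_A']}
```

Verse X 1:2 is not reported, because no aligned translation uses the construction there, so its absence in any single translation is uninformative.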
Our selection principles include a preference for prose translations that reflect the language of their time. This excludes translations that are completely in verse, that are archaic, or that prioritize source-language characteristics. To maximize parallelism, we avoid fragments, paraphrases and in-line commentaries. Finally, we prefer widely disseminated translations. Our current selection contains around 40 Bibles. We have selected Bible translations from the 14th century to the present, and included at least one version per century for each of the languages except Swedish, for which fewer translations exist. Although there are existing parallel corpora based on modern Bible translations from different languages (e.g., Christodouloupoulos & Steedman, 2015), and parallel corpora with a diachronic dimension (e.g., Dipper & Schultz-Balluff, 2013), to our knowledge, ours will be the first corpus that systematically contains parallel texts in two dimensions.
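The "at least one version per century" criterion can be checked mechanically against an inventory of translations. The following is a sketch under stated assumptions: the inventory format of (language, year) pairs, the function names and the years in the toy example are all hypothetical and do not describe the actual selection.

```python
def century(year: int) -> int:
    """Map a year to its century, e.g. 1301-1400 -> 14."""
    return (year - 1) // 100 + 1


def coverage_gaps(inventory, languages, first=14, last=21):
    """Return the sorted (language, century) pairs with no translation in the inventory."""
    covered = {(lang, century(year)) for lang, year in inventory}
    return sorted(
        (lang, c)
        for lang in languages
        for c in range(first, last + 1)
        if (lang, c) not in covered
    )


# Invented toy inventory; the years do not refer to real Bible translations.
toy_inventory = [("nl", 1350), ("nl", 1450), ("en", 1390)]

print(coverage_gaps(toy_inventory, ["nl", "en"], first=14, last=15))
# → [('en', 15)]
```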
The digital texts are collected from different sources: from existing corpora, earlier digitization projects, and through cooperation with Bible societies (see http://www.bijbelsdigitaal.nl, for one of the earlier projects we build upon). In addition, a small number of Bibles will be digitized from print as part of our project. Depending on the suppliers, some parts of the corpus will be available under an open license and other parts under a restricted, ‘login required’ license.

Acknowledgements
The project The rise of complex verb constructions in Germanic is funded by the Swedish Research Council (Ref.: 2017-01848; PI: Evie Coussé). For more information on the project, see https://complexverbconstructions.wordpress.com/.

Bibliography

Besten, H. den, and Edmondson, J. (1983). The verbal complex in Continental West Germanic. In Abraham, W. (ed.), On the syntax of the Westgermania. Benjamins, Amsterdam, pp. 155–216.

Broekhuis, H., Corver, N., and Vos, R. (2015/2016). Syntax of Dutch: Verbs and verb phrases. AUP, Amsterdam.

Christodouloupoulos, C. and Steedman, M. (2015). Massively Parallel Corpus: The Bible in 100 languages. Language Resources and Evaluation, 49(2): 375–395.

Coupé, G. (2015). Syntactic extension – The historical development of Dutch verb clusters. PhD thesis, Radboud University Nijmegen.

Coussé, E. (2015). Constructional complexification. The rise of double modal constructions in Dutch. Taal en Tongval, 67: 149–176.

Dipper, S. and Schultz-Balluff, S. (2013). The Anselm Corpus: Methods and perspectives of a parallel aligned corpus. In Proceedings of the NODALIDA Workshop on Computational Historical Linguistics, Oslo.

Nagle, S. (1993). Double modals in early English. In Aertsen, H. and Jeffers, R. (eds), Historical linguistics 1989. Benjamins, Amsterdam, pp. 363–370.

Ogura, M. (1993). Shal (not) mowe, or double auxiliary constructions in Middle English. The Review of English Studies, 44: 539–548.

Wurmbrand, S. (2001). Infinitives: Restructuring and Clause Structure. Mouton, Berlin.

Conference Info

In review

ADHO - 2019
"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

Series: ADHO (14)

Organizers: ADHO