The Middle English Grammar Corpus - a tool for studying the writing and speech systems of medieval English

Martti Mäkinen

Authorship

1. Martti Mäkinen

University of Stavanger

Work text

The Middle English Grammar Project
The Middle English Grammar Project (MEG), shared by the
Universities of Glasgow and Stavanger, is working towards
the description of Middle English orthography, morphology
and phonology. MEG is among the first attempts to span the
gap between Jordan’s Handbuch der mittelenglischen Grammatik:
Lautlehre (1925) and now. Our aim is to combine the advances
in Middle English dialectology in the latter half of the 20th
century and the computing power currently available in the
service of writing an up-to-date grammar of Middle English.
Middle English dialects and dialectology
The study of Middle English dialects took a giant leap
forward with Angus McIntosh’s insight that Middle English texts
represent distinct regional varieties in their spelling systems,
and that the spelling variants of these texts could therefore be
studied in their own right and not merely as reflections of
the speech systems of the time, i.e. dialects (McIntosh 1963). This
multitude of regional spellings arose when English had been
replaced by French and Latin in all important aspects for
nearly two centuries after the Norman Conquest: the reintroduction
of English into literary and utilitarian registers
from the thirteenth century onwards was not governed by any
nationwide standard and thus English was written according
to each scribe’s perception of the ‘correct’ spelling. McIntosh’s
vision led to a project that grew into A Linguistic Atlas of Late
Mediaeval English (LALME; 1986).
Aims of MEG
The Middle English Grammar project builds on the work of
the LALME team and aims at producing a description of Middle
English orthography, phonology and morphology, from 1100 to
1500. Here we use the term grammar in a wide, philological
sense. Eventually, the grammar is meant to replace
Richard Jordan’s Handbuch der mittelenglischen Grammatik:
Lautlehre, and to provide a reference point for students and
scholars of Middle English in the form of a broad description
of Middle English usages accompanied by more specific
county studies, and eventually also all the base material we
accumulate for this task (Black, Horobin, and Smith 2002: 13).
The Middle English Grammar: How?
The first task of MEG is to compile a corpus of Middle English
texts localized in LALME. The corpus is called The Middle
English Grammar Corpus, or MEG-C (the first installment
forthcoming 2007). Secondly, the corpus texts need appropriate
lemmatization and annotation in order to be usable in the
course of MEG.
Linguistic data is collected by transcribing extracts from either
the original manuscripts or good-quality microfilms. Priority
is given to the texts that were localized in LALME,
although later texts that were not analysed for LALME will be
taken into account as well. LALME covers the years 1350-1450
(-1500); the material for the studies in 1100-1350 will be
drawn from A Linguistic Atlas for Early Middle English (LAEME)
(Laing and Lass, forthcoming 2007).
The manuscript texts are represented by 3,000-word extracts
(or in toto, if shorter), which should be sufficient for studies
on orthography, phonology and morphology. The planned
corpus will sample c. 1,000 texts, so the projected size
of the corpus is 2.5-3 M words.
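
The sampling rule and the size estimate above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, the extract is assumed to start at the beginning of a text (the abstract fixes the extract length but not its position), and the mix of long and short texts is invented to show how a total below 3 M words can arise.

```python
def take_extract(words, limit=3000):
    """Return the whole text if it is short, else the first `limit` words.

    Assumption: extracts start at the beginning of the text.
    """
    return words if len(words) <= limit else words[:limit]


def projected_size(text_lengths, limit=3000):
    """Sum the extract sizes for a list of full-text word counts."""
    return sum(min(n, limit) for n in text_lengths)


# Upper bound: each of c. 1,000 texts yields a full 3,000-word extract.
upper_bound = projected_size([5000] * 1000)

# Invented mix: 800 long texts and 200 short texts of 1,000 words each.
mixed = projected_size([5000] * 800 + [1000] * 200)

print(upper_bound, mixed)
```

The upper bound matches the 3 M figure; any share of texts shorter than 3,000 words pulls the total down towards the lower end of the projected range.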
The conventions of transcription have been derived from
those of the LAEME and A Linguistic Atlas of Older Scots projects
(LAOS), with certain modifications. The most important
questions addressed during the transcription
process have been whether to emphasise fidelity to the
original or wieldy transcripts, and whether the transcripts should
offer an interpretative reading of the manuscript text rather than
the scribe’s actual pen strokes. According to the principles
chosen, the transcriptions attempt to capture the graphemic
and broad graphetic details, but not necessarily each detail on
the level of individual handwriting (Black, Horobin, and Smith
2002: 11).
MEG-C: lemmatization, annotation, publication
The second practical task is to lemmatize and to annotate the
Corpus. Previous historical English corpora (Helsinki Corpus,
Middle English Medical Texts) show the limitations that the lack of
lemmas imposes on the corpus user when tackling the variety of
spellings attested in Middle English texts. The lemmas in
MEG-C will have an Oxford English Dictionary headword. There
will also be another cue in the source language (the direct
source language before Middle English, usually Old English,
French/Anglo-Norman or Latin). These two reference points
on either side of Middle English will provide the user with the means
to search for occurrences of a lexical item even when the full
range of spelling variation in Middle English is not known.
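
A minimal sketch of how the two reference points could support such a search is given below. The record layout, the function name and the sample tokens are assumptions made for illustration; they are not taken from MEG-C.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    spelling: str      # form as transcribed from the manuscript
    oed_headword: str  # reference point on the Present Day English side
    source_form: str   # cue in the direct source language (here Old English)


# Illustrative tokens only; not drawn from MEG-C.
TOKENS = [
    Token("schal", "shall", "sceal"),
    Token("xal", "shall", "sceal"),
    Token("sal", "shall", "sceal"),
    Token("stoon", "stone", "stān"),
]


def index_by(tokens, field):
    """Map each value of `field` to the set of ME spellings recorded for it."""
    index = defaultdict(set)
    for token in tokens:
        index[getattr(token, field)].add(token.spelling)
    return index


by_headword = index_by(TOKENS, "oed_headword")
by_source = index_by(TOKENS, "source_form")

# All spellings of one lexical item, found without knowing them in advance.
spellings_of_shall = sorted(by_headword["shall"])
print(spellings_of_shall)
```

Either index retrieves the same set of Middle English spellings, so a user can start from whichever side of Middle English is more familiar.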
For annotation, the words of a text are divided
into bound morphemes and other spelling units (this system
is partly derived from Venezky (1970)). Each word is divided
into a word-initial sequence containing Onset and Nucleus,
followed by a series of Consonantal and Vowel
Spelling Units. Each spelling unit is also given its equivalents in
the source language and in Present Day English, thus enabling
the search for e.g. all the ME reflexes of OE [a:] or the
spelling variants in Middle English that correspond to the Present
Day English word-initial spelling sh-.
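
The annotation scheme described above might be modelled roughly as follows. The class names, the unit labels and the segmentations are hypothetical and simplified (bound morphemes are left out, and the PDE equivalents ignore details such as silent final -e); the sketch only shows how the two example searches could be run once every spelling unit carries its source-language and PDE equivalents.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpellingUnit:
    kind: str    # "onset", "nucleus", "C" (consonantal) or "V" (vowel)
    me: str      # Middle English spelling as transcribed
    source: str  # equivalent in the source language
    pde: str     # equivalent in Present Day English spelling


@dataclass(frozen=True)
class AnnotatedWord:
    form: str
    units: Tuple[SpellingUnit, ...]


U = SpellingUnit  # short alias for the sample data below

# Illustrative segmentations only; not drawn from MEG-C.
WORDS = [
    AnnotatedWord("stoon", (U("onset", "st", "st", "st"),
                            U("nucleus", "oo", "a:", "o"),
                            U("C", "n", "n", "n"))),
    AnnotatedWord("stan", (U("onset", "st", "st", "st"),
                           U("nucleus", "a", "a:", "o"),
                           U("C", "n", "n", "n"))),
    AnnotatedWord("schal", (U("onset", "sch", "sc", "sh"),
                            U("nucleus", "a", "ea", "a"),
                            U("C", "l", "l", "ll"))),
    AnnotatedWord("xal", (U("onset", "x", "sc", "sh"),
                          U("nucleus", "a", "ea", "a"),
                          U("C", "l", "l", "ll"))),
]


def me_spellings(words, kind, *, source=None, pde=None):
    """Collect ME spellings of units of one kind, filtered by equivalent."""
    found = set()
    for word in words:
        for unit in word.units:
            if unit.kind != kind:
                continue
            if source is not None and unit.source != source:
                continue
            if pde is not None and unit.pde != pde:
                continue
            found.add(unit.me)
    return sorted(found)


reflexes_of_long_a = me_spellings(WORDS, "nucleus", source="a:")
initial_sh_variants = me_spellings(WORDS, "onset", pde="sh")
print(reflexes_of_long_a, initial_sh_variants)
```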
For the task of annotation and lemmatization the corpus is
rendered into a relational database. The database plan has
tables for different extralinguistic information, and the actual
texts will be entered word by word, i.e. in the table for corpus
texts, there will be one record for each word. The annotation
plan we are intending to carry out should result in a corpus
where one can search for any combination of extralinguistic
factors and spelling units with reference points embedded in
the actual Middle English texts and also in the source language
and PDE spelling conventions.
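
As a rough illustration of the one-record-per-word design, the sketch below uses Python's built-in sqlite3 module. The table and column names are assumptions, as are the sample rows; the abstract does not specify the actual schema or database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for extralinguistic information about
# each text, and one table holding one record per word of the corpus.
conn.executescript("""
CREATE TABLE text (
    text_id INTEGER PRIMARY KEY,
    county  TEXT
);
CREATE TABLE word (
    word_id      INTEGER PRIMARY KEY,
    text_id      INTEGER NOT NULL REFERENCES text(text_id),
    position     INTEGER NOT NULL,
    form         TEXT NOT NULL,
    oed_headword TEXT,
    source_form  TEXT
);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO text (text_id, county) VALUES (?, ?)",
                 [(1, "Norfolk"), (2, "Yorkshire")])
conn.executemany(
    "INSERT INTO word (text_id, position, form, oed_headword, source_form) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "xal", "shall", "sceal"),
     (1, 2, "stoon", "stone", "stān"),
     (2, 1, "sal", "shall", "sceal"),
     (2, 2, "stan", "stone", "stān")])


def forms_by_county(connection, headword):
    """Return (county, form) pairs for every token of one OED headword."""
    cursor = connection.execute(
        "SELECT t.county, w.form FROM word AS w "
        "JOIN text AS t ON t.text_id = w.text_id "
        "WHERE w.oed_headword = ? "
        "ORDER BY t.county, w.position",
        (headword,))
    return cursor.fetchall()


rows = forms_by_county(conn, "shall")
print(rows)
```

Because every word is a row linked to its text, a query can combine any extralinguistic column with any linguistic reference point in a single join.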
The first installment of MEG-C will be published in 2007,
containing roughly 30 per cent of the texts in the planned
corpus in ASCII format. It will be on the Internet, accessible
for anyone to use and download. Our aim with publication
is two-fold: firstly, we welcome feedback of any kind,
especially from scholars who know the texts well; secondly, we
want to encourage other scholars to use the corpus.
References
Black, Merja, Simon Horobin and Jeremy Smith, 2002.
‘Towards a new history of Middle English spelling.’ In P. J.
Lucas and A.M. Lucas (eds), Middle English from Tongue to Text.
Frankfurt am Main: Peter Lang, 9-20.
Helsinki Corpus = The Helsinki Corpus of English Texts (1991).
Department of English, University of Helsinki. Compiled
by Matti Rissanen (Project leader), Merja Kytö (Project
secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English);
Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu
Nevalainen, Helena Raumolin-Brunberg (Early Modern
English).
Horobin, Simon and Jeremy Smith, 1999. ‘A Database of
Middle English Spelling.’ Literary and Linguistic Computing 14:
359-73.
Jordan, Richard, 1925. Handbuch der mittelenglischen
Grammatik. Heidelberg: Winter’s Universitätsbuchhandlung.
LAEME = Laing, Margaret, and Lass, Roger, forthcoming
2007. A Linguistic Atlas of Early Middle English. University of
Edinburgh. Nov. 22nd, 2007.
http://www.lel.ed.ac.uk/ihd/laeme/laeme.html
Laing, M. (ed.) 1989. Middle English Dialectology: essays on some
principles and problems by Angus McIntosh, M.L. Samuels and
Margaret Laing. Aberdeen: Aberdeen University Press.
LALME = McIntosh, A., Samuels, M.L. and Benskin, M.
(eds.) 1986. A Linguistic Atlas of Late Mediaeval English. 4 vols.
Aberdeen: Aberdeen University Press. (with the assistance of
M. Laing and K. Williamson).
LAOS = Williamson, Keith, forthcoming 2007. A Linguistic
Atlas of Older Scots. University of Edinburgh. Nov. 22nd, 2007.
http://www.lel.ed.ac.uk/research/ihd/laos/laos.html
McIntosh, A. 1963 [1989]. ‘A new approach to Middle English
dialectology’. English Studies 44: 1-11. repr. Laing, M. (ed.) 1989:
22-31.
Middle English Medical Texts = Taavitsainen, Irma, Pahta, Päivi
and Mäkinen, Martti (compilers) 2005. Middle English Medical
Texts. CD-ROM. Amsterdam: John Benjamins.
Stenroos, Merja, 2004. ‘Regional dialects and spelling
conventions in Late Middle English: searches for (th) in the
LALME data.’ In M. Dossena and R. Lass (eds), Methods and
data in English historical dialectology. Frankfurt am Main: Peter
Lang: 257-85.
Stenroos, Merja, forthcoming 2007. ‘Sampling and annotation
in the Middle English Grammar Project.’ In Meurman-Solin,
Anneli and Arja Nurmi (eds) Annotating Variation and Change
(Studies in Variation, Contacts and Change in English 1).
Research Unit for Variation, Change and Contacts in English,
University of Helsinki.
http://www.helsinki.fi/varieng/journal/index.html
Venezky, R., 1970. The Structure of English Orthography. The
Hague/Paris: Mouton.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

ADHO - 2008

Hosted at University of Oulu

Oulu, Finland

June 25, 2008 - June 29, 2008

135 works by 231 authors indexed

Conference website: http://www.ekl.oulu.fi/dh2008/

Series: ADHO (3)

Organizers: ADHO
