A Bilingual Digital Edition of Trinity College Cambridge MS O.1.77.

poster / demo / art installation
  1. Alpo Honkapohja

    University of Helsinki


The poster will present my work-in-progress PhD project, a digital edition of a 15th-century bilingual medical manuscript containing Latin and Middle English. The edition is designed with the needs of historical linguistics in mind and will have some corpus functionalities. My long-term aim is to use it as a pilot study of sorts for the contrastive investigation of Latin and Middle English medical writing.
1. Background
For a long time, medieval medical writing received fairly little attention. In 1970, for instance, Robbins described it as a “Yukon territory crying out for exploration”. In the 1990s and 2000s the situation changed, and the field is now filling up with tiny flags staking the claims of various research projects and individual scholars. There are now large electronic corpora such as Middle English Medical Texts (MEMT), published in 2005, and A Corpus of Middle English Scientific Prose, currently being compiled in collaboration between the University of Malaga and the Hunter Library in Glasgow.
These resources do, however, share one inherent bias: they focus on Middle English material, which gives a distorted view of the linguistic situation in late medieval England. After the Norman Conquest, England was a trilingual society in which educated people were likely to have at least some degree of literacy in Latin and Anglo-Norman French as well as English. This is reflected, for instance, in the fact that manuscripts containing texts in more than one language outnumber monolingual ones (cf. Voigts 1989). Marginal comments also suggest that these manuscripts had a readership competent in more than one language.
My PhD project is intended as the first genuinely bilingual online resource of medical manuscripts from late medieval England, and will hopefully pave the way for similar resources in the future. It is designed for both historical linguists and historians, but pays special attention to the needs of linguists.
2. Trinity College Cambridge, MS O.1.77
Trinity MS O.1.77 is a pocket-sized (75 × 100 mm) medical handbook held at Trinity College Cambridge. It contains 10 to 18 texts on medicine, astrology and alchemy. It is usually treated as a sibling of the so-called Sloane group of Middle English manuscripts, a group of Latin, English and French MSS originating from London or Westminster in the late Middle English period (cf. e.g. Voigts 1990). James (1902) assigns MS Trinity O.1.77 an exact date of 1460, based on astrological markings on the final flyleaf, although this date may not be entirely accurate (see Honkapohja 2010, forthcoming).
Roughly four-fifths of the manuscript is in Latin and one-fifth in English. Of its slightly fewer than 30,000 words, c. 24,000 are Latin and 5,500 English. There does not appear to be a clear-cut division between prestigious Latin texts and more popular English ones. However, metatextual functions such as incipits and explicits are almost exclusively in Latin, as are nearly all the marginal comments in the manuscript.
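As a quick check on the proportions above, the shares implied by the two word counts (taking their sum, 29,500 words, as the total) work out as follows:

```latex
\frac{24\,000}{24\,000 + 5\,500} = \frac{24\,000}{29\,500} \approx 0.81,
\qquad
\frac{5\,500}{29\,500} \approx 0.19
```

This is consistent with the rough four-fifths / one-fifth split.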
3. The digital edition
The digital edition I am preparing is designed to function as reliable data for historical linguistics. This involves encoding sufficient detail on linguistic variants without normalising, modernising or emending the data, and keeping all editorial intervention transparent (see e.g. Kytö, Grund and Walker 2007, or Lass).
On the technical side, I am using TEI P5-conformant XML tagging built on a stand-off architecture. The base-level annotation includes a graphemic transcription of the text (cf. e.g. Fenton & Duggan 2006), selected manuscript features such as layout, and information about the manuscript and hand. Each word will also be tagged with a normalised form, useful for linguistic research, and an ID. The ID allows further layers to be added by means of stand-off annotation, for instance POS tagging, semantic annotation or lemmatisation.
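The principle of stand-off annotation can be illustrated with a minimal Python sketch. It assumes that each <w> in the base layer carries an xml:id and that the normalised form is stored in a `norm` attribute. The separate `<annotations>`/`<ann>` layer and its `pos` attribute are hypothetical stand-ins, not the actual DECL encoding. The tokens are invented and are not taken from the manuscript. The point is that the extra layer points at word IDs rather than wrapping the words, so the base transcription never has to change.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Base layer (invented tokens): each <w> has a unique xml:id and, by
# assumption, its normalised form in a "norm" attribute.
BASE_LAYER = f"""<text xmlns="{TEI_NS}"><body><p>
  <w xml:id="w1" norm="take">tak</w>
  <w xml:id="w2" norm="sage">sawge</w>
</p></body></text>"""

# Hypothetical stand-off layer: kept in a separate document and pointing
# at words by ID instead of wrapping them.
POS_LAYER = """<annotations>
  <ann target="#w1" pos="VB"/>
  <ann target="#w2" pos="NN"/>
</annotations>"""


def merge_layers(base_xml: str, standoff_xml: str) -> list[tuple]:
    """Return (id, transcription, normalised form, POS) for every word."""
    words = {}  # dicts keep insertion order, i.e. document order
    for w in ET.fromstring(base_xml).iter(f"{{{TEI_NS}}}w"):
        words[w.get(XML_ID)] = [w.text, w.get("norm"), None]
    for ann in ET.fromstring(standoff_xml).iter("ann"):
        word_id = ann.get("target").lstrip("#")
        if word_id in words:  # silently ignore pointers to unknown IDs
            words[word_id][2] = ann.get("pos")
    return [(wid, *fields) for wid, fields in words.items()]


for row in merge_layers(BASE_LAYER, POS_LAYER):
    print(row)
```

Because each layer only references IDs, a semantic or lemma layer could be merged in the same way without touching the transcription.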
The edition will have an online user interface that will allow users to select the level of detail they wish to see: it will be possible to use it with either the normalised text or the diplomatic transcription. It will be released under a Creative Commons licence. Users will have full access to the XML code, including all levels of annotation, and will be allowed to download and modify it for non-commercial purposes.
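Switching between the diplomatic and normalised views can be done from a single encoding. The Python sketch below assumes, as above, that the normalised form sits in a `norm` attribute on each <w>. The function name `render` and the sample words are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample; the normalised form is assumed to be in a "norm" attribute.
SAMPLE = f"""<p xmlns="{TEI_NS}">
  <w norm="take">tak</w> <w norm="sage">sawge</w> <w norm="and">&amp;</w>
</p>"""


def render(tei_xml: str, view: str = "diplomatic") -> str:
    """Render the words either as transcribed or in their normalised form."""
    if view not in ("diplomatic", "normalised"):
        raise ValueError(f"unknown view: {view!r}")
    words = ET.fromstring(tei_xml).iter(f"{{{TEI_NS}}}w")
    if view == "diplomatic":
        return " ".join(w.text for w in words)
    # Fall back to the transcription when no normalised form is recorded.
    return " ".join(w.get("norm", w.text) for w in words)


print(render(SAMPLE))                # tak sawge &
print(render(SAMPLE, "normalised"))  # take sage and
```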
The development of the edition will take place in collaboration with the Digital Editions for Corpus Linguistics (DECL) project, based at the University of Helsinki.
The DECL project was started by three postgraduate students in 2007. It aims to create a framework for producing online editions of historical manuscripts suitable for both corpus-linguistic and historical research. DECL editions use a more strictly defined subset of the TEI Guidelines and are designed especially to meet the needs of corpus linguistics. The framework consists of encoding guidelines compliant with TEI P5 XML. The aims of the project are presented in more detail in our article (Honkapohja, Kaislaniemi & Marttila 2009).
5. Digital Edition of O.1.77 as a resource for the study of multilingualism
My PhD project has both short- and long-term goals related to the study of multilingualism. The short-term aim is to design the edition so that it is of maximum use to scholars working with medical texts, and especially with multilingualism. I am putting particular effort into interoperability and into making the encoding as flexible as possible.
Hypothetical research questions for the edition include, for instance:
Spelling variation: The edition will provide information on spelling variation in English and Latin. This makes it possible to test whether quantitative data support the accepted general view that Latin spelling was more regular.
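One simple way to quantify regularity is the mean number of distinct spellings per normalised form, computed for each language. The sketch below assumes that tokens have been exported from the edition as (language, normalised form, spelling) triples. Both the data and the metric are illustrative choices, not the project's stated method. A real study would also need to control for token frequency, because frequent words have more chances to vary.

```python
from collections import defaultdict

# Invented (language, normalised form, attested spelling) triples standing in
# for tokens exported from the edition's word-level annotation.
TOKENS = [
    ("en", "sage", "sawge"), ("en", "sage", "sauge"), ("en", "sage", "sawge"),
    ("en", "take", "tak"), ("en", "take", "take"), ("en", "take", "taak"),
    ("la", "recipe", "recipe"), ("la", "recipe", "recipe"),
    ("la", "mihi", "mihi"), ("la", "mihi", "michi"),
]


def spellings_per_form(tokens):
    """Mean number of distinct spellings per normalised form, by language."""
    variants = defaultdict(lambda: defaultdict(set))
    for lang, norm, spelling in tokens:
        variants[lang][norm].add(spelling)
    return {
        lang: sum(len(spellings) for spellings in forms.values()) / len(forms)
        for lang, forms in variants.items()
    }


print(spellings_per_form(TOKENS))  # {'en': 2.5, 'la': 1.5}
```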
The use of brevigraphs and contracted forms: Manuscript abbreviations are an extremely common feature of the Latin texts in the manuscript. They also occur in the Middle English sections, but less frequently. The edition will make it possible to obtain exact statistics on which manuscript abbreviations carry over into the vernacular, and with how much variation and […]
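If abbreviations are recorded with the TEI <choice>, <abbr> and <expan> elements, the share of abbreviated words per language becomes a simple count. The sketch assumes, for brevity, that language is marked per word with xml:lang; in practice it may be inherited from an enclosing division. The sample words are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: abbreviations encoded with TEI <choice>/<abbr>/<expan>;
# language marked on each <w> with xml:lang (an assumption for brevity).
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:lang="la"><choice><abbr>q</abbr><expan>quod</expan></choice></w>
  <w xml:lang="la">recipe</w>
  <w xml:lang="la"><choice><abbr>aq</abbr><expan>aquam</expan></choice></w>
  <w xml:lang="en"><choice><abbr>wt</abbr><expan>with</expan></choice></w>
  <w xml:lang="en">sawge</w>
</p>"""


def abbreviation_rates(tei_xml):
    """Share of words containing at least one <abbr>, per language."""
    words, abbreviated = Counter(), Counter()
    for w in ET.fromstring(tei_xml).iter(f"{TEI}w"):
        lang = w.get(XML_LANG, "unknown")
        words[lang] += 1
        if w.find(f".//{TEI}abbr") is not None:
            abbreviated[lang] += 1
    return {lang: abbreviated[lang] / words[lang] for lang in words}


print(abbreviation_rates(SAMPLE))
```

Grouping by the text of each <abbr> instead of by language would show which specific abbreviations recur in the English sections.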
Syntactic complexity: Do Latin sentences contain more subordinate clauses and other signs of syntactic complexity than Middle English ones?
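This question presupposes a layer of clause annotation. The sketch below assumes such a layer uses the TEI <s> (sentence) and <cl> (clause) elements. Marking subordinate clauses with type="sub" is a hypothetical convention. The sample is invented and heavily simplified; the measure computed is the mean number of subordinate clauses per sentence in each language.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample with empty clauses; type="sub" for subordinate clauses is an
# assumed convention, not a TEI requirement.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <s xml:lang="la"><cl/><cl type="sub"/><cl type="sub"/></s>
  <s xml:lang="la"><cl/><cl type="sub"/></s>
  <s xml:lang="en"><cl/><cl/></s>
  <s xml:lang="en"><cl/><cl type="sub"/></s>
</div>"""


def mean_subordinate_clauses(tei_xml):
    """Mean number of subordinate clauses per sentence, by language."""
    per_sentence = defaultdict(list)
    for s in ET.fromstring(tei_xml).iter(f"{TEI}s"):
        subs = sum(1 for cl in s.iter(f"{TEI}cl") if cl.get("type") == "sub")
        per_sentence[s.get(XML_LANG)].append(subs)
    return {lang: sum(n) / len(n) for lang, n in per_sentence.items()}


print(mean_subordinate_clauses(SAMPLE))  # {'la': 1.5, 'en': 0.5}
```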
Textual functions: How are English and Latin used in various text types, such as recipes and metatextual passages (in which Latin very much dominates)? The structural and background information annotated in the edition will enable users to search different levels of textual passages, including marginal comments and metatextual passages.
After the completion of the PhD project, the edition will be expanded with other related multilingual medical and alchemical manuscripts in the Sloane group. This will increase the usefulness of the database by allowing, for instance, comparative study of the same text in different manuscripts. I am also planning to use the available corpora of Middle English medical writing for comparison with the Middle English material in the edition.
References
A Corpus of Middle English Scientific Prose. (accessed 13 March 2010).
Fenton, E. G., Duggan, H. N. (2006). 'Effective Methods of Producing Machine-readable Text from Manuscript and Print Sources'. In Burnard, L., O’Brien O’Keeffe, K., Unsworth, J. (eds.). Electronic Textual Editing. New York: MLA.


Conference Info


ADHO - 2010
"Cultural expression, old and new"

Hosted at King's College London

London, England, United Kingdom

July 7, 2010 - July 10, 2010

142 works by 295 authors indexed


Conference website: http://dh2010.cch.kcl.ac.uk/

Series: ADHO (5)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None