BFM Collection - Open-Source Digital Editions of Medieval French Texts

paper, specified "short paper"
  1. Alexei Lavrentiev

    ICAR Laboratory - Ecole Normale Supérieure de Lyon (ENS de Lyon)


1. Introduction

The BFM collection of digital editions grew out of more than 20 years of work on a large corpus of medieval French texts for research purposes. This corpus, called Base de français médiéval (BFM), currently includes over 130 texts, totalling approximately 4.7 million words. The core of the corpus consists of digitized printed scholarly editions selected for their philological quality. Digitizing printed editions was the only way to build a substantial corpus in a relatively short time and with limited funding. However, printed editions raise serious copyright issues: publishers tend to require exclusive rights to the books they print, and they are often reluctant to authorize digitization and re-use of the data in text corpora. Before 2000, publishing contracts rarely included an explicit clause on digital distribution, so it can be argued that scholarly editors (or their heirs) still hold the copyright for this medium, but more recent contracts include long lists of digital products and distribution modes. Even though open-licensed publishing on the web is possible, scholars have to give up all their rights if they want to publish their work in a prestigious collection recognized by the academic community. The BFM team aims to offer scholars the possibility of publishing medieval French texts under an open license (such as CC BY-SA) in a collection whose editorial quality is guaranteed by the expertise of a reading committee that includes leading specialists in medieval French language and literature, in text editing techniques and in digital philology.
2. Editing principles

In addition to open licensing, the BFM collection is distinguished by innovative editing principles. These principles were developed in the Queste del saint Graal digital edition project [1] and include a multi-layer transcription of primary sources (at least normalized and diplomatic), particular attention to punctuation and word segmentation, careful and clearly marked correction of scribal errors, and linguistic annotation (part-of-speech and direct speech tagging). The "Bédierist" method of the "best witness" is generally applied, but transcriptions of additional aligned witnesses are encouraged. Whenever possible, an edition should include a digital facsimile of the primary source, which makes it possible to verify the quality of the transcription. A modern French translation is optional but may be very useful for widening the range of potential readers and uses. All these principles are described in detail in the Introduction to the Queste del saint Graal edition, and most of them were presented and discussed at the International Congress of Romance Linguistics and Philology (CILPR) in 2013 [2].
3. Workflow and publication platform

In the first stages of the editing process, word processors such as Microsoft Word or LibreOffice Writer may be used for the convenience of scholarly editors. A small number of special characters and character or paragraph styles are defined to facilitate later processing. For instance, a hash symbol before a letter indicates that a lowercase letter in the primary source should be capitalized in the normalized transcription. Once the primary editing is complete, the text is converted to XML-TEI, which is the pivot format for all markup and editorial products in the BFM collection. The BFM corpus preparation chain includes automatic tools for tokenization, morphosyntactic annotation and direct speech markup. Whenever possible, the morphosyntactic annotation is verified by experts, as automatic tagging of Old French inevitably produces a certain number of errors because of the high level of orthographic and morphological variation.
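The hash convention mentioned above can be illustrated with a minimal Python sketch. It is not the BFM conversion tool: the function names, the regular expression and the sample string are assumptions made for illustration, and the sample is an invented phrase, not a quotation from a manuscript. The sketch only shows how a single hash-marked input could yield both a diplomatic and a normalized reading.

```python
import re

# Assumed convention: "#" placed directly before a letter marks a lowercase
# letter of the source that must be capitalized in the normalized layer.
# The pattern matches "#" followed by one Unicode letter (no digit, no "_").
HASH_LETTER = re.compile(r"#([^\W\d_])")


def diplomatic_layer(marked: str) -> str:
    """Remove the hash marks and keep the letters as written in the source."""
    return HASH_LETTER.sub(lambda m: m.group(1), marked)


def normalized_layer(marked: str) -> str:
    """Remove the hash marks and capitalize the letters they precede."""
    return HASH_LETTER.sub(lambda m: m.group(1).upper(), marked)


# Invented sample string, for illustration only.
sample = "#or dit li contes que #galaad chevaucha"
print(diplomatic_layer(sample))
print(normalized_layer(sample))
```

Keeping the mark in the editor's working file and deriving both layers from it means the two transcriptions cannot drift apart; in the XML-TEI pivot format the same information would be carried by markup rather than by a special character.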
The BFM web portal, built on the TXM platform [3], will be used to publish the collection on the web. The advantage of this portal is that it combines a convenient reading view of the edition (including parallel browsing of multiple transcription layers, digital facsimile and translation) with powerful tools for qualitative and quantitative text analysis (including frequency lists, KWIC concordances, specificity, factorial analysis, etc.). The editions of the BFM collection will be included in the main BFM corpus, and registered BFM users will benefit from additional services, such as creating a subcorpus or recording queries. However, reading the text and downloading the XML-TEI source files or a printable PDF version will be possible without registration.
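For readers unfamiliar with the analysis tools named above, the following Python sketch shows what a frequency list and a KWIC (keyword in context) concordance compute. It does not use TXM or reproduce its query engine: the whitespace tokenizer, the function names and the sample sentence are assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase, split on whitespace."""
    return text.lower().split()


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word form, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence, for illustration only.
tokens = tokenize("li rois vint et li chevaliers vint")
for left, key, right in kwic(tokens, "vint", width=2):
    print(f"{left:>15} | {key} | {right}")
print(frequency_list(tokens))
```

A real corpus query would normally run on the tokenized and annotated XML-TEI text, so that a search can target lemmas or part-of-speech tags and not only surface forms.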
4. Current state of the project

The edition of the Queste del saint Graal is currently complete from the philological point of view (although additional manuscript transcriptions will probably be produced in the future). All the major components of this edition (multiple-layer transcriptions, modern French translation, manuscript images, introduction, proper name index and glossary) are available on the BFM portal through a special "GRAAL" corpus. However, a more convenient interface for browsing the edition and for direct access to its components is still under development.
Further editions are at various stages of preparation. These include the Psautier d'Arundel (edited by C. Pignatelli and A. Lavrentiev) [4], the Vie de saint Alexis (edited by C. Marchello-Nizia and T. Rainsford) and the earliest French texts, the Serments de Strasbourg and the Séquence de sainte Eulalie (edited by C. Guillot, A. Lavrentiev, C. Marchello-Nizia and T. Rainsford). All these editions should be published in 2014.

1. Marchello-Nizia, Ch. and Lavrentiev, A. (ed.) (2009-2013). La queste del saint Graal. Édition numérique interactive du manuscrit Lyon (BM P.A. 77). Lyon: ENS de Lyon. (accessed 31 October 2013).
2. Guillot, C., Lavrentiev, A., Rainsford, T., Marchello-Nizia, C., Heiden, S. (2013). La "philologie numérique": tentative de définition d'un nouvel objet éditorial. Lemaréchal, A., Koch, P., Swiggers, P. (ed.). Actes du XXVIIe Congrès international de linguistique et de philologie romanes (Nancy, 15-20 juillet 2013). Section 13: Philologie textuelle et éditoriale. Nancy: ATILF.
3. Heiden, S. (2010). The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, Institute for Digital Enhancement of Co
4. Pignatelli, C., Lavrentiev, A. (2013). Le Psautier d'Arundel : une nouvelle édition. Lemaréchal, A., Koch, P., Swiggers, P. (ed.). Actes du XXVIIe Congrès international de linguistique et de philologie romanes (Nancy, 15-20 juillet 2013). Section 13: Philologie textuelle et éditoriale. Nancy: ATILF.


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO