Quantifying the Unknown: How Many Manuscripts of Sévigné Still Exist?

paper, specified "short paper"
  1. Simon Gabay

    Université de Neuchâtel

  2. Lucie Rondeau du Noyer

    Lycée Descartes

  3. Matthias Gille Levenson

    École Normale Supérieure de Lyon (ENS de Lyon)

  4. Ljudmila Petkovic

    Université de Neuchâtel

  5. Alexandre Bartz

    École Nationale des Chartes - Université PSL

Work text

Manuscripts can be burned, lost, forgotten, thrown away... Although scholars have already tried to measure the proportion that has survived since the appearance of movable type with Gutenberg [Weitzman, 1987], such percentages do not help editors of texts to answer a more practical question: how many documents by a given author still exist, and how many of them are accessible to scholars?

In the present paper, we use Madame de Sévigné (1626-†1696) as a test case to calculate how many autograph manuscripts (AM) are still circulating on the market, and therefore to assess what is inaccessible because it is held in private collections. To do so, we combine three different sources of information. First, a list of French AM held in libraries, which has been created for the occasion because there is no catalogue for French literature comparable to the one by P. Beal [2005]. Second, a list of Sévigné’s AM held in historical private collections, drawn from Duchêne’s edition [Sévigné, 1972-1978]. Third, a list of manuscripts drawn from fixed-price and auction catalogues [Bodin, 2000], which contain descriptions of hundreds of thousands of manuscripts sold over decades (fig. 1).

1. Quantification

Existing manuscripts (E) of an author (a) are either kept accessible in libraries and archives (L) or (more or less) hidden in private collections (C). The problem is that while we know L, for which we have catalogues, C is unknown. To estimate it, we can divide it in two: on the one hand, historical collections (H), usually inherited within old families, well documented and extremely static; on the other hand, an unknown number of documents circulating between private collectors (P). Although we cannot know P directly, we can use a proxy: we can deduce what is still on the market (M) by subtracting what is owned (because it has been bought) by libraries (L) from everything that has been sold (S).
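The relations between these sets can be illustrated with a minimal Python sketch. It is not the authors' own formalisation: the manuscript identifiers are invented, the function names are hypothetical, and approximating E as the union of L, H and M is an assumption made here for illustration only.

```python
from __future__ import annotations


def still_on_market(sold: set[str], libraries: set[str]) -> set[str]:
    """M = S minus L: sold items that no library holds (used as a proxy for P)."""
    return sold - libraries


def estimated_existing(libraries: set[str], historical: set[str],
                       sold: set[str]) -> set[str]:
    """E approximated as L, H and M together (an assumption of this sketch)."""
    return libraries | historical | still_on_market(sold, libraries)


# Toy identifiers, invented for illustration only.
L = {"ms1", "ms2", "ms3"}          # held in libraries and archives
H = {"ms4"}                        # historical private collections
S = {"ms2", "ms3", "ms5", "ms6"}   # seen in sale catalogues up to time t

M = still_on_market(S, L)
E = estimated_existing(L, H, S)
print(sorted(M))  # ['ms5', 'ms6']
print(len(E))     # 6
```

Using a set difference rather than a subtraction of counts matters here, because L also contains manuscripts that never appeared in a sale catalogue.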
Because buyers are constantly intervening on the market (S), any value is only true at time t, i.e. the date of the last sale catalogue taken into account, assuming that all the previous ones have been analysed. With all this information, we can now deduce how many manuscripts still exist (E) if we know S.

Figure 1: RDA, May 1894 (N°166)

Looking for French AM, we have concentrated our efforts on documents sold in Paris and, for financial and time reasons, we have focused on catalogues published before c. 1900. We have retro-converted:

  - 250 fixed-price catalogues of the Revue des autographes (RDA, cf. fig. 1).
  - 300 fixed-price catalogues of the Lettres autographes et documents (LAD).
  - 100 auction catalogues.

Because of similarities between such catalogues and dictionaries, we have been able to use GROBID-dictionaries [Khemakhem et al., 2018] to process the images and transform them into a fully TEI-conformant semantic encoding (fig. 2) thanks to a custom workflow [Gabay et al., 2020].

Figure 2: XML-TEI encoding of an entry

The workflow is constantly being improved (e.g. Rondeau Du Noyer et al. [2019]), which has led to the creation of a dedicated tool for catalogues [Khemakhem et al., 2020]. In its latest version, on top of traditional features for information extraction (special characters, position on the page... in red in fig. 3), we now use typographical information (bold, italics, font size... in blue in fig. 3) for more precise results.

Figure 3: RDA, May 1873 (N°37)

3. Annotation and reconciliation

The letter previously mentioned is not the only one by Sévigné sold during the 19th c., and it was sold more than once.

Figure 4: RDA, July 1897 (N°200)

Figure 5: RDA, April 1902 (N°257)

Because the same item can be sold multiple times, it is crucial to transform the list obtained from the digitisation of sale catalogues into a set of unique types (or classes, cf. blue and red boxes in fig. 6) before comparing these types with existing documents held in libraries. In doing so, we can identify AM that have never appeared on the market (in pink and in black), document the history of those that are now in library collections (in blue and in orange), or identify “ghost” manuscripts that are still circulating on the private market (in green and red).

Figure 6: Reconciliation-identification process

To carry out this task, more information is required than what GROBID-dictionaries provides. We have therefore added an extra layer of information, including the type of document (L.a.s. for autograph letter signed, D.s. for document signed...), its length (number of pages or folios...), its format (in-octavo, in-quarto...), its date and its price. Since this information follows either an extremely strict pattern (1 p., L.a.s....) or a fairly common one (12 janvier 1798, 19 sep. 1820...), it is tagged with regexes and dedicated Python libraries in order to obtain a more fine-grained encoding:

Figure 7: Annotated <desc>

Combined with the name of the author, such information provides a unique combination of features that can be used to compare sold documents over time, and to identify not only the same AM sold twice, but also different fragments of a single manuscript (tab. 1), which share only part of the information (same date, same format but different length).

Table 1: Key information of three sold items from catalogues

Because we have catalogued all the known manuscripts of the marquise de Sévigné after extensive research in European and American libraries, it is possible to reconstruct part of their history thanks to the sale catalogues.

Table 2: Key information of two manuscripts

Results

We can now offer some results:

  - 63 sales have been identified up to 1903.
  - 46 AM were sold at least once, 14 of them at least twice.
  - 13 letters out of the 46 sold are not in public libraries or archives.

Following these numbers, we can say that:

  - c. 1% of the 1,350 letters identified by Duchêne [Sévigné, 1972-1978] (13 letters) are still circulating on the market.
  - c. 5% of the total has survived but is inaccessible to scholars, if we add the 62 letters still held in the private collections of the Guitaut family in Burgundy.

Such numbers, obviously, need to be taken with caution for two main reasons. On the one hand, the oldest catalogues are not precise enough to identify exactly which AM is being sold. On the other hand, the market in the 19th c. is already international, and manuscripts sold outside France are not taken into account in our study. This second problem should receive our full attention in the near future, in order to contribute to the history of objects [Courtin, 2020], and especially to the study of the migration of manuscripts [Burrows et al., 2019].

Acknowledgements

Many thanks to Agathe Decaster for her (crucial) help with the mathematical formulas, and to her brother Erwan.
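As an illustration of the annotation and reconciliation steps of section 3, the following Python sketch tags a catalogue description with regexes and compares the resulting features. It is only a sketch under stated assumptions, not the project's actual code: the patterns, the `Item` structure, the function names and the sample descriptions are all invented for this example, and real catalogue entries are far more varied.

```python
import re
from typing import NamedTuple, Optional

MONTHS = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
          "août", "septembre", "octobre", "novembre", "décembre"]

# Hypothetical patterns covering only a few common abbreviations.
TYPE_RE = re.compile(r"(?<!\w)(L\.a\.s\.|L\.a\.|L\.s\.|D\.s\.)")
PAGES_RE = re.compile(r"(\d+)\s*p\.")
FORMAT_RE = re.compile(r"\bin-(folio|quarto|octavo|\d+)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{1,2})\s+([^\W\d_]+)\.?\s+(\d{4})\b")


class Item(NamedTuple):
    author: str
    doc_type: Optional[str]
    date: Optional[str]   # ISO date, e.g. "1680-01-12"
    fmt: Optional[str]
    pages: Optional[int]


def parse_date(text: str) -> Optional[str]:
    """Return the first French date found, full or abbreviated month."""
    for m in DATE_RE.finditer(text):
        day, token, year = int(m.group(1)), m.group(2).lower(), m.group(3)
        for number, name in enumerate(MONTHS, start=1):
            if name.startswith(token):
                return f"{year}-{number:02d}-{day:02d}"
    return None


def annotate(author: str, desc: str) -> Item:
    """Extract type, date, format and length from a description string."""
    doc_type = TYPE_RE.search(desc)
    pages = PAGES_RE.search(desc)
    fmt = FORMAT_RE.search(desc)
    return Item(author,
                doc_type.group(1) if doc_type else None,
                parse_date(desc),
                fmt.group(1).lower() if fmt else None,
                int(pages.group(1)) if pages else None)


def compare(a: Item, b: Item) -> str:
    """Same author, type, date and format: same item or fragment by length."""
    key_a = (a.author, a.doc_type, a.date, a.fmt)
    key_b = (b.author, b.doc_type, b.date, b.fmt)
    if None in key_a or key_a != key_b:
        return "different or undecidable"
    return "same item" if a.pages == b.pages else "possible fragment"


# Invented descriptions, for illustration only.
first = annotate("Sévigné", "L.a.s., 12 janvier 1680, 4 p. in-quarto")
resold = annotate("Sévigné", "L.a.s.; 12 janv. 1680; 4 p. in-quarto")
part = annotate("Sévigné", "L.a.s., 12 janvier 1680, 2 p. in-quarto")

print(compare(first, resold))  # same item
print(compare(first, part))    # possible fragment
```

Dates are normalised before comparison so that "12 janvier 1680" and "12 janv. 1680" are treated as identical; an abbreviation such as "jui", which could stand for juin or juillet, is resolved to the first matching month, which is a simplification.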


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO