CatCor: Correspondence of Catherine the Great
The CatCor pilot project has produced a searchable online text collection of the letters of Catherine the Great. This poster examines the technical background to the project and, specifically, how the document analysis and the subsequent creation of a TEI P5 customization that tightly constrains the more general TEI P5 Guidelines produced a number of significant benefits for the project. These benefits relate mainly to the ease of checking the letters and to the subsequent development of a website that extracts data from the text collection. This digital humanities project serves as a case study to demonstrate how a research project with minimal funding can be empowered to produce resources of clear and immediate benefit to socio-cultural historians and literary scholars interested in these materials.
About Catherine the Great's letters
Catherine the Great used her correspondence, the primary knowledge-transfer medium of the Age of Enlightenment, to shape her own and her nation's role in the political, cultural, and social arenas of Europe. The collection includes epistolary exchanges with such major figures as Voltaire, Frederick the Great, Friedrich Melchior Grimm, and Catherine's charismatic lover Grigory Potemkin. Catherine ruled Russia for 34 years in the second half of the 18th century (from 1762 to 1796), and her correspondence was the essential medium through which she governed her court and empire and established her own and her nation's formidable presence in the political life of Europe and in the intellectual life of the European Enlightenment. Catherine saw herself as the model Enlightenment monarch, ruling over a nation that had fully opened itself to European influence and had entered the European consciousness barely 50 years earlier. The primary aim of her reign was to demonstrate, through both policy initiatives and public relations efforts, that Russia was a European state.
The digital collection
Creating a digital collection of this unique set of letters allows scholars to discover unexpected links between letters that were previously difficult, if not impossible, to juxtapose. Digital humanities methodologies have enabled the project not only to bring together these disparate and hard-to-access texts but also to offer new ways of searching, browsing, and comparing them. Although the pilot project uses only 100 letters, these were written by Catherine during two key periods of her transformative reign: 1774 and 1790-91. Later funding bids will aim to expand the collection and the project's scope to encompass all of Catherine's several thousand extant letters, as well as the letters sent to her.
The texts of the letters used for the pilot project are encoded in TEI P5 XML, and English translations are provided so that a wider range of researchers can use them. The markup also allows the letters to be classified according to a controlled vocabulary of project-specific themes, which are then exposed to users as facets in the web interface. The project supplements the letters with a new apparatus of editorial notes and with metadata concerning every person, place, event, and work mentioned in the correspondence. The ease with which these named-entity instances can be extracted and aggregated is one example of the benefits of using a well-known, open international standard and of having undertaken the document analysis needed to constrain this schema significantly. Once extracted, the instances and their accompanying metadata can be displayed, browsed, and filtered with common technologies such as jQuery DataTables. The poster also documents the use of Schematron constraints in the TEI ODD customization and the development of straightforward mechanisms that simplify the proofing of common features of textual collections.
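As an illustration of the extraction and aggregation step described above, the following is a minimal Python sketch, not the project's own code. It assumes that entity mentions are marked with the standard TEI elements `persName`, `placeName`, `orgName`, and `title`, each carrying a `@ref` pointer such as `#voltaire` into the metadata files; the actual CatCor customization may use different elements or attributes. The functions `extract_entities` and `aggregate` and the sample markup are hypothetical. The output is wrapped in the `{"data": [...]}` object that jQuery DataTables reads by default when loading data via Ajax (with `columns.data` set to the object keys).

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumed entity elements; the real CatCor customization may differ
ENTITY_TAGS = ("persName", "placeName", "orgName", "title")


def extract_entities(tei_xml: str, letter_id: str) -> list[dict]:
    """Return one record per entity mention found in a TEI-encoded letter."""
    root = ET.fromstring(tei_xml)
    records = []
    for tag in ENTITY_TAGS:
        for el in root.iter(f"{{{TEI_NS}}}{tag}"):
            records.append({
                "letter": letter_id,
                "type": tag,
                # Hypothetical pointer into the metadata files, e.g. "#voltaire"
                "ref": el.get("ref", ""),
                # Normalize whitespace in the mention's text content
                "text": " ".join("".join(el.itertext()).split()),
            })
    return records


def aggregate(records: list[dict]) -> list[dict]:
    """Group mentions by (type, @ref), falling back to the text if no @ref."""
    grouped = defaultdict(lambda: {"forms": set(), "letters": set()})
    for r in records:
        key = (r["type"], r["ref"] or r["text"])
        grouped[key]["forms"].add(r["text"])
        grouped[key]["letters"].add(r["letter"])
    return [
        {"type": etype, "ref": ref,
         "forms": sorted(info["forms"]), "letters": sorted(info["letters"])}
        for (etype, ref), info in sorted(grouped.items())
    ]


# Invented sample markup, not a letter from the collection
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>A letter mentioning <persName ref="#voltaire">Voltaire</persName>
and <placeName ref="#petersburg">St Petersburg</placeName>.</p>
</body></text></TEI>"""

if __name__ == "__main__":
    rows = aggregate(extract_entities(SAMPLE, "letter-001"))
    # Wrapped in the {"data": [...]} shape DataTables reads by default
    print(json.dumps({"data": rows}, indent=2))
```

Grouping on `@ref` rather than on the surface text means that variant forms of a name in different letters collapse into a single table row. This only works reliably if the customization requires a pointer on every mention, which is the kind of guarantee a tightly constrained schema can provide.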
The digital humanities aspects of this project serve as a case study of the reusability of the general-purpose tools developed for processing, displaying, and checking basic named entities and other common features of such letters. All of the relevant digital resources, including the TEI P5 XML files of the letters, the metadata files, the TEI ODD customization, and the XSLT 2.0 functions used to create and proof the underlying data files, are available under an open license from a public GitHub repository. It is hoped that this will encourage reuse and empower other digital humanists who wish to make similar text collections available. The open-source development work on this project demonstrates the kinds of sophisticated results, of immediate benefit to the scholars working on these materials, that can be achieved with limited resources.
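The project's proofing relies on Schematron constraints in the ODD and on XSLT 2.0 functions. As an illustration of one such check, the following Python sketch (a stand-in, not the project's Schematron or XSLT) reports every `@ref` pointer in a letter whose `#id` target is not declared as an `xml:id` in a metadata file. The function names `known_ids` and `unresolved_refs`, the sample data, and the assumption that pointers take the local form `#id` are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# ElementTree exposes xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def known_ids(metadata_xml: str) -> set[str]:
    """Collect every xml:id declared in a (hypothetical) metadata file."""
    root = ET.fromstring(metadata_xml)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}


def unresolved_refs(letter_xml: str, ids: set[str]) -> list[tuple[str, str]]:
    """Return (element name, pointer) pairs whose '#id' target is undeclared."""
    problems = []
    for el in ET.fromstring(letter_xml).iter():
        ref = el.get("ref")
        if ref is None:
            continue
        # @ref may hold several space-separated pointers; only local '#id'
        # pointers are checked here (assumption), external URIs are skipped
        for target in ref.split():
            if target.startswith("#") and target[1:] not in ids:
                problems.append((el.tag.replace(TEI, ""), target))
    return problems


# Invented sample data, not taken from the CatCor files
METADATA = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="voltaire"><persName>Voltaire</persName></person>
</listPerson>"""

LETTER = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName ref="#voltaire">Voltaire</persName> and
  <persName ref="#grimm">Grimm</persName></p>"""

if __name__ == "__main__":
    print(unresolved_refs(LETTER, known_ids(METADATA)))
    # [('persName', '#grimm')]
```

Run over every letter, a check like this yields a short worklist for proofreaders. In Schematron the same rule would typically be expressed as an `sch:assert` on the `@ref` attribute.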
Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne
July 7, 2014 - July 12, 2014
Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/
Attendance: 750 delegates according to Nyhan 2016
Series: ADHO (9)