The Open Philology Project at the University of Leipzig

poster / demo / art installation
The Open Philology Project (OPP) at the University of Leipzig aspires to re-assert the value of philology in its broadest sense and has been designed with the hope that it can contribute to the study of any historical language that survives within the human record. It includes three different yet interdependent tasks:

(1) Open Greek and Latin Project (OGL): OGL is currently collecting and scanning editions of classical texts in an effort to build the largest and most comprehensive open-source library of classical philology to date, concurrently contributing to the expansion of Google Books. Where existing corpora of Greek and Latin have generally included one edition of a work, the OGL corpus is designed to manage multiple, copyright-free editions and translations.

The digitization workflow involves OCR, correction, and encoding in EpiDoc-compliant XML. The large volume of data we aim to generate requires significant computational power and task management, which has led to a partnership with two data entry companies that carry out each operation under the supervision of the Leipzig team. Although OCR correction is performed by our contractors, it is facilitated and partly automated by a proofreading tool jointly developed by Leipzig, Mount Allison University, and the CNR (Bruce Robertson of Mount Allison University, Canada, and Federico Boschetti of the CNR, Italy). Works currently under conversion include, among others, the Patrologia Latina, the Patrologia Graeca, and the Commentaria in Aristotelem Graeca.
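The encoding step of this workflow can be illustrated with a minimal sketch. It assumes that corrected OCR output arrives as a list of plain-text lines and wraps them in a TEI-style skeleton of the kind EpiDoc uses (an edition division containing line-break markers). The function name `encode_lines` is hypothetical, the required TEI header is omitted for brevity, and the output is not validated against the EpiDoc schema; it is not the project's actual pipeline.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def encode_lines(lines, lang="grc"):
    """Wrap corrected OCR lines in a bare TEI-style structure (hypothetical helper)."""
    # Serialize TEI elements without a prefix.
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")
    # An edition division carrying the language of the text.
    div = ET.SubElement(
        body,
        f"{{{TEI_NS}}}div",
        {"type": "edition", f"{{{XML_NS}}}lang": lang},
    )
    ab = ET.SubElement(div, f"{{{TEI_NS}}}ab")
    # Each source line becomes a numbered <lb/> followed by its text.
    for number, line in enumerate(lines, start=1):
        lb = ET.SubElement(ab, f"{{{TEI_NS}}}lb", {"n": str(number)})
        lb.tail = line.strip()
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(encode_lines(["first corrected line", "second corrected line"]))
```

Keeping line numbers as attributes on the line-break elements means that a later correction can be traced back to a position on the scanned page.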

Moreover, Leipzig has established international collaborations aimed at creating open-source, curated collections and electronic editions of Greek and Latin literature. Editorial projects include the Digital Fragmenta Historicorum Graecorum project, Digital Athenaeus, and Bibliotheca Aeschylea. Furthermore, collaborations with Croatia, Bulgaria, and Georgia will yield machine-actionable versions of translations of classical literature in these languages, thus opening up research into less-explored textual heritage.

(2) Historical Languages e-Learning Project (eLP): the development of dynamic textbooks that use richly annotated corpora to teach the vocabulary and grammar of texts that learners have chosen to read, and at the same time engage users in collaboratively producing new annotated data. eLP is developing computationally customized learning materials for historical languages, beginning with Ancient Greek. The text selected for the pilot is the Pentecontaetia, part of Thucydides' History of the Peloponnesian War. Users learn through active engagement with the text and through the contribution of their own annotations. Future work will extend the system to accommodate other corpora.
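One way to picture how annotated data can drive customized learning materials is the sketch below. It assumes each token of the chosen passage carries a lemma annotation and that the system keeps a set of lemmas the learner already knows; it then lists the unfamiliar lemmas, most frequent first. The function `vocabulary_to_study`, the token layout, and the sample tokens are illustrative assumptions, not the eLP implementation.

```python
from collections import Counter


def vocabulary_to_study(tokens, known_lemmas, limit=10):
    """Return (lemma, count) pairs for lemmas the learner has not yet met."""
    counts = Counter(
        token["lemma"] for token in tokens if token["lemma"] not in known_lemmas
    )
    # most_common sorts by descending frequency within the chosen passage.
    return counts.most_common(limit)


# A few hand-made annotated tokens (hypothetical data, for illustration only).
ARTICLE = "ὁ"
AND = "καί"
passage = [
    {"form": "τὸν", "lemma": ARTICLE},
    {"form": "πόλεμον", "lemma": "πόλεμος"},
    {"form": "τῶν", "lemma": ARTICLE},
    {"form": "Πελοποννησίων", "lemma": "Πελοποννήσιος"},
    {"form": "καὶ", "lemma": AND},
    {"form": "Ἀθηναίων", "lemma": "Ἀθηναῖος"},
]

if __name__ == "__main__":
    for lemma, count in vocabulary_to_study(passage, known_lemmas={ARTICLE, AND}):
        print(lemma, count)
```

Because the list is computed from the passage the learner selected, the same annotated corpus can serve different readers without separate hand-written vocabulary sheets.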

At the core of eLP lies the goal of making the morphosyntactic and semantic annotation of text (e.g. treebanking) more accessible and enjoyable, including the annotation of texts deriving from the OGL corpus. The creation of such a richly annotated and searchable text repository will serve a variety of purposes, including research in philology, Natural Language Processing (NLP), historical linguistics, and second language acquisition (SLA).

The production of automated queries to support this dynamic, customized, and localized interface relies upon the backend storage of complex textual data. The chosen graph model meets the broad requirements of the e-Learning application while retaining features of the real-world objects represented by the data. The absence of schemas within graph databases enables extensibility, while maintaining a stable experience for users through the use of REST APIs. The web interface takes the data and adapts its presentation to individual needs and access devices. HTML5, CSS3, and responsive technologies provide an appropriate experience to users regardless of how they access the system, while templating systems allow for resources that are structurally accessible via any first language.
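The extensibility argument for a schema-free graph can be made concrete with a small in-memory sketch. It is not the database used by the project: the class `TextGraph`, the node identifiers, and the edge labels are all hypothetical. The sketch only shows that a new kind of annotation can be attached to an existing node without changing any table definition, while a stable query function (standing in for an API endpoint) keeps returning the same shape of result.

```python
from collections import defaultdict


class TextGraph:
    """A tiny property graph: nodes with free-form properties, labelled edges."""

    def __init__(self):
        self.nodes = {}                  # node id -> dict of arbitrary properties
        self.edges = defaultdict(list)   # source id -> list of (label, target id)

    def add_node(self, node_id, **props):
        # Merging properties lets later annotations extend an existing node.
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbours(self, source, label):
        return [target for (lbl, target) in self.edges[source] if lbl == label]


def lemmas_in(graph, sentence_id):
    """Stable query: the lemmas of a sentence, in insertion order."""
    return [graph.nodes[w]["lemma"] for w in graph.neighbours(sentence_id, "HAS_WORD")]


graph = TextGraph()
graph.add_node("sentence:1", kind="sentence")
graph.add_node("word:1", kind="word", form="λόγος", lemma="λόγος", pos="noun")
graph.add_node("word:2", kind="word", form="ἐστί", lemma="εἰμί", pos="verb")
graph.add_edge("sentence:1", "HAS_WORD", "word:1")
graph.add_edge("sentence:1", "HAS_WORD", "word:2")
graph.add_edge("word:2", "HAS_SUBJECT", "word:1")

# A new property type is added later without any schema migration.
graph.add_node("word:1", gloss="word, account")
```

Here `lemmas_in(graph, "sentence:1")` is unaffected by the later addition of the `gloss` property, which is the kind of stability a REST layer can offer to clients while the underlying data model grows.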

(3) Open Publications and Data Revenue Models: OPP is establishing a new model of scholarly publication in a born-digital environment. This is accomplished through Perseids, a collaborative platform for annotating TEI XML documents in Classics, including inscriptions and manuscripts. The main publication model within the OPP is the Leipzig Open Fragmentary Texts Series, whose goal is to establish open editions of ancient works that survive through quotations and re-uses in later texts. Such editions are fundamentally hypertexts, and the effort is to produce a dynamic infrastructure for a full representation of relationships between sources, quotations, and annotations about them.

Because open data by definition means free access for all users, the OPP team has already begun considering how the project can remain financially sustainable for years to come. The team intends to devise business models to sustain and maintain distributed open source learning and discourse. The core principle is to move away from charging for monopoly access to data and toward charging for services that allow users to identify, analyze, and then contribute to increasingly complex open data, with services for faculties, students, and the interested public set at recognized and affordable price points.

