Creating An EpiDoc Corpus for Ancient Sicily

paper, specified "long paper"
  1. James Cummings

    Oxford University

  2. Jonathan Prag

    Oxford University

  3. James Chartrand

    Open Sky Solutions


This paper introduces I.Sicily, the TEI P5 XML – EpiDoc corpus of inscriptions on stone for ancient Sicily. The project is one of the first attempts to generate a substantial regional corpus in EpiDoc. It confronts a number of challenges that may be of wider interest to the digital epigraphy community, including unique identifiers, linked data, museum collections, mapping, and data conversion and integration. These are briefly outlined in the paper, which concentrates on the conversion and technical development of the project.

Technical Background of I.Sicily
I.Sicily is an online, open access, digital corpus of the inscriptions on stone from ancient Sicily.
The corpus will be mounted at
by the time of DH2016, but is currently on a development server.
The corpus aims to include all texts inscribed on stone, in any language, between approximately the seventh century BC and the seventh century AD. It currently contains records for over 2,500 texts, and when complete is likely to contain c. 4,000. The corpus is built upon a conversion from a legacy dataset of metadata in MS Access to EpiDoc TEI XML. The XML records are held in an eXist database for XQuery access, and are additionally indexed for full-text search using Solr/Lucene. The corpus and related information (museum list, bibliography) are published as Linked Data, and are manipulated through a RESTful API. The records are queried and viewed through a web interface built with AngularJS and jQuery JavaScript components. Mapping is provided in the browser by the Google Maps API, and ZPR (Zoom, Pan, Rotate) image-viewing is provided by the IIIP image server.
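The conversion step described above, from a flat legacy metadata row to a TEI XML record, can be sketched roughly as follows. This is a minimal illustration, not the project's actual conversion routine: the row keys (`id`, `title`, `museum`, `lang`) and the sample values are hypothetical, and a real EpiDoc record carries far more structure than this skeleton.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return a TEI element name in ElementTree's {namespace}tag notation."""
    return f"{{{TEI_NS}}}{tag}"


def row_to_epidoc(row):
    """Build a skeletal EpiDoc-style TEI record from one legacy metadata row.

    The keys of `row` are hypothetical stand-ins for legacy database columns.
    """
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))

    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = row["title"]

    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("idno"), {"type": "filename"}).text = row["id"]

    # Current location of the stone goes under msDesc/msIdentifier.
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ms_desc = ET.SubElement(source_desc, tei("msDesc"))
    ms_id = ET.SubElement(ms_desc, tei("msIdentifier"))
    ET.SubElement(ms_id, tei("repository")).text = row["museum"]

    # Empty edition division, to be filled once the text has been collated.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    edition = ET.SubElement(
        body, tei("div"), {"type": "edition", f"{{{XML_NS}}}lang": row["lang"]}
    )
    ET.SubElement(edition, tei("ab"))
    return root


record = row_to_epidoc(
    {"id": "ISic0001", "title": "Example inscription",
     "museum": "Example Museum", "lang": "grc"}
)
xml_string = ET.tostring(record, encoding="unicode")
```

Keeping the metadata skeleton separate from the (initially empty) edition division mirrors the workflow described here, in which records are generated from metadata first and the epigraphic texts are incorporated later.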

At the time of writing, the main conversion routine is being refined, and the epigraphic texts are being collated for incorporation into the records. An ancillary database of museum collections in Sicily has been constructed, and the bibliography is held in a Zotero library. Extensive search facilities will be provided, including map-based and bibliographic searching. Individual inscriptions and individual museums will both be provided with URIs, as will personal names and individuals; places will be referenced using Pleiades, and epigraphic types, materials, and supports using the EAGLE vocabularies.

The motivations for I.Sicily
The existing epigraphic landscape in Sicily is extremely diverse in two primary regards. On the one hand, the island has a very mixed cultural and linguistic make-up, meaning that the epigraphic material is itself extremely varied, with extensive use throughout antiquity of both Greek and Latin, as well as Oscan, Punic, Sikel, and Hebrew (for a recent overview of much of the linguistic tradition see Tribulato 2012, and of the epigraphic material Gulletta 1999). On the other hand, the publication of this material has a very uneven record: despite an excellent pre-twentieth century tradition, the existing corpora are far from complete, and the ability of key journals such as SEG or AE to keep pace with local publication has been limited. A limited number of museum-based corpora have been published in recent decades (for Catania, Palermo, Messina, and Termini Imerese, as well as the material from Lipari), but this has not greatly improved the overall situation. The combination of these two factors already means that locating, identifying, or working with a Sicilian inscription, or its publication record, is extremely challenging for anyone without extensive experience of the material. I.Sicily has been conceived in the hope of improving the situation in all these areas.

Sicily is traditionally described as a ‘melting pot’, the ‘crossroads of the Mediterranean’. The situation created by basic technologies such as Unicode and TEI P5 EpiDoc XML means that there is now no reason not to be language agnostic in the inclusion of material. The opportunities offered by these technologies are considerable, since, for example, searching can be made language specific or language neutral. One obvious area where Sicilian studies are currently hampered by this disciplinary partitioning is in the study of onomastics. The Lexicon of Greek Personal Names records most instances of Greek names for the island, but Sicily is no less rich in non-Greek names (Latin and others), and at present there is no onomasticon for the island.

Simply by the marking-up and indexing of all names in the island’s inscriptions, I.Sicily will have generated a powerful tool for future study.
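As a rough illustration of how marked-up names could support both language-specific and language-neutral searching, the sketch below walks a TEI fragment, collects the text of each `persName` element, and files it under the nearest inherited `xml:lang` value. The sample fragment, the function names, and the use of `und` for an undetermined language are assumptions made for the example; they are not taken from the I.Sicily records.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: a Latin edition containing one Greek name.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="edition" xml:lang="la">
  <ab><persName>Gaius</persName> <persName xml:lang="grc">Δίων</persName></ab>
</div>"""


def index_names(element, inherited_lang=None, index=None):
    """Recursively map language code -> set of personal names."""
    if index is None:
        index = defaultdict(set)
    # xml:lang is inherited from ancestors unless overridden locally.
    lang = element.get(XML_LANG, inherited_lang)
    if element.tag == TEI + "persName":
        name = "".join(element.itertext()).strip()
        if name:
            index[lang or "und"].add(name)
    for child in element:
        index_names(child, lang, index)
    return index


def search(index, name, lang=None):
    """Return the languages in which `name` is indexed.

    With `lang` set the search is language specific; without it the
    search is language neutral and looks across every language.
    """
    langs = [lang] if lang else sorted(index)
    return [code for code in langs if name in index.get(code, ())]


name_index = index_names(ET.fromstring(SAMPLE))
latin_hits = search(name_index, "Gaius", lang="la")  # language specific
any_hits = search(name_index, "Δίων")                # language neutral
```

Because the language is resolved by inheritance, a name only needs its own `xml:lang` when it differs from the language of the surrounding text.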

Identification and Bibliography
The PHI database of Greek inscriptions has a rich record of Greek texts, but again is text only and limited in outputs.

SEG references are available for 733 inscriptions on stone and AE references for 328 (data taken from the I.Sicily database and based upon comprehensive manual trawls of SEG and AE). One major aim of I.Sicily, therefore, is to generate a unique identifier for each inscription: the I.Sicily number, in the form ISic 1234, maintained as a URI of the form:
. I.Sicily is well placed to do this since its initial dataset is primarily a bibliographic concordance of the lapidary inscriptions of Sicily. One of the associated outputs of the project will therefore be an online bibliography for Sicilian epigraphy, and an online Zotero library has already been created with over 700 records which are referenced in the EpiDoc. A locally cached version of the bibliography will be presented at the I.Sicily site to facilitate detailed bibliographic searching (including the identification of inscriptions by publication) and to allow the generation of customised concordances.
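The relationship between a citation label such as ISic 1234 and a stable URI can be sketched as a pair of small helper functions. The base URI below is a placeholder on `example.org`, not the project's real address, and the zero-padding to four digits is likewise an assumption made for illustration.

```python
import re

# Hypothetical base URI; the project's actual URI pattern is not reproduced here.
BASE_URI = "http://example.org/inscription/"

# Accepts "ISic 1234", "ISic1234" or zero-padded forms such as "ISic 0012".
_ISIC_LABEL = re.compile(r"^ISic\s*0*(\d+)$")


def parse_isic(label):
    """Extract the integer from an 'ISic 1234' style label."""
    match = _ISIC_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not an I.Sicily number: {label!r}")
    return int(match.group(1))


def isic_uri(number, base=BASE_URI, width=4):
    """Build a URI for an inscription, zero-padding the number (an assumption)."""
    if number < 0:
        raise ValueError("inscription numbers must be non-negative")
    return f"{base}ISic{number:0{width}d}"


uri = isic_uri(parse_isic("ISic 12"))
```

Normalising every label through a single parser before minting a URI helps ensure that variant spellings of the same number in the legacy concordance resolve to one identifier.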

Location, location, location
I.Sicily is actively generating rich geo-data for the individual inscriptions, both for the original findspot/provenance and the current location (whether museum-based, on-site, or elsewhere), and we aim to provide map-based searching for inscriptions, as well as text-based searching by ancient and modern place-names. In addition to full listing wherever possible of both ancient and modern place names for epigraphic provenance, we are working to provide detailed location information for each find-spot and current location, through a combination of library and map-based research and the use of autopsy and GIS recording. At present geo-data is being recorded in two forms, both through the use of explicit geographical locations in the form of longitude and latitude records in decimal degree form, and through the use of Pleiades URI references wherever possible.
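The two forms of geo-data described above, decimal-degree coordinates and an optional Pleiades reference, could be modelled along the following lines, with a simple radius query standing in for map-based searching. The class and function names, the record keys, and the approximate city coordinates are all illustrative assumptions, and the great-circle (haversine) formula is one common choice rather than a statement of how the project's search is implemented.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A findspot or current location in decimal degrees."""
    latitude: float
    longitude: float
    pleiades_uri: Optional[str] = None  # filled in where a Pleiades place exists

    def __post_init__(self):
        if not -90 <= self.latitude <= 90:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180 <= self.longitude <= 180:
            raise ValueError(f"longitude out of range: {self.longitude}")


def distance_km(a, b):
    """Great-circle distance between two locations (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.latitude), math.radians(b.latitude)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.longitude - a.longitude)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def within_radius(records, centre, radius_km):
    """Return the keys of all records located within radius_km of centre."""
    return [key for key, loc in records.items()
            if distance_km(loc, centre) <= radius_km]


# Approximate modern city coordinates, used purely as sample data.
palermo = Location(38.1157, 13.3615)
catania = Location(37.5079, 15.0830)
syracuse = Location(37.0755, 15.2866)

records = {"record-a": palermo, "record-b": catania, "record-c": syracuse}
nearby = within_radius(records, syracuse, 60.0)
```

Validating coordinate ranges when a record is created catches the most common data-entry slip, swapped latitude and longitude, only when the swapped value falls out of range; for Sicily both values are small, so a bounding-box check on the island would be a sensible additional safeguard.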

We are committed to the long-term use of Pleiades as our primary reference for ancient places, and to that end we aim to update and improve the Pleiades data for Sicilian locations, in particular name data and sub-locations, in conjunction with the editing of the I.Sicily records.

The creation and availability of translations is a major goal of the EAGLE project and its collaborators, and I.Sicily is no less committed to that ambition (see Orlandi et al. 2014: Part II). Translations are very rarely available for any of the published Sicilian inscriptions. It is obvious that the inclusion of translations will make the material much more accessible to a wider audience both of students and the general public. Equally, provision of translations will add to the value of the database as a resource for museums and others curating the inscriptions recorded in the database. To that end, a long-term ambition of I.Sicily is to include translations wherever possible in both English and Italian. We see this as one obvious area where public contribution (‘crowd-sourcing’) will be invaluable.

Limitations and future ambitions
The scale of the enterprise, and the available resources, mean that in its current form the project has limited itself to inscriptions engraved on stone (the coverage of rupestral inscriptions/graffiti and of inscriptions painted on stone/plaster is regrettably uneven). However, there is no reason in principle not to extend coverage in future to include inscriptions on other materials. Similarly, although the current project does not include a programme to mark up linguistic features of the texts, the commitment to the long-term maintenance of the corpus and the open availability of the underlying XML records means that such a project would be entirely possible in the future.
It is our long-term ambition that I.Sicily might become the default location for the publication and dissemination of Sicilian inscriptions; in the shorter term, we hope that it will serve as a valuable portal in the world of Sicilian epigraphy and of ancient world open linked data, greatly improving the accessibility of Sicilian epigraphy and so enriching the study of the ‘crossroads of the Mediterranean’.


Gulletta, M. I. (ed.) (1999). Sicilia Epigraphica. Atti del convegno internazionale, Erice, 15-18 ottobre 1998, 2 vols.

Orlandi, S., Santucci, R., Casarosa, V. and Liuzzo, P. M. (eds.) (2014). Information Technologies for Epigraphy and Cultural Heritage, Proceedings of the First EAGLE International Conference, Rome 2014, Sapienza Università Editrice. Published online at: (accessed 5 March 2016)

Prag, J. R. W. (2002). Epigraphy by numbers: Latin and the epigraphic culture in Sicily. In Cooley, A. E. (ed.), Becoming Roman, Writing Latin?: Literacy And Epigraphy In The Roman West (Journal of Roman Archaeology Supplementary Series), Journal of Roman Archaeology, pp. 15-31.

Tribulato, O. (ed.) (2012). Language and linguistic contact in ancient Sicily, Cambridge: Cambridge University Press.


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

454 works by 1072 authors indexed

Conference website:

Series: ADHO (11)

Organizers: ADHO