University of Ulster
University College Cork
Introduction
Ireland has a 1400-year history of writing in the vernacular, and its writers have produced a vast body of native literature in the various languages used in Ireland spanning most of that period. The bulk of the literature was written in Irish, which includes Old Irish (600–900), Middle Irish (900–1200), Early Modern Irish (1200–1650) and Modern Irish (1650–present). This session draws together the work of four closely related projects dealing with the digitisation and hyperlinking of materials relating to medieval Irish literature.
CELT (Corpus of Electronic Texts), which is based in University College Cork, is an ISO standard corpus of multilingual texts of Irish literature and history. CELT's mission is to bring the wealth of Irish literary and historical culture in the relevant languages to the Internet in scholarly editions. Over the last ten years, CELT has published a substantial number of SGML-encoded scholarly editions of Irish texts in Irish, Latin and Norman French. Currently it has over 5.5 million words of text online, many of these with deep-level markup, and it is continuing to make more sources electronically available. The text header details the text history, lists editorial conventions, and usually gives details of manuscript sources and an up-to-date bibliography.
More recently, the Centre for Irish and Celtic Studies at the University of Ulster has begun the task of producing an electronic Dictionary of the Irish Language (eDIL) based mainly on Old and Middle Irish (c.700-c.1200 AD). The Dictionary of the Irish Language is the standard resource for Old, Middle and Early Modern Irish lexicography and the only source that covers the wide-ranging language changes within Irish, but it is notoriously difficult for students and non-specialists to use. eDIL will be published in CD-ROM format and will alleviate many of these problems through powerful search mechanisms. The text is being marked up in XML following TEI guidelines, and is due to be completed in three years.
The potential for collaboration between these two projects was obvious, and a third project aimed at fulfilling this potential is beginning in 2003-04. It is entitled Linking Dictionaries and Texts (LDT) and will create remote electronic links between eDIL and CELT's corpus of on-line texts. Finally, Julianne Nyhan, a Ph.D. student in University College Cork, is preparing a lexicon of early Irish that will be published online on the CELT website. This lexicon opens up the possibility of providing a quick reference dictionary for the available electronic texts in Old and Middle Irish, and will also provide a useful intermediate link between CELT's corpus and eDIL.
Topic and Organisation
The main aim of the panel is to examine the benefits, modes and problems of linking electronic dictionaries and corpus texts. The panel will also discuss the approaches taken to digitisation and the solutions adopted to peculiar problems associated with minority, non-standard languages.
Five contributors will present four papers focussing on different aspects of the topic. By drawing together these papers into a single session we hope to present a comprehensive picture of the state of computing in early Irish language and literature, and to explore the technological and organisational hurdles that have to be overcome to seamlessly link originally separate electronic resources to the benefit of both the projects and the users.
All contributors have consented to present a paper and join the panel.
Chair: Gregory Toner (Director of the eDIL Project).
Beatrix Faerber (CELT Project Manager) will provide an introduction to CELT, the nature of the texts being digitised and the methods adopted for encoding. CELT uses SGML/XML markup for structural and analytic features in accordance with the recommendations of the TEI (www.tei-c.org). Some of the markup is customised in line with the specific needs of Irish language sources, which feature especially prominently in the corpus, such as the annals. These have deep-level markup of dates, personal names, organisation names, social roles and termini technici. The online search engine enables enhanced interrogation of texts: names, dates, places and events can be identified on a context-sensitive basis over the full range of the Corpus.
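As a rough illustration of what deep-level markup makes possible, the following Python sketch pulls tagged features out of a TEI-style fragment. The element names (date, persName, placeName, roleName) are standard TEI elements, but the sample entry is invented for illustration and is not taken from any CELT file; CELT's actual encoding and search engine may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample entry (an assumption, not CELT data). Only the element
# names follow TEI; the content is placeholder text.
SAMPLE = """
<div type="annal">
  <p>In the year <date>1000</date> <persName>Person A</persName>,
     <roleName>king</roleName> of <placeName>Place B</placeName>, died.</p>
</div>
"""

def extract_features(xml_text):
    """Collect the text of each tagged feature, grouped by element name."""
    root = ET.fromstring(xml_text)
    features = {}
    for tag in ("date", "persName", "placeName", "roleName"):
        features[tag] = [el.text.strip() for el in root.iter(tag) if el.text]
    return features

features = extract_features(SAMPLE)
print(features)
```

Because each feature is tagged explicitly, a search for a role such as "king" can be restricted to roleName elements rather than matching every occurrence of the word in running text.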
Maxim Fomin (eDIL Assistant Editor) will discuss some of the problems encountered in the digitisation of eDIL and the solutions adopted. He will pay particular attention to the problems of digitising minority, non-standard languages. Issues raised will include quality assurance of highly-inflected, non-standard language; management of deep-level markup and the potential and pitfalls of automatic tagging; and the establishment of a lexical view of the contents of the dictionary while preserving the original format.
Gregory Toner and Beatrix Faerber will outline the nature of the collaboration between eDIL and CELT, with particular focus on the nature of the links required. eDIL is a scholarly dictionary containing thousands of citations from Old and Middle Irish texts, many of which are only available in the best research libraries or in the original manuscripts. A central aim of the Linking Dictionaries and Texts Project (LDT) is to automatically link these citations in eDIL to the appropriate document in the CELT corpus of texts so that users of eDIL can view the citations in their full context. The editors of eDIL are using XML to mark up the text of the dictionary, with the result that robust links can be established between it and documents on the Web. However, although remote queries are already feasible, identifying which file to examine is more difficult to automate. Linking references permanently by HTML/XML is possible if the location of the document sought is known. This requires a referencing mechanism, providing a similar functionality to traditional referential systems. Peter Flynn (Head of the Electronic Publishing Unit of UCC) will outline how we intend to use JavaScript and XSLT to link citations in eDIL to the original texts in the CELT corpus.
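The referencing mechanism described above can be pictured as a lookup from a dictionary citation to a document location. The sketch below is written in Python rather than the JavaScript and XSLT the project intends to use, and everything in it is hypothetical: the abbreviations, the file identifiers, the base URL and the function name resolve_citation are placeholders, not eDIL or CELT conventions.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical table mapping a dictionary's text abbreviation to a corpus
# file identifier. Real abbreviations and identifiers would come from the
# two projects; these are placeholders.
SIGLA_TO_FILE = {
    "TextA": "FILE0001",
    "TextB": "FILE0002",
}

BASE_URL = "https://example.org/corpus"  # placeholder, not a real CELT address

def resolve_citation(siglum: str, locus: str) -> Optional[str]:
    """Return a URL for the cited passage, or None if the text is unknown."""
    file_id = SIGLA_TO_FILE.get(siglum)
    if file_id is None:
        return None  # the cited text is not (yet) in the corpus
    # The locus (page, line or section) becomes a fragment identifier.
    return f"{BASE_URL}/{file_id}.html#{quote(locus)}"

print(resolve_citation("TextA", "p12"))
print(resolve_citation("Unknown", "p1"))
```

Returning None for an unknown abbreviation keeps the hard part visible: a link can only be generated once it is known which file holds the cited text.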
Julianne Nyhan (Doctoral researcher, UCC) and Peter Flynn will report on their progress towards using XSLT to generate links between on-line texts and a digital lexicon of variants of early Irish currently being compiled. Old and Middle Irish present peculiar difficulties because the language is heavily inflected and the orthography irregular, so simple generation of links between a text and a dictionary is not possible. Nor is it practical to tag all words with the correct lemma because of the vast extent of the corpus. The solution proposed here is that the lexicon will contain a full range of inflected forms with due account being given to variant forms, and that XSLT will be used to generate links between texts and the lexicon. While linking in HTML is the fundamental principle of the Web, the ability of XSLT to preprocess text on the server means that the technicalities of linking need no longer intrude on the user's interface. Significantly, more work can be done in finding, arranging, and analysing the relevant text without the need for the user to have special knowledge of the technology. In particular, using server-side XSLT makes it possible to find, extract, and present information from multiple XML documents, as well as from multiple locations within the same document, without having to involve restrictive proprietary technologies which might create barriers to the use of the system. The TEI document types, which are used throughout, provide a stable and reliable format for storing this kind of information. It is hoped that this development will enable users of the CELT corpus to follow automatically generated links to the appropriate entry in the on-line lexicon and eventually to the full electronic Dictionary (eDIL) when it becomes available.
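The idea of a lexicon of inflected forms can be sketched as a table that maps each attested form to its headword, so that links are generated by lookup instead of by tagging every word in the corpus. The following minimal sketch uses Python in place of the server-side XSLT described above. The lexicon URL and the function name link_tokens are hypothetical, and the handful of forms listed for the headword fer ('man') are illustrative only and should be checked against the dictionary.

```python
import re
from html import escape
from urllib.parse import quote

# Illustrative lexicon: inflected or variant form -> headword.
# A real lexicon would be far larger and would handle ambiguous forms.
LEXICON = {
    "fer": "fer",
    "fir": "fer",
    "fiur": "fer",
    "firu": "fer",
    "feraib": "fer",
}

LEXICON_URL = "https://example.org/lexicon"  # placeholder address

def link_tokens(text):
    """Wrap every word found in the lexicon in a link to its headword entry."""
    def replace(match):
        chunk = match.group(0)
        headword = LEXICON.get(chunk.lower())
        if headword is None:
            return escape(chunk)  # unknown words and punctuation pass through
        return f'<a href="{LEXICON_URL}/{quote(headword)}">{escape(chunk)}</a>'
    # Alternate between runs of word characters and runs of everything else,
    # so that the whole input is escaped and nothing is dropped.
    return re.sub(r"\w+|\W+", replace, text)

print(link_tokens("firu xyz"))
```

A plain lookup of this kind cannot decide between headwords when one form belongs to several of them, which is one reason the lexicon is described as an intermediate step towards the full dictionary.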
Hosted at Göteborg University (Gothenburg)
Gothenburg, Sweden
June 11, 2004 - June 16, 2004
Conference website: http://web.archive.org/web/20040815075341/http://www.hum.gu.se/allcach2004/