Towards a Union Catalogue of XML-Encoded Manuscript Descriptions

Eric Haswell; Matthew J. Driscoll; Claire Warwick

Authorship

1. Eric Haswell

Humanities Computing & Media Centre - University of Victoria
2. Matthew J. Driscoll

The Arnamagnæan Institute - University of Copenhagen
3. Claire Warwick

School of Library, Archive and Information Studies - University of Sheffield

Work text


The recent work of the European MASTER project (1999-2001) and of the Text Encoding Initiative (TEI) (Burnard & Rahtz, 2005) in developing an XML standard for encoding manuscript descriptions has provided an opportunity to explore ways of making such information more widely available through a web-based system. Traditional web database applications built on relational systems such as MySQL are well suited to web delivery of data-centric, tabular information, but they are less useful for documents with an irregular and unpredictable structure. The emergence of native XML databases as a viable storage medium has made web delivery of irregularly structured XML easier and more efficient.
The poster discusses a prototype system developed as part of a research project for the MA in Electronic Communication and Publishing at the School of Library, Archive and Information Studies, University College London. The project, carried out at the University of Copenhagen's Arnamagnæan Institute, created a searchable, web-based catalogue of descriptions of medieval Scandinavian manuscripts. Using PHP and the eXist native XML database, the project developed a three-tier web database application. Users query the database through a web form that allows complex queries involving many different criteria to be formulated. PHP converts the submitted search form into an XQuery expression, which is then passed to the database. The results are processed by an XSLT engine on the server before being returned to the user. The web system is multilingual and emphasises usability, standards compliance and the use of open-source software.
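To illustrate the query tier described above, the following sketch turns search-form criteria into an XQuery expression and a URL for eXist's REST interface. It is written in Python for illustration only and is not the prototype's PHP code. The field names, the element paths inside `<msDesc>`, the `/db/manuscripts` collection and the `http://localhost:8080/exist/rest/db` base URL are all assumptions. TEI namespace declarations are omitted for brevity.

```python
from urllib.parse import urlencode

# Hypothetical mapping from search-form fields to paths inside an <msDesc> record.
# Namespace prefixes are omitted for brevity; a real TEI P5 query would declare them.
FIELD_PATHS = {
    "shelfmark": "msIdentifier/idno",
    "title": "msContents/msItem/title",
    "place": "history/origin/origPlace",
}


def xquery_string_literal(value: str) -> str:
    """Quote user input as an XQuery string literal.

    A bare '&' is not allowed in an XQuery string literal, so it becomes
    '&amp;'; a double quote is escaped by doubling it. Together these keep
    user input from breaking out of the literal.
    """
    return '"' + value.replace("&", "&amp;").replace('"', '""') + '"'


def build_xquery(criteria: dict, collection: str = "/db/manuscripts") -> str:
    """Combine every non-empty form criterion with 'and' into one where clause."""
    conditions = []
    for field, value in criteria.items():
        if not value:
            continue  # blank form fields impose no restriction
        path = FIELD_PATHS[field]  # unknown field names raise KeyError
        # The predicate is true if ANY node at the path contains the value.
        conditions.append(f"$ms/{path}[contains(., {xquery_string_literal(value)})]")
    where = " and ".join(conditions) or "true()"
    return (
        f'for $ms in collection("{collection}")//msDesc\n'
        f"where {where}\n"
        "return $ms/msIdentifier"
    )


def build_rest_url(xquery: str, base: str = "http://localhost:8080/exist/rest/db") -> str:
    """Encode the query for eXist's REST interface (_query; _howmany caps the hit count)."""
    return base + "?" + urlencode({"_query": xquery, "_howmany": 20})


if __name__ == "__main__":
    query = build_xquery({"shelfmark": "AM 1", "place": "Iceland"})
    print(query)
    print(build_rest_url(query))
```

Putting `contains()` inside a predicate, rather than applying it to the path directly, avoids a type error when a record has several matching elements, such as more than one `msItem/title`.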
The realisation of this project is a significant first step towards a comprehensive electronic tool for manuscript studies. In its current state, the system already opens up research opportunities that were not previously available for manuscripts in the Arnamagnæan collection. The large number of possible search criteria, together with the ability to combine them in complex ways, allows researchers to assemble datasets that would otherwise have been difficult to gather. The number of researchers able to view a manuscript in person is limited by time, funding and the fragility of the manuscripts themselves, so electronic access benefits researchers significantly. Researchers may also come across useful and interesting information that they had not previously considered (Driscoll, 2002).
The prototype system demonstrates a method by which other manuscript repositories could undertake similar projects involving XML-encoded source material. It also shows how a standardised approach to XML encoding can ease the integration of records from disparate repositories into a single resource, producing a larger, more complete and more useful catalogue. Such a union catalogue was one of the primary goals of the MASTER project.
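One way to picture the integration step is to key each record on the identifying parts of its `<msIdentifier>` (`settlement`, `repository`, `idno`). Records arriving from different repositories can then share one index without colliding. The following Python sketch assumes un-namespaced `<msDesc>` records that are already consistently encoded. The town and repository names and the shelfmarks in the demonstration are invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]


def identifier_key(record: ET.Element) -> Key:
    """Return (settlement, repository, idno) from a record's <msIdentifier>."""
    ident = record.find("msIdentifier")
    if ident is None:
        raise ValueError("record has no <msIdentifier>")

    def part(tag: str) -> str:
        # findtext returns None for a missing child, so fall back to "".
        return (ident.findtext(tag) or "").strip()

    return part("settlement"), part("repository"), part("idno")


def build_union_catalogue(sources: Dict[str, List[str]]) -> Dict[Key, ET.Element]:
    """Merge serialised <msDesc> records from several sources into one index."""
    catalogue: Dict[Key, ET.Element] = {}
    for source, documents in sources.items():
        for text in documents:
            record = ET.fromstring(text)
            key = identifier_key(record)
            if key in catalogue:
                # The same manuscript described twice: flag it rather than overwrite.
                raise ValueError(f"duplicate identifier {key} from {source}")
            catalogue[key] = record
    return catalogue


if __name__ == "__main__":
    def ms(settlement: str, repository: str, idno: str) -> str:
        return (f"<msDesc><msIdentifier><settlement>{settlement}</settlement>"
                f"<repository>{repository}</repository><idno>{idno}</idno>"
                "</msIdentifier></msDesc>")

    catalogue = build_union_catalogue({
        "source A": [ms("Town A", "Repository A", "MS 1")],
        "source B": [ms("Town B", "Repository B", "MS 1")],
    })
    print(len(catalogue))  # prints 2: identical shelfmarks in different repositories stay distinct
```

The sketch raises an error on a duplicate key instead of silently keeping one record. This is a design choice: in a real union catalogue, such a collision would probably need editorial review.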
Creating a unified catalogue of European medieval manuscripts may demand some degree of standardisation in encoding, because query functionality can only be programmed reliably if the data are structured in a similar way across all the documents being queried. Although TEI P5 provides a general tagset for encoding manuscript descriptions, the many possible combinations of elements and the differing stylistic approaches to encoding are obstacles to full integration. These encoding irregularities may, however, be surmountable through query techniques that accommodate the differences. More work is needed to assess the implications of such an approach. It is also uncertain at this stage how well the system will scale as the number of documents increases. Initial indicators are positive, but a more rigorous case study is required.
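A query that accommodates encoding irregularities can, for example, look for the same datum in several places and take the first non-empty match. The following Python sketch does this for an origin date. The list of alternative paths is hypothetical. It stands in for whatever variants a survey of the participating repositories' records would actually reveal, and it makes no claim about which placements TEI P5 permits.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical places where different cataloguers might record an origin date,
# listed in order of preference.
DATE_PATHS = ["history/origin/origDate", "head/origDate"]


def first_text(record: ET.Element, paths: List[str]) -> Optional[str]:
    """Return the whitespace-normalised text of the first non-empty match, else None."""
    for path in paths:
        node = record.find(path)
        if node is not None:
            text = " ".join("".join(node.itertext()).split())
            if text:
                return text
    return None  # no variant found: the record simply lacks this datum


records = [
    "<msDesc><history><origin><origDate>c. 1300</origDate></origin></history></msDesc>",
    "<msDesc><head>Saga MS, <origDate> 14th   century </origDate></head></msDesc>",
    "<msDesc><msIdentifier><idno>MS 1</idno></msIdentifier></msDesc>",
]
for xml_text in records:
    print(first_text(ET.fromstring(xml_text), DATE_PATHS))
```

In XQuery, a similar fallback can be written as `($ms/history/origin/origDate, $ms/head/origDate)[1]`. This expression returns the first element found, preferring the first path, but unlike the sketch it does not skip empty elements.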
The development of a union catalogue therefore depends on a technical infrastructure of sufficient flexibility and reliability. The work done on the prototype, and its positive results so far, suggest that such an infrastructure is achievable. The eXist XML database is a viable option for storage and document management. An XML document need only be uploaded to the database in its complete form to be added to the collection. This greatly simplifies management and allows participating repositories to be widely dispersed geographically. A fully functioning deployment would require a centralised server and some direct coordination of the system and its various collections. These, however, are logistical matters that can readily be addressed with adequate funding and, more importantly, the enthusiastic participation of manuscript repositories.
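Adding a record can be as simple as a single HTTP request. The following Python sketch checks that a document is well-formed and then prepares a `PUT` request that would store it in a collection through eXist's REST interface. The base URL, the `manuscripts` collection name and the use of HTTP Basic authentication are assumptions about a typical installation, not details of the prototype. The request is built but not sent.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def make_upload_request(xml_bytes: bytes, doc_name: str,
                        collection: str = "manuscripts",
                        base: str = "http://localhost:8080/exist/rest/db",
                        user: Optional[str] = None,
                        password: str = "") -> Request:
    """Prepare (but do not send) an HTTP PUT that stores one complete XML document."""
    # Refuse to upload anything that is not well-formed XML (raises ET.ParseError).
    ET.fromstring(xml_bytes)
    # Percent-encode the path segments, e.g. a space in a shelfmark-based file name.
    url = f"{base}/{quote(collection)}/{quote(doc_name)}"
    headers = {"Content-Type": "application/xml"}
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return Request(url, data=xml_bytes, headers=headers, method="PUT")


if __name__ == "__main__":
    record = b"<msDesc><msIdentifier><idno>MS 1</idno></msIdentifier></msDesc>"
    req = make_upload_request(record, "MS 1.xml", user="admin")
    print(req.get_method(), req.full_url)
    # To actually store the document: urllib.request.urlopen(req)
```

Each record travels as a complete, self-contained document. A contributing repository would therefore need nothing more than network access and credentials, which is what makes geographically dispersed participation plausible.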
The poster will present the web resource, including examples of the XML source material, the eXist database system, the PHP code used to build the application, and the web interface. It will also discuss the potential for involving other manuscript repositories and the issues this raises.
Potential beneficiaries of this work might include those exploring ways of deploying XML-encoded material through a web interface, particularly material that does not lend itself to a relational database system, and those interested in developing electronic tools for manuscript scholarship.
References
Burnard, Lou and Rahtz, Sebastian (2005). P5 Fascicule. [online]. Text Encoding Initiative. Available from: http://www.tei-c.org.uk/Activities/MS/FASC-ms.pdf [Accessed 27 March 2006].
Driscoll, Matthew (2002). "The MASTER Project: Defining Standards for Electronic Manuscript Catalogue Records". In: Fellows-Jensen, Gillian and Springborg, Peter (eds): Care and Conservation of Manuscripts: Proceedings of the sixth international seminar held at the Royal Library, Copenhagen 19th-20th October 2000. Copenhagen: Royal Library. pp. 8-17.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info


ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006


The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002. A Steering Committee was appointed at the ALLC/ACH meeting in Gothenburg, Sweden, in 2004. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the 'official' ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC
