De Montfort University, Centre for Textual Studies - Tuscia University
The purpose of this poster presentation is to illustrate how the main functions of a digital library can be implemented with open source tools, avoiding proprietary and commercial software. Until a few years ago, choosing proprietary software such as DynaText/DynaWeb was almost unavoidable, but it risks leading into a blind alley for many reasons, not least the economic one. The decision to use XML instead of SGML to define a markup language for digital text encoding, as the T.E.I. Consortium did, was the first step that opened up more possibilities. The more XML data is used in the IT world, the more XML-enabled open source software is developed. With some adaptation to its particular aims, Humanities Computing can use the same tools that are used in hard-core and business IT applications. Many free editors for XML and XSLT are available, and there is a wide choice of parsers and processors. But encoding a text, or a collection of texts, in a markup language and validating it is only the first step in creating a digital library, and in some ways not the most difficult one. A great deal of documentation and examples is now available on the internet, thanks also to the spread of T.E.I. encoding as the de facto standard in this field, but the situation is less straightforward when the text(s) have to be delivered to readers in some electronic format.
Of the many functions a digital library should have, the two main ones are:
a) publishing the collection, possibly in multiple formats, to provide suitable output for different hardware/software devices (desktop, laptop, PDA, ebook reader);
b) querying the collection for full-text searches.
The problem, then, is to find the right open source products and adapt them to implement the digital library model described above. Fortunately, a kind of natural selection is taking place in the Humanities Computing world: projects are increasingly converging on the same tools, so that many of them face the same problems and teams all over the world can concentrate their efforts.
For publishing, the increasingly adopted application is Cocoon, part of the Apache Project. Little more than an advanced servlet in version 1.0, Cocoon quickly evolved into a very good web publishing framework in 2.0 and became a complete XML application server in 2.1, with the possibility of extending its components. At a basic level, Cocoon captures the client's web request, determines which resource is needed, fetches the corresponding data (which may be held in an XML file or in an RDBMS), applies a transformation to this data (using an XSLT stylesheet, for example) and delivers the result to the client. This mechanism is naturally well suited to the dynamic publishing of an electronic text encoded with T.E.I. markup, thanks also to the XSLT stylesheets for HTML output developed by Sebastian Rahtz and made freely available. The aim of this presentation is to illustrate some of the possibilities Cocoon offers for producing multiple versions of a text, content-scalable and device-scalable, starting from an XML/T.E.I. document.
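The request, fetch, transform and deliver sequence described above can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions, not Cocoon's own API (Cocoon is a Java framework configured through its sitemap). The in-memory DOCUMENTS store, the two transform functions standing in for XSLT stylesheets, and the handle_request name are all hypothetical, and the sample markup is a simplified stand-in for real T.E.I.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory store standing in for XML files or an RDBMS.
DOCUMENTS = {
    "poem": "<text><head>Sonnet</head><l>First line</l><l>Second line</l></text>",
}


def to_html(root):
    """Stand-in for an XSLT stylesheet that produces HTML for desktop browsers."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = root.findtext("head")
    for line in root.iter("l"):
        ET.SubElement(body, "p").text = line.text
    return ET.tostring(html, encoding="unicode")


def to_plain(root):
    """Stand-in for a stylesheet aimed at small devices: plain text only."""
    lines = [root.findtext("head")] + [line.text for line in root.iter("l")]
    return "\n".join(lines)


# Maps the requested format to a (transform, content type) pair.
TRANSFORMS = {
    "html": (to_html, "text/html"),
    "txt": (to_plain, "text/plain"),
}


def handle_request(path):
    """Match a path such as 'poem.html', then fetch, transform and serialize."""
    name, _, fmt = path.rpartition(".")
    if name not in DOCUMENTS or fmt not in TRANSFORMS:
        return 404, "text/plain", "not found"
    root = ET.fromstring(DOCUMENTS[name])      # fetch and parse the source
    transform, content_type = TRANSFORMS[fmt]  # choose the output format
    return 200, content_type, transform(root)  # deliver the result
```

Calling handle_request("poem.html") and handle_request("poem.txt") yields two renderings of the same source document, which is the device-scalable behaviour the abstract attributes to Cocoon.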
Apart from publishing, querying is made possible by a native XML database, a kind of database that stores, queries and retrieves XML documents directly, without forcing them into the table model of an RDBMS. eXist is probably the best known and most efficient open source native XML database; for queries it supports the XPath language with some added functions and, in the latest version, some XQuery functions. eXist and Cocoon can be integrated so that the XML resources needed by Cocoon are stored in eXist rather than in the filesystem; the same XML documents are then used for both publishing and querying, avoiding possible redundancy problems. Naturally, the result of a query made to eXist is itself XML, so it can be passed through an XSLT stylesheet to decide in which output format it should be delivered to the client.
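As a rough illustration of querying XML directly and getting XML back, the sketch below uses the limited XPath subset in Python's standard library. It does not use eXist or reproduce its added query functions. The miniature collection, the search function and the results/hit element names are hypothetical, and the word match is done in plain Python because the standard library's XPath subset has no text-search function.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature collection; real T.E.I. documents are namespaced
# and far richer than this.
COLLECTION = """
<collection>
  <text id="t1"><head>Spring</head><l n="1">The green fields wake</l><l n="2">under a pale sun</l></text>
  <text id="t2"><head>Autumn</head><l n="1">Leaves fall on green water</l></text>
</collection>
"""


def search(root, word):
    """Return a <results> element with one <hit> per verse line containing `word`."""
    results = ET.Element("results", {"query": word})
    for text in root.findall("./text"):      # XPath step: every text in the collection
        for line in text.findall("./l"):     # XPath step: every verse line in that text
            if word.lower() in (line.text or "").lower():
                hit = ET.SubElement(
                    results, "hit", {"text": text.get("id"), "n": line.get("n")}
                )
                hit.text = line.text
    return results


root = ET.fromstring(COLLECTION)

# Structural query using an XPath attribute predicate.
autumn_head = root.find("./text[@id='t2']/head").text

# Full-text style query; the answer is itself an XML tree.
results = search(root, "green")
results_xml = ET.tostring(results, encoding="unicode")
```

Because results is an XML tree, it could be handed to the same kind of transformation step used for publishing, which mirrors the point about passing eXist query results through a stylesheet.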
This presentation consists of a case study in implementing a digital library using only open source software, showing the audience a real-world solution to the main problems.
Hosted at Göteborg University (Gothenburg)
Gothenburg, Sweden
June 11, 2004 - June 16, 2004
Conference website: http://web.archive.org/web/20040815075341/http://www.hum.gu.se/allcach2004/