Building Web Applications That Integrate TEI XML and Relational data: xMod and rdb2java

Paul Spence

Authorship

1. Paul Spence

King's College London


Both the relational database model and the document-focused XML model are regular staples in humanities
computing projects, and at the Centre for Computing
in the Humanities, King’s College London, the two technologies have been used extensively across a large number of the thirty-plus projects that we have been
involved in over the last few years. But whether for
practical reasons (limited resources) or for more
‘ideological’ reasons (the conviction that one or other technology provides superior modelling capabilities), it seems as if relational database and document-centric
XML-based approaches are often each viewed as
alternatives that almost entirely preclude the use of the other.
Since both technologies tend to involve quite distinct views of how the core data may be represented, it is perhaps not surprising that a strategic choice is usually made to use one or the other across a given project, and where they do co-exist, it is often the case that development cycles are largely autonomous, with the only true points of contact being the initial strategy stage and the final presentation of the integrated data.
This was certainly the case until recently in a number of projects involving CCH. Having decided early on that the core materials required either a fundamentally ‘data-centered’ or a ‘document-centered’ approach, and then finding that the other was also necessary to represent a subset of the data, we faced significant challenges in bridging the two.
Our extensive use of TEI XML (to mark up anything from classical inscriptions and medieval charters to musicological bibliographies and born-digital ‘presentational’ websites) has led to the creation of ‘xMod’, a highly
modular application which can transform a heterogeneous
repository of TEI XML documents into a completely finished website. Similarly, our wealth of experience in
modelling humanities data (in particular prosopographical
data) using relational databases has resulted in the
development of an equally modular application called ‘rdb2java’, which facilitates the connection of a database
to the web, greatly simplifying the creation of data
queries, the updating of data and its final presentation. Both
applications operate on common principles of separation
of concerns (separating functionality from design),
assist development (via the creation of a base layer of programming logic that can be easily extended) and aim to use standards-based approaches as far as possible.
In spite of the similarity in the basic objectives and conceptual approaches behind each application, the fact is that in the first two projects where both tools were
deployed,[1] the process of integration happened very late in the day, at the presentational stage. Since we have found time and time again that it is inappropriate to try to shoehorn the entire dataset for a project into a
single technology, and are facing increasing requests for
features that require data to be shared between the two applications in increasingly rich and complex ways, it seemed logical to explore points of connection in a much broader sense.
One way of re-examining the parallel processes that
we followed in the aforementioned projects was to
appropriate the model proposed by Jesse James Garrett in
his well-known ‘Elements of user experience’ diagram.[2] According to this model, the development of websites crosses five planes (strategy, scope, structure, skeleton
and surface), with a dissecting line cutting down
through each so that we may compare and contrast the nature of each plane according to the type of technological approach taken (‘Web as software interface’ or ‘Web as hypertext system’).
Although the model does not match our situation
perfectly, we actually found the comparison extremely useful. As in the Garrett diagram, the bottom and top layers were the two parts of our development process that were integrated most closely. At the bottom (‘strategy’)
level, project objectives were, of course, set globally,
and user needs were easily assessed independently
of the technological approach taken. Similarly, at the top (‘surface’) layer, the visual design of each component
had to be co-ordinated in such a way that an integrated
digital publication was produced. Here there had to be some consistency between the two areas of the site, and although there were some obvious presentational
differences stemming from the particular nature of each technology, it had to seem to the user as if they both
formed part of a seamless whole as far as was possible.
However, there were some key differences in the three intermediate stages. These particularly interested us: having decided to investigate how far our database and TEI XML applications could be integrated more closely, we started to ask ourselves whether the differing underlying models proposed by each technology would be an insurmountable obstacle to developing an integrated application that was seamless not only on the outside but also on the inside.
Following Garrett’s model, the ‘web as software’ path starts at the ‘scope’ level by defining the ‘feature set’ for a given set of data. The next stages are to create a structure that governs how the system responds to the user, and then to build an interface that manages these different interactions. The primary focus is on ‘tasks’.
Meanwhile, the ‘web as hypertext system’ model begins with an analysis of how the content is to be arranged, translates this into a structural arrangement of content elements and then adds the navigational design necessary to negotiate the information structure. The primary concern
here is ‘information’, and there is often a strong bias
towards the ‘document’ view of data.
Leaving aside the web publication challenges, and
focusing instead on the analytical paths and processes that
each technology encourages, we find further differences. The process of marking up a text using TEI XML often involves adding structure to texts which already exist, thereby placing significant emphasis on the archival
integrity of the source ‘text’, whereas in the relational database view of data, a given structure is modelled first and then populated with data. In this sense, TEI XML is often ‘reporting on’ or ‘describing’ data, whereas relational databases model, aggregate and order abstract data.
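The contrast between the two analytical paths can be sketched in a few lines of Python. This is an illustration only: the sentence, the names and the table layout are invented for the example and are not taken from any CCH project, and neither xMod nor rdb2java is involved. It uses only the standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Document-centred view: the sentence already exists; markup is layered
# on top of it, and the original wording remains recoverable.
# (Invented sample sentence, TEI-style element names.)
TEI_FRAGMENT = (
    '<p><persName key="p1">John Smith</persName> was curate of '
    '<placeName>Ashford</placeName>.</p>'
)


def source_text(fragment: str) -> str:
    """Remove the markup again to recover the untouched source text."""
    return "".join(ET.fromstring(fragment).itertext())


# Data-centred view: the structure is declared first, then populated.
# (Hypothetical table layout.)
def build_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person "
        "(id TEXT PRIMARY KEY, name TEXT, role TEXT, place TEXT)"
    )
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        ("p1", "John Smith", "curate", "Ashford"),
    )
    return conn


if __name__ == "__main__":
    print(source_text(TEI_FRAGMENT))
    print(build_database().execute("SELECT * FROM person").fetchall())
```

In the first half the text comes first and the structure describes it; in the second the table definition comes first and the sentence itself is no longer stored anywhere.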
In textual markup projects, most of the decisions about technical approach and data structure are taken near the beginning of the life of the project, at the stage of
document analysis. This is less true of humanities-focused database projects, however. While the process does start with some intense initial analysis, the perspective can change considerably as the project progresses, since the cognitive process takes place as data is added to the project. This necessitates changes to the database structure and makes it more difficult to visualise the data presentation in the early stages.
Translating this to the experience of our two applications, the document focus of xMod means that there is a much closer correlation between data representation (TEI XML
markup) and its presentation, making it far easier to
produce initial output using this tool than is the case with rdb2java, where more work needs to be done to display the data.
However, when it comes to searching/indexing, we find that the hierarchical ‘tree-view’ model proposed by XML has a strong effect on the kinds of query that it is easy to carry out, and the relative performance of each,
whereas relational data can be re-ordered and queried in more flexible ways. At one point, native XML databases seemed to provide the means for complex XML structures to be queried in an efficient manner, but in our experience they have not so far lived up to expectations and still fall short in terms of efficiency and scalability.
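A small Python sketch may help to show how the shape of the data steers the shape of the query. The two-document corpus, the `attestation` table and all function names are invented for the illustration. It says nothing about performance or about native XML databases; it only contrasts a query that follows the tree with one that cuts across it.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Invented two-document corpus with TEI-style element names.
CORPUS = """
<corpus>
  <text n="charter-1"><p><persName>Alfred</persName> grants land to
    <persName>Wulfric</persName>.</p></text>
  <text n="charter-2"><p><persName>Wulfric</persName> witnesses.</p></text>
</corpus>
"""
root = ET.fromstring(CORPUS)


def names_in(doc_id: str) -> list:
    """A query that follows the hierarchy: names inside one document."""
    return [p.text for p in root.findall(f"./text[@n='{doc_id}']//persName")]


def documents_per_person_xml() -> dict:
    """A query that cuts across the hierarchy: walk every document and regroup."""
    counts = Counter()
    for text in root.iter("text"):
        for name in {p.text for p in text.iter("persName")}:
            counts[name] += 1
    return dict(counts)


def documents_per_person_sql() -> dict:
    """The same question once the data is held as rows: regrouping is declarative."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attestation (doc TEXT, person TEXT)")
    rows = [(t.get("n"), p.text)
            for t in root.iter("text") for p in t.iter("persName")]
    conn.executemany("INSERT INTO attestation VALUES (?, ?)", rows)
    query = ("SELECT person, COUNT(DISTINCT doc) FROM attestation "
             "GROUP BY person ORDER BY person")
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    print(names_in("charter-1"))
    print(documents_per_person_xml())
    print(documents_per_person_sql())
```

Both approaches give the same counts here. The difference lies in how much of the regrouping has to be written by hand once the question no longer matches the document hierarchy.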
In this paper we will outline plans to build an integrated
application that deals with many of these challenges, and
in doing so, reconciles as far as possible the different
technological perspectives that each brings to this
unlikely association. At a more basic level, this includes
unified design principles managed from a single point, with common CSS/XHTML components, shared
libraries of styles, a shared overall wireframe design and, where appropriate, a single means of maintaining the central navigation system for a given project.
We will also describe some of the more complex
associations possible when data is connected, shared or ‘piped’ between the two types of application, and explore generic ways in which we can facilitate the transmission of data to and from each in a manner that will, moreover, facilitate wider interoperability with other projects using different technological approaches.
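One possible form of such ‘piping’ is sketched below in Python, under stated assumptions. The `person` table, the sample rows and both function names are hypothetical, and the code does not describe how xMod or rdb2java actually exchange data. It shows database rows being serialised as TEI-style `<person>` elements, and `persName` keys in a marked-up text being resolved against the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree serialises as the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (invented rows, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, occupation TEXT)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [("p1", "Alfred", "king"), ("p2", "Wulfric", "thegn")],
    )
    return conn


def persons_as_tei(conn: sqlite3.Connection) -> str:
    """Database to XML: serialise each row as a TEI-style <person> element."""
    list_person = ET.Element("listPerson")
    rows = conn.execute("SELECT id, name, occupation FROM person ORDER BY id")
    for pid, name, occupation in rows:
        person = ET.SubElement(list_person, "person", {XML_ID: pid})
        ET.SubElement(person, "persName").text = name
        ET.SubElement(person, "occupation").text = occupation
    return ET.tostring(list_person, encoding="unicode")


def resolve_references(tei_fragment: str, conn: sqlite3.Connection) -> dict:
    """XML to database: look up each persName/@key in the person table."""
    resolved = {}
    for el in ET.fromstring(tei_fragment).iter("persName"):
        row = conn.execute(
            "SELECT name, occupation FROM person WHERE id = ?", (el.get("key"),)
        ).fetchone()
        resolved[el.text] = row  # None when the key has no matching row
    return resolved


if __name__ == "__main__":
    db = make_db()
    print(persons_as_tei(db))
    print(resolve_references(
        '<p><persName key="p2">Wulfric</persName> witnesses.</p>', db))
```

A shared identifier (here the `key` attribute and the `id` column) is the only point of contact in this sketch, which is one reason an exchange of this kind could in principle also serve other projects that use different technologies.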
Finally, we will discuss the ramifications of a more
integrated database-TEI XML strategy for the overall project development process, with an outline of some of the technical strategies that might facilitate common
development.

Footnotes

[1] These were CCEDb and PASE. The Clergy of the Church of England Database. King’s College London, the University of Kent at Canterbury and the University of Reading. Accessed 2005-11-11. <http://www.ccedb.org.uk/>. The Prosopography of Anglo-Saxon England. King’s College London and Cambridge University. Accessed 2005-11-11. <http://www.pase.ac.uk/>
[2] The Elements of User Experience, Jesse James Garrett website. Accessed 2005-11-11. <http://www.jjg.net/elements/pdf/elements.pdf>

References

CCH projects page. Centre for Computing in the Humanities. Accessed 2005-11-11. <http://www.cch.kcl.ac.uk/projects/>
xMod, a TEI-based publishing application. Centre for Computing in the Humanities. Accessed 2005-11-11. <http://www.cch.kcl.ac.uk/xmod/>
Garrett, Jesse James (2002). The elements of user experience: user-centered design for the web. New York: AIGA/New Riders.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

Complete

ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

151 works by 245 authors indexed

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC
