French - University of Victoria
Humanities Computing & Media Centre - University of Victoria
THE TEXTS
This image markup project fits into the larger
context of an electronic anthology, “Le mariage sous l’Ancien Régime: Une anthologie critique.” Since 1998, C. Carlin has been collecting texts about early modern marriage in France for her forthcoming book, L’imaginaire nuptial en France, 1545-1715. Given that the majority of documents studied for the book have not been republished since their original appearance in the sixteenth and seventeenth centuries, the idea of an
electronic anthology should be appealing to scholars in several disciplines (history, literary studies, linguistics,
cultural studies, art history, philosophy, religious
studies). The proposed anthology is discussed in the article
“Drawing Knowledge from Information: Early Modern Texts and Images on the TAPoR Platform” [1].
The radical changes undergone by the institution
of marriage in France during and after the Counter
Reformation generated texts of several different genres.
Included in the anthology will be medical, legal,
religious, satirical and literary documents and engravings,
all heavily annotated. It is the engravings that interest us for this presentation.
As part of a prototype for the anthology, several verse and prose polemics against marriage were encoded with XML markup in 2004 and early 2005. Most engravings of the period whose subject is marriage also fall into the polemical or satirical genre. Six engravings from the collection of the Cabinet des Estampes of the Bibliothèque Nationale de France were requested on CD-ROM, and are the test images for this project:
Jacques Lagniet, “Il cherche ce qu’il voudrait ne pas trouver”
Abraham Bosse, “La femme qui bât son mari”
Jean Le Pautre, “Corrige, si tu peux, par un discours honneste”
François Guérard, “Le Grand Bureau ou la confrèrie des martires”
Nicolas Guérard, “Présage malheureux”
Nicolas Guérard, “Argent fait tout” [2]
THE SCHEMA
Marking up annotations in XML required a
framework that allowed for both the text of the annotations, and the image areas to which they
correspond, to be encoded in a single document. Given that well-established tagsets exist for each of these
functions, an XML model was developed based on a
marriage of the Scalable Vector Graphics (SVG) 1.1
specification [3], and a subset of the Text Encoding Initiative
(TEI) P5 guidelines [4]. This union allows TEI and SVG markup to operate concurrently. The TEI markup forms the overarching structure of a document, while elements belonging to the SVG vocabulary may appear in specific locations within the TEI encoding.
Elements belonging to the SVG vocabulary are permitted within [div] tags and must be enclosed by the [svg] root element. Therefore, [svg] elements may appear anywhere within the TEI markup where [div] elements are permitted. An [svg] element may contain any number of [rect] elements whose coordinates demarcate the area on an associated image to which an annotation applies. The texts of the annotations are enclosed in [div] blocks of their own, separate from the SVG encoding. This allows annotation text to be encoded in any TEI-conformant way, presenting the possibility of integrating annotations with larger corpora. A [div] element containing an annotation text is associated explicitly with a set of annotation coordinates through references to the coordinates’ svg:id attribute.
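The structure described above can be illustrated with a minimal sketch. The sample document below is hypothetical: the `type` values, the rectangle ids, and in particular the `corresp` pointer used to link an annotation [div] to a [rect] are assumptions made for illustration, not the project's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample: the layout, ids, and the `corresp` pointer are
# illustrative assumptions, not the project's real schema.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}" xmlns:svg="{SVG_NS}">
  <text>
    <body>
      <div type="imageAreas">
        <svg:svg width="800" height="600">
          <svg:rect id="area1" x="40" y="60" width="120" height="80"/>
          <svg:rect id="area2" x="300" y="200" width="90" height="150"/>
        </svg:svg>
      </div>
      <div type="annotation" corresp="#area1">
        <p>Transcription of the verse caption.</p>
      </div>
      <div type="annotation" corresp="#area2">
        <p>Commentary on a gesture.</p>
      </div>
    </body>
  </text>
</TEI>
"""


def link_annotations(xml_text):
    """Pair each annotation div with the rectangle its pointer refers to."""
    root = ET.fromstring(xml_text)

    # Index every SVG rectangle by its id attribute.
    rects = {}
    for rect in root.iter(f"{{{SVG_NS}}}rect"):
        box = tuple(int(rect.get(k)) for k in ("x", "y", "width", "height"))
        rects[rect.get("id")] = box

    # Resolve each div's "#id" pointer against that index.
    linked = []
    for div in root.iter(f"{{{TEI_NS}}}div"):
        target = div.get("corresp")
        if target and target.startswith("#"):
            rect_id = target[1:]
            text = " ".join("".join(div.itertext()).split())
            linked.append((rect_id, rects.get(rect_id), text))
    return linked


if __name__ == "__main__":
    for rect_id, box, text in link_annotations(SAMPLE):
        print(rect_id, box, text)
```

Because the annotation text lives in ordinary TEI [div] elements, the same lookup would work however richly the annotation itself is encoded.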
Markup validity is enforced through XML Schema or RELAX NG schema files which are bundled with the application. The schema which validates the TEI portion
of the encoding has been generated by the ROMA suite of tools provided by the TEI for the purposes of
specifying and documenting a customization. The TEI schema is supplemented by the addition of a schema
describing the SVG 1.1 specification. The W3C provides the SVG 1.1 schema in either RELAX NG or DTD format, from which an XML Schema version may be derived using Trang [5]. Integrating schemas from two different tagsets in this way is greatly facilitated by the modular construction inherent to both the TEI and SVG schema models. SVG may be ‘plugged in’ to TEI by adding the [svg] root element to the list of allowable content in a particular context, and then associating the requisite schema documents with one another for the purposes of validation.
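The content-model rule at the heart of this arrangement (an [svg] root only inside a [div], and [rect] elements only inside an [svg] root) can be sketched as a simple structural check. This is not schema validation and is no substitute for the bundled XML Schema or RELAX NG files; the function name is hypothetical, and requiring [rect] to be a direct child of [svg] is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SVG_NS = "http://www.w3.org/2000/svg"

SVG_ROOT = f"{{{SVG_NS}}}svg"
SVG_RECT = f"{{{SVG_NS}}}rect"
TEI_DIV = f"{{{TEI_NS}}}div"


def svg_placement_errors(xml_text):
    """Report svg roots outside a TEI div and rects outside an svg root.

    A rough stand-in for one rule of the combined schema (an assumption
    made for illustration); real validation needs the schema files.
    """
    root = ET.fromstring(xml_text)
    errors = []
    # ElementTree has no parent pointers, so walk parent/child pairs.
    for parent in root.iter():
        parent_name = parent.tag.split("}")[-1]
        for child in parent:
            if child.tag == SVG_ROOT and parent.tag != TEI_DIV:
                errors.append(f"svg root found inside <{parent_name}>, expected <div>")
            if child.tag == SVG_RECT and parent.tag != SVG_ROOT:
                errors.append(f"rect found inside <{parent_name}>, expected <svg>")
    return errors


if __name__ == "__main__":
    good = (
        f'<TEI xmlns="{TEI_NS}" xmlns:svg="{SVG_NS}"><text><body><div>'
        '<svg:svg><svg:rect id="a" x="0" y="0" width="1" height="1"/></svg:svg>'
        "</div></body></text></TEI>"
    )
    print(svg_placement_errors(good))
```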
Taking this approach to schema marriage has several advantages. The TEI guidelines for textual encoding provide a tagset whose usage rules are well-defined and understood, facilitating the portability of the encoding between projects, and easing the integration of corpora from different sources. An earlier method of encoding image annotations in XML, Image Markup Language [6], is based on a standalone markup structure which does not offer the same high degree of interoperability as the current model. The TEI encourages customization of its guidelines to accommodate a wide range of
implementations, an approach this project demonstrates.
More generally, working with XML allows for the
encoded material to be transformed into other formats
as requirements dictate, such as XHTML, PDF, or
OpenDocument format.
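As one illustration of such a transformation, the sketch below turns annotation rectangles into an XHTML-style client-side image map. It is only an assumed example of what a conversion might look like: the sample markup, the function name, and the choice of an image map as the output are hypothetical, and a production workflow might well use XSLT instead.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

TEI_NS = "http://www.tei-c.org/ns/1.0"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical input; ids and coordinates are invented for illustration.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}" xmlns:svg="{SVG_NS}">
  <text><body><div>
    <svg:svg width="800" height="600">
      <svg:rect id="area1" x="40" y="60" width="120" height="80"/>
      <svg:rect id="area2" x="300" y="200" width="90" height="150"/>
    </svg:svg>
  </div></body></text>
</TEI>
"""


def rects_to_image_map(xml_text, image_src, map_name="annotations"):
    """Render each SVG rect as an <area> in an XHTML-style image map."""
    root = ET.fromstring(xml_text)
    src = quoteattr(image_src)
    usemap = quoteattr("#" + map_name)
    name = quoteattr(map_name)

    lines = [f'<img src={src} usemap={usemap} alt="" />', f"<map name={name}>"]
    for rect in root.iter(f"{{{SVG_NS}}}rect"):
        x, y = int(rect.get("x")), int(rect.get("y"))
        w, h = int(rect.get("width")), int(rect.get("height"))
        # Image maps take corner coordinates (left, top, right, bottom),
        # whereas SVG rects store an origin plus width and height.
        coords = f"{x},{y},{x + w},{y + h}"
        href = quoteattr("#" + rect.get("id"))
        alt = quoteattr(rect.get("id"))
        lines.append(f'  <area shape="rect" coords="{coords}" href={href} alt={alt} />')
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rects_to_image_map(SAMPLE, "engraving.jpg"))
```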
THE IMAGE MARKUP TOOL
Having decided on our approach to a schema, we then began to look at how we might create the markup. We wanted a straightforward tool for defining areas in an image and associating them with annotative markup, and we looked initially at two possible existing tools, INote [7] and the Edition Production Technology [8].
INote, from the University of Virginia, is a Java
application for annotating images. It does not appear to have been updated since 1998. In some ways, INote is an ideal tool; it is cross-platform (written in Java), and covers most of our requirements. However, we rejected INote for several reasons. The program can load only GIF and JPEG images, and we wanted to be able to handle other common image formats such as BMP and PNG.
INote also allows only limited zooming (actual size,
double size, and half size). We required more flexible zooming to handle larger images. Finally, INote uses a proprietary file format.
However, INote does allow for polygonal and elliptical annotation areas, something not yet implemented in our own tool.
The Edition Production Technology (EPT) platform is an Eclipse-based software suite developed by the
ARCHway project [9]. Its ImagText plugin allows the
association of rectangular areas of an image with sections
of transcribed text. Although it promises to be a very powerful tool, especially for the specific job of associating
document scans with transcription text, the interface of the program is complex and would be confusing for
novice users. In addition, the tool developers expect and encourage the use of customized DTDs (“We do not
provide support or guarantees for the DTDs included in the demo release - it is expected that users will provide their own DTDs and thus their own specific encoding practices.” [10]). The EPT also supports only JPEG, GIF, TIFF, and BMP files; other formats such as PNG are not supported ([http://rch01.rch.uky.edu/~ept/Tutorial/preparing_files.htm#images]).
We therefore decided to write our own markup program, which is called the Image Markup Tool [11].
Fig 1: scrshot_main_1.jpg, available at [http://mustard.tapor.uvic.ca/~mholmes/image_markup/scrshot_main_1.jpg]
At the time of writing, the program is at the «alpha» stage, and the first public version will be released under an open-source licence in December 2005. The program is written in Borland Delphi 2005 for Windows 2000 / XP. Development of the tool is guided by the following requirements:
The Image Markup Tool should:
- be simple for novices to use
- load and display a wide variety of different image formats
- allow the user to specify arbitrary rectangles on the image, and associate them with annotations
- allow such rectangles to overlap if the user wishes
- provide mechanisms for bringing overlapped
rectangles to the front easily
- require no significant knowledge of XML or TEI
- allow the insertion of XML code if the user wishes
- save data in an XML file which conforms to a TEI P5-based schema with embedded SVG
- reload data from its own files
- come packaged with an installer, Help file, and basic tutorial
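The requirements concerning overlapping rectangles can be sketched with a simple drawing-order list. The tool itself is written in Delphi; the Python below is only an assumed illustration of one way to handle overlap, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A rectangular annotation area in image pixel coordinates."""
    area_id: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class AreaStack:
    """Rectangles kept in drawing order; the last item is frontmost."""

    def __init__(self):
        self._areas = []

    def add(self, area):
        self._areas.append(area)

    def hits(self, px, py):
        """Every area under the point, frontmost first."""
        return [a for a in reversed(self._areas) if a.contains(px, py)]

    def top_at(self, px, py):
        """The frontmost area under the point, or None."""
        found = self.hits(px, py)
        return found[0] if found else None

    def bring_to_front(self, area_id):
        """Move the named area to the end of the drawing order."""
        for i, area in enumerate(self._areas):
            if area.area_id == area_id:
                self._areas.append(self._areas.pop(i))
                return True
        return False


if __name__ == "__main__":
    stack = AreaStack()
    stack.add(Area("caption", 0, 0, 100, 100))
    stack.add(Area("seal", 50, 50, 100, 100))
    print([a.area_id for a in stack.hits(60, 60)])
    stack.bring_to_front("caption")
    print(stack.top_at(60, 60).area_id)
```

Listing every area under a point, rather than only the topmost one, is what lets a user pick a hidden rectangle and bring it forward.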
Using the Image Markup Tool, we have been able to make several types of direct annotation, including the text within the engravings, commentary on that text, commentary on significant gestures depicted, and information about the engraver and the seal of the library at the time the engraving entered the library’s collection. The tool allows for distinction among types of annotation, and the use of a TEI-based file format allows us to link easily between the markup of the engravings and the TEI-encoded polemical texts which are also included in the collection.
We are now planning to use the program for a future
project which involves marking up scans of historical architectural plans. One of the aims of this project will be to make the plans available to the public, so that (for example) the current owners of heritage buildings will be able to do renovation and restoration work with more detailed knowledge of the original building plan.
References
[1] Carlin, Claire. “Drawing Knowledge from Information: Early Modern Texts and Images on the TAPoR Platform” in Working Papers from the First and Second Canadian Symposium on Text Analysis Research (CaSTA), [http://www2.arts.ubc.ca/chwp/Casta02/], and forthcoming in Text Technology.
[2] Bibliothèque nationale (France). Département des estampes et de la photographie and Roger-Armand Weigert, ed. Inventaire du fonds français, graveurs du XVIIe siècle. Paris: Bibliothèque Nationale, 1939.
[3] Ferraiolo, Jon, et al., eds. (2003). Scalable Vector Graphics (SVG) 1.1 Specification. World Wide Web Consortium. [http://www.w3.org/TR/SVG/]
[Accessed 03-11-2005].
[4] Sperberg-McQueen, C. M., and Lou Burnard, eds.
(2005). The TEI Guidelines: Guidelines for Electronic
Text Encoding and Interchange - P5. The TEI
Consortium. [http://www.tei-c.org/P5/Guidelines/index.html] [Accessed 03-11-2005].
[5] Thai Open Source Software Center. (2003). Trang - Multi-format schema converter based on RELAX NG. [http://www.thaiopensource.com/relaxng/trang.html] [Accessed 03-11-2005].
[6] Image Markup Language 1.0. (2000). [http://faculty.washington.edu/lober/iml/] [Accessed 03-11-2005].
[7] INote. Institute for Advanced Technology in the Humanities, University of Virginia, 1998. [http://www.iath.virginia.edu/inote/] [Accessed 07-11-2005]
[8] Edition Production Technology (EPT build 20050301). Kiernan, Kevin et al., University of Kentucky. 2005. [http://rch01.rch.uky.edu/~ept/download/] [Accessed 07-11-2005]
[9] Kiernan, Kevin, et al. The ARCHway Project. University of Kentucky. [http://beowulf.engl.uky.edu/~kiernan/ARCHway/entrance.htm] [Accessed 07-11-2005]
[10] «Tutorial for EPT Demonstration.» [http://rch01.rch.uky.edu/~ept/Tutorial/demo_tagging.htm]
[Accessed 07-11-2005]
[11] Image Markup Tool. Carlin, Claire, Eric Haswell and Martin Holmes. University of Victoria
Humanities Computing and Media Centre. 2005. [http://mustard.tapor.uvic.ca/~mholmes/image_markup/]
Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)
Paris, France
July 5, 2006 - July 9, 2006
The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.
Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/