The ARCHway Software Infrastructure: a platform and utilities for building electronic editions

Jerzy W. Jaromczyk; Neil Moore

In this paper we discuss the implementation of the architecture of the Edition Production Technology (EPT), focusing on the utilities that form its underlying infrastructure. These utilities support the consistent management of diverse sources of data while providing an extensible framework for building and organizing the editorial tools discussed in the previous papers.

The EPT's architecture is based on the plugin, an encapsulated and independent software unit that “plugs into” a larger whole, extending its functionality. An editorial workbench built from many individual tools gives the editor the freedom to pick and choose tools for the tasks at hand. We selected Eclipse as the platform for both the development and deployment of the EPT because it appeared well suited to such a free and configurable approach to the design of the editorial workbench. In this presentation we will briefly describe the Eclipse platform and the functionality and implementation of a selection of EPT plugins that provide the infrastructure for accessing data, organizing and annotating resources, and defining different models for electronic editions.

Eclipse is an open-source platform designed to serve as an extensible integrated development environment (see Eclipse Platform Technical Overview). Originally developed by IBM, Eclipse is now maintained by the Eclipse Foundation and an extensive user community. Although initially intended for software development, Eclipse's open-ended plugin-based architecture allows users to extend it to support a wide range of other tasks, including software deployment.

Eclipse is organized as a collection of plugins: loosely coupled software components, often developed independently of one another, which communicate with each other and with users through well-defined interfaces. Much like the more familiar web browser plugins (such as the Macromedia Flash plugin, Sun’s Java plugin, and Adobe’s Acrobat Reader plugin), Eclipse plugins hook into so-called extension points defined elsewhere in the application, extending and enhancing the application’s existing functionality as well as adding completely new features. However, Eclipse differs notably from most other extensible software. Most significantly, the platform itself is built from scores of plugins, which themselves extend other plugins and provide additional extension points; almost everything in Eclipse is a plugin. This may be contrasted with, for example, web browsers, where plugins extend an underlying monolithic software system and rarely interact with one another.
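
The extension-point idea can be illustrated with a minimal sketch in plain Java. This is a deliberately simplified model and not the Eclipse API: the names ExtensionRegistry, Extension, declarePoint, contribute, and the point id "workbench.tools" are all hypothetical, chosen only to show one component declaring a point and others plugging into it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of extension points; NOT the Eclipse API.
interface Extension {
    String describe();
}

final class ExtensionRegistry {
    // Maps an extension-point id to the extensions plugged into it.
    private final Map<String, List<Extension>> points = new HashMap<>();

    // A plugin declares a point that other plugins may extend.
    void declarePoint(String pointId) {
        points.putIfAbsent(pointId, new ArrayList<>());
    }

    // A plugin contributes an extension to a point declared elsewhere.
    void contribute(String pointId, Extension extension) {
        List<Extension> extensions = points.get(pointId);
        if (extensions == null) {
            throw new IllegalArgumentException("Unknown extension point: " + pointId);
        }
        extensions.add(extension);
    }

    // The declaring plugin asks what has been plugged in.
    List<Extension> extensionsFor(String pointId) {
        return List.copyOf(points.getOrDefault(pointId, List.of()));
    }
}

final class ExtensionPointDemo {
    public static void main(String[] args) {
        ExtensionRegistry registry = new ExtensionRegistry();
        registry.declarePoint("workbench.tools");
        registry.contribute("workbench.tools", () -> "image viewer");
        registry.contribute("workbench.tools", () -> "transcript editor");
        for (Extension extension : registry.extensionsFor("workbench.tools")) {
            System.out.println(extension.describe());
        }
    }
}
```

In this sketch the component that declares a point never needs to know which components extend it, which is the loose coupling described above. In Eclipse itself, extension points and extensions are declared in plugin manifests rather than registered by hand.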

The extensibility of the plugin architecture will be a tremendous benefit to the users of the EPT. Humanities researchers have various needs and editorial styles; with a plugin system, they can create a personal editing workbench containing only the tools they need, without losing the advantages of a coherent interface and uniform access to data. Furthermore, scholars with specific needs unforeseen by the EPT developers may collaborate with programmers to develop their own editing tools or modify existing ones. The EPT's plugin architecture allows users to develop new tools independently of the EPT and then plug them into the EPT extension points, providing a seamlessly integrated experience. In effect, such plugins become an integral part of a customized version of the EPT, on equal footing with the many tools that make up the EPT proper.

The EPT organizes its plugins in a series of layers, with each layer building upon, using, and extending the layers below. We will discuss three of these layers in our presentation. The bottommost layer is called the Data Layer. The plugins making up the Data Layer provide a consistent set of operations for managing, reading, and storing various types of edition data files such as images, configuration files, textual transcripts, marked-up edition documents, and XML document type definitions (DTDs). The Data Layer provides a single interface for accessing all data, regardless of where that data is stored: in the local file system, a database (see Dekhtyar et al.), or a remote site. Plugins called data source drivers extend the Data Layer by providing functionality to access resources through different means. Currently the EPT contains two data source drivers: one for accessing files located within the file system of the computer running the EPT, and one for accessing resources stored on a remote web server over HTTP. The flexibility of the Data Layer framework allows the user to implement a wide variety of data source drivers; users could build drivers that transparently compress or encrypt data, drivers that maintain data in a relational database, and many others.
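
A rough sketch of how a single access interface with pluggable data source drivers might look is given below. It is an illustration under stated assumptions, not the EPT's actual code: the names DataSourceDriver, LocalFileDriver, HttpDriver, and DataLayer are hypothetical, and selecting a driver by URI scheme is one plausible design, not one the EPT is known to use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver contract: open a resource identified by a URI.
interface DataSourceDriver {
    InputStream open(URI location) throws IOException;
}

// Driver for files in the local file system (expects file: URIs).
final class LocalFileDriver implements DataSourceDriver {
    @Override
    public InputStream open(URI location) throws IOException {
        return Files.newInputStream(Paths.get(location));
    }
}

// Driver for resources on a remote web server, fetched over HTTP.
final class HttpDriver implements DataSourceDriver {
    @Override
    public InputStream open(URI location) throws IOException {
        return location.toURL().openStream();
    }
}

// Single access point: callers never need to know where the data lives.
final class DataLayer {
    // Assumption of this sketch: one driver per URI scheme.
    private final Map<String, DataSourceDriver> driversByScheme = new HashMap<>();

    void registerDriver(String scheme, DataSourceDriver driver) {
        driversByScheme.put(scheme, driver);
    }

    InputStream open(URI location) throws IOException {
        DataSourceDriver driver = driversByScheme.get(location.getScheme());
        if (driver == null) {
            throw new IOException("No driver for scheme: " + location.getScheme());
        }
        return driver.open(location);
    }
}
```

A caller would register new LocalFileDriver() under "file" and new HttpDriver() under "http", then call open with any URI. A compressing or encrypting driver of the kind mentioned above could be written as a further DataSourceDriver that wraps the stream returned by another driver.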

On top of the Data Layer sits the Project Explorer, which provides a higher-level view of the resources comprising an electronic edition project. Project Explorer provides a user interface for viewing and managing the logical structure (the model) of a project, as opposed to its physical structure, since an edition project may take its components from several different data repositories. Project Explorer provides a rich set of extension points so that other plugins may contribute resources to the Explorer view. Such contributions provide various actions (for example, launching a tool or displaying an image) that operate on the model and on the resources themselves.
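
The distinction between the logical model and the physical location of resources, together with actions contributed by other plugins, might be sketched as follows. All names here (ProjectNode, ProjectExplorer, contributeAction, run) and the example URIs are hypothetical; the sketch omits the user interface entirely and only shows the shape of the idea.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// A node in the logical project model; leaves point at physical resources.
final class ProjectNode {
    final String label;
    final URI location; // null for purely logical grouping nodes
    final List<ProjectNode> children = new ArrayList<>();

    ProjectNode(String label, URI location) {
        this.label = label;
        this.location = location;
    }

    // Adds a child and returns this node so calls can be chained.
    ProjectNode add(ProjectNode child) {
        children.add(child);
        return this;
    }
}

// Holds actions contributed by other plugins and runs them on model nodes.
final class ProjectExplorer {
    private final Map<String, Consumer<ProjectNode>> actions = new LinkedHashMap<>();

    void contributeAction(String name, Consumer<ProjectNode> action) {
        actions.put(name, action);
    }

    void run(String name, ProjectNode node) {
        Consumer<ProjectNode> action = actions.get(name);
        if (action == null) {
            throw new IllegalArgumentException("No such action: " + name);
        }
        action.accept(node);
    }
}

final class ProjectExplorerDemo {
    public static void main(String[] args) {
        // One logical project whose parts live in different repositories.
        ProjectNode edition = new ProjectNode("Edition", null)
            .add(new ProjectNode("Folio 038v image", URI.create("file:///data/images/038v.jpg")))
            .add(new ProjectNode("Transcript", URI.create("http://example.org/edition/transcript.xml")));

        ProjectExplorer explorer = new ProjectExplorer();
        explorer.contributeAction("Show location",
            node -> System.out.println(node.label + " -> " + node.location));

        for (ProjectNode child : edition.children) {
            explorer.run("Show location", child);
        }
    }
}
```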

The Resource Registry, built atop the Data Layer and Project Explorer, enables the user to organize, categorize, and manage collections of similar resources. For example, one collection might contain all the manuscript images comprising the electronic edition. Each collection has a schema, a list of attributes applicable to all resources in the collection. The editor defines collections and their schemas in the Resource Registry, then adds resources to those collections and describes each resource by specifying its attribute values. For example, the schema for manuscript images might contain attributes describing the folio name (038v, for example); the image format (JPEG, GIF, TIFF, etc.); the type of lighting used when digitizing the image (e.g., overhead white light, ultraviolet light, or fiber-optic backlight); the provenance of the image files; and so forth. The Resource Registry contributes items to the Project Explorer, arranging the resources by a user-defined ordering of their attributes. Other plugins can query the Resource Registry for resources with certain attributes, for example all the ultraviolet manuscript images that are in JPEG format.
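
As an illustration of collections, schemas, and attribute queries, the following sketch models one collection in plain Java. It is a minimal sketch under assumptions, not the EPT's implementation: the class names, the attribute names "folio", "format", and "lighting", and the file names in the usage note are all hypothetical, and attribute values are treated as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A resource plus the attribute values the editor assigned to it.
final class RegisteredResource {
    final String name;
    final Map<String, String> attributes;

    RegisteredResource(String name, Map<String, String> attributes) {
        this.name = name;
        this.attributes = Map.copyOf(attributes);
    }
}

// A collection of similar resources sharing one schema (attribute names).
final class ResourceCollection {
    private final Set<String> schema;
    private final List<RegisteredResource> resources = new ArrayList<>();

    ResourceCollection(List<String> schemaAttributes) {
        this.schema = new LinkedHashSet<>(schemaAttributes);
    }

    // Rejects resources that use an attribute the schema does not define.
    void add(String name, Map<String, String> attributes) {
        if (!schema.containsAll(attributes.keySet())) {
            throw new IllegalArgumentException("Attribute not in schema: " + attributes.keySet());
        }
        resources.add(new RegisteredResource(name, attributes));
    }

    // Returns the resources whose attributes match every requested value.
    List<RegisteredResource> query(Map<String, String> wanted) {
        List<RegisteredResource> matches = new ArrayList<>();
        for (RegisteredResource resource : resources) {
            boolean allMatch = wanted.entrySet().stream()
                .allMatch(e -> e.getValue().equals(resource.attributes.get(e.getKey())));
            if (allMatch) {
                matches.add(resource);
            }
        }
        return matches;
    }
}
```

With a collection created from List.of("folio", "format", "lighting") and two images added for folio 038v, one an ultraviolet JPEG and one a white-light TIFF, the call query(Map.of("format", "JPEG", "lighting", "ultraviolet")) would return only the first, mirroring the example query in the text.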

The utilities described above form part of the infrastructure for the EPT. They provide a workbench within which a user can arrange various specialized tools, such as those described in the other papers in this session, in convenient combinations capable of solving complex tasks in the production and presentation of image-based electronic editions.

Bibliography

Dekhtyar, Alex, et al. Database Support for Image-based Electronic Editions. Proceedings, 10th International Workshop on Multimedia Information Systems (MIS 2004), August 25–27, 2004, College Park, MD. 2004. 147-156.

Eclipse. Eclipse Platform, Technical Overview.

Jaromczyk, Jerzy W.; Bodapati, Sandeep. An Architecture Promoting Collaborative Research, Teaching and Learning. Proceedings, Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, May 29–June 2, 2003, Athens, GA. 10. 2003.

Jaromczyk, Jerzy W.; Moore, Neil. Geometric data structures for multihierarchical XML tagging of manuscripts. Proceedings, 20th European Workshop on Computational Geometry, Seville, Spain, March 2004. 2004.

Kiernan, Kevin; Jaromczyk, Jerzy W.; Dekhtyar, Alex; Porter, Dorothy Carr; Hawley, Kenneth; Bodapati, Sandeep; Iacob, Ionut Emil. The ARCHway Project: Architecture for Research in Computing for Humanities through Research, Teaching, and Learning. Literary and Linguistic Computing. Forthcoming in 2005 (Special issue, papers from Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, 2003).

Full text license: This text is republished here with permission from the original rights holder.

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2005