Interedition: Principles, Practice and Products of an Open Collaborative Development Model for Digital Scholarly Editions

paper
Authorship
  1. Joris Job Van Zundert

    Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW)

  2. Gregor Middell

    Lehrstuhl für Computerphilologie - Julius-Maximilians Universität Würzburg (Julius Maximilian University of Wurzburg)

  3. Dirk Van Hulle

    Center for Manuscript Genetics - Universität Antwerpen (University of Antwerp)

  4. Tara Lee Andrews

    Department of Greek Studies - Katholieke Universiteit (KU) Leuven (Catholic University of Louvain)

  5. Ronald Haentjens Dekker

    Huygens Institute for the History of the Netherlands (Huygens ING) - Royal Netherlands Academy of Arts and Sciences (KNAW)

  6. Vincent Neyt

    Center for Manuscript Genetics - Universität Antwerpen (University of Antwerp)

Work text
Interedition: Principles, Practice and Products of an Open Collaborative Development Model for Digital Scholarly Editions
van Zundert, Joris, joris.van.zundert@huygensinstituut.knaw.nl, Huygens ING - Royal Netherlands Academy of Arts and Sciences
Middell, Gregor, gregor.middell@uni-wuerzburg.de, Universität Würzburg, Lehrstuhl für Computerphilologie
Van Hulle, Dirk, dirk.vanhulle@ua.ac.be, University of Antwerp, Centre for Manuscript Genetics
Andrews, Tara L., tara.andrews@arts.kuleuven.be, Katholieke Universiteit Leuven, dept. of Greek Studies
Haentjens Dekker, Ronald, ronald.dekker@huygensinstituut.knaw.nl, Huygens ING - Royal Netherlands Academy of Arts and Sciences
Neyt, Vincent, vincent.neyt@ua.ac.be, University of Antwerp, Centre for Manuscript Genetics
Short Paper Abstract
In October 2006 a small group of developers of tools for digital textual scholarship, gathered under the leadership of the Huygens ING, concluded that there was an urgent need for a more collaborative approach to digital tool development in the humanities. Several problems plagued the field: high duplication of effort, a shortage of quality software development, poor exchange of development and methodological knowledge, institutionalized development, and consequent problems of non-sustainability and obsolescence. In 2008 this initiative became a formal European-funded project, COST Action IS0704 'Interedition'. Since then, Interedition has been improving cooperation and fostering interoperability in tool development for digital textual scholarship. [Van Zundert, J. et al.: Memorandum of Understanding (MoU) for the implementation of a European Concerted Research Action designated as COST Action IS0704: An interoperable supranational infrastructure for digital editions (Interedition). Brussels, Belgium, 2007. http://snipurl.com/1dl4v1, accessed 30 October 2010.] This paper reflects on the first results that have emerged since.

We have identified four significant obstacles to the widespread use of digital tools by humanities scholars. First, applicability of tools to humanities research is often lacking: in many cases the tools are not the tools researchers need. Second, development capacity within the humanities is extremely limited compared to scientific fields. Third, the sustainability of these tools, in terms of their ongoing support and maintenance requirements, is historically very low. Finally, and perhaps most critically, tool and infrastructure availability and development are highly institutionalized: scholars outside large research projects or institutions often find themselves unable to take advantage of development work that might otherwise benefit them substantially.

The researchers and developers in the Interedition project have come up with an approach to the problems of applicability, availability, and sustainability that revolves around the concept of ‘microservices’. A microservice is, ideally, a very small application whose functionality is available over the Web by means of a lightweight protocol (e.g. REST). Microservices can be used programmatically in conjunction with other such services, in many different combinations, to meet the individual needs of any research project. Microservices share a set of deceptively simple principles: they are cheap and fast to develop; they are cheap and easy to maintain; they address very specific needs (that are shared between many researchers); they implement simple protocols that are easier to reuse and exchange; and they can be combined to create larger workflows useful to the individual scholar. This idea is not new: service-based architectures such as SEASR/MEANDRE, and commercial services such as Yahoo! Pipes, already feature a ‘modular’ model, and such architectures are increasingly considered to be the best approach for software development within humanities research. [Küster, M. W., Ludwig, C., and Aschenbrenner, A.: 'TextGrid as a digital ecosystem', IEEE DEST 2007, 21.-23. Cairns, Australia, 2007. SEASR/MEANDRE: http://seasr.org/meandre/documentation/architecture, accessed 30 October 2010; http://seasr.org, accessed 30 October 2010. Yahoo! Pipes: http://pipes.yahoo.com, accessed 30 October 2010.] However, given the variety of contexts and environments in humanities research, it is critical that tools not be tied to a single unifying infrastructure, which is a drawback of systems such as SEASR or Yahoo! Pipes.
If we are to make maximally efficient use of humanities development capacity, any scholar or developer should be able to contribute his or her work to other projects with a minimum of effort; this means that we cannot insist upon a standardized infrastructure, platform, computing language, or even data format. Moreover, there is no guarantee that any such single infrastructure will be indefinitely available; the infrastructure thus becomes a single point of failure.
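
The idea of combining independent services into a workflow can be illustrated with a minimal Python sketch. It is not Interedition code: the function names, the JSON payloads and the example.org URLs are assumptions made for illustration only. Each service is treated as something that accepts JSON and returns JSON, so the workflow does not care where or in what language a service is implemented.

```python
import json
from typing import Callable, Dict, List
from urllib import request

# A transport sends a JSON payload to a URL and returns the decoded JSON reply.
Transport = Callable[[str, dict], dict]


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload to a service and decode its JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_workflow(urls: List[str], payload: dict,
                 send: Transport = http_post_json) -> dict:
    """Feed the output of each service into the next one, in order."""
    for url in urls:
        payload = send(url, payload)
    return payload


# Stand-in services so the sketch runs without a network (hypothetical URLs).
FAKE_SERVICES: Dict[str, Callable[[dict], dict]] = {
    "http://example.org/tokenize": lambda p: {"tokens": p["text"].split()},
    "http://example.org/count": lambda p: {"count": len(p["tokens"])},
}


def fake_send(url: str, payload: dict) -> dict:
    """In-process replacement for http_post_json, used for demonstration."""
    return FAKE_SERVICES[url](payload)


if __name__ == "__main__":
    steps = ["http://example.org/tokenize", "http://example.org/count"]
    print(run_workflow(steps, {"text": "the quick brown fox"}, send=fake_send))
```

Because the transport is passed in as a parameter, the same workflow function could in principle call remote services or local stand-ins, which reflects the point that no single infrastructure is assumed.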

Over the past two years, Interedition has been putting its theory to the test by exploring effective development methods for tools for text criticism and literary analysis, using the very limited resources available through the COST framework. The result is an open-source development ‘collective’ cooperating in ‘bootcamps’. Cooperation does not require participants to share the same research goal, nor need they agree on the 'right' way to approach textual scholarship. The resulting tools are furthermore not tied to or centralized in any single institution. The combined efforts of the participants become a collection of microservices, varying widely in implementation language, platform, hosting provision, etc., according to the resources and expertise available to the individual participant. The only technical requirements placed on any microservice are that it be web-accessible to other services using a REST-like protocol, and that information on its input requirements and output results be available via an HTTP GET request. [http://gregor.middell.net/collatex/api/collate, accessed 30 October 2010] Thus, even with a wide variety of implementation practice, small microservices can be built up to perform tasks in a larger workflow for ‘real’ and varied research purposes.
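
The two technical requirements can be sketched with Python's standard library. This is a hedged illustration, not the actual CollateX service: the description format, the whitespace tokenizer and the port number are assumptions. A GET request returns a description of the expected input and the produced output, and a POST request performs the service's single task.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical self-description: what the service expects and what it returns.
DESCRIPTION = {
    "name": "tokenizer",
    "input": {"text": "string"},
    "output": {"tokens": "list of strings"},
}


def tokenize(payload: dict) -> dict:
    """The service's single task: split a text on whitespace."""
    return {"tokens": payload["text"].split()}


class TokenizerHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: dict) -> None:
        """Send a JSON response with the given HTTP status code."""
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        # GET returns the input/output description rather than doing any work.
        self._reply(200, DESCRIPTION)

    def do_POST(self) -> None:
        # POST carries a JSON body with the text to be tokenized.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        self._reply(200, tokenize(payload))


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (the port number is arbitrary)."""
    HTTPServer(("localhost", port), TokenizerHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

A client that knows nothing about the service beyond its URL could first issue a GET to learn what to send, and then POST a matching payload.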

Current ‘live’ microservices are open source, which is crucial to the proposed model of cooperation for research and development, to intellectual and scientific transparency, and to sustainability. Many of them are hosted on free cloud computing infrastructures (e.g. Google App Engine or Heroku) for increased availability and reliability. [http://code.google.com/appengine/, accessed 30 October 2010; http://heroku.com/, accessed 30 October 2010.]

As a proof of principle, Interedition chose to implement as a microservice architecture a tool commonly wanted for textual scholarship, the collation of text witnesses, and worked to design a set of web-based services that could improve upon existing technologies. Designing collation software for textual editors that would substantially improve upon existing tools like the NINES project’s JUXTA [Juxta. Collation software for scholars: http://www.juxtasoftware.org/, accessed 31 October 2010.] and Peter Robinson’s COLLATE [Robinson, P.: ‘Collate: A Program for Interactive Collation of Large Textual Traditions’, in N. Ide and S. Hockey (eds.), Research in Humanities Computing. Oxford, 1994, pp. 32–45.] has been a significant challenge, and one that has allowed the team to thoroughly test their development model and architectural approach on an important problem which, if solved, would immediately benefit a broad community of scholars. The result is CollateX [http://collatex.sourceforge.net, accessed 30 October 2010], developed as a technical and methodological successor to COLLATE, which reached the end of its supported life around 2007. We will describe CollateX’s decentralized development within the Open Source community and how ‘bootcamps’ brought together an international team of developers and domain experts for requirements analysis, coordination of development efforts and collaborative work on the code base. We will also explain how Interedition’s design principles were implemented by splitting the collation process into functional tasks, each of which can be implemented with a variety of small and well-defined software services; these services are then loosely coupled to produce “the simplest solution that could possibly work” for the different use cases to be addressed. Three clearly separate functional tasks (input tokenization, the alignment itself, and analysis or visualization of the results) were identified and implemented as microservices.
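
The separation into three tasks can be sketched as three small Python functions. This is an illustration only and does not reproduce CollateX's alignment algorithm: as a stand-in, the alignment step uses `difflib.SequenceMatcher` from the Python standard library, and the function names and the two-witness restriction are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# One alignment row: the token of witness A and of witness B (None = gap).
Row = Tuple[Optional[str], Optional[str]]


def tokenize(text: str) -> List[str]:
    """Task 1: split a witness into lower-cased word tokens."""
    return text.lower().split()


def align(a: List[str], b: List[str]) -> List[Row]:
    """Task 2: pair up the tokens of two witnesses; None marks a gap."""
    rows: List[Row] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        # Pad the shorter side of each block with gaps.
        for k in range(max(len(left), len(right))):
            rows.append((
                left[k] if k < len(left) else None,
                right[k] if k < len(right) else None,
            ))
    return rows


def render(rows: List[Row]) -> str:
    """Task 3: show the alignment as a plain-text table, one line per witness."""
    if not rows:
        return ""
    width = max(len(tok or "-") for row in rows for tok in row)
    lines = []
    for witness in zip(*rows):
        lines.append(" | ".join((tok or "-").ljust(width) for tok in witness))
    return "\n".join(lines)


if __name__ == "__main__":
    a = tokenize("The black cat")
    b = tokenize("The white cat")
    print(render(align(a, b)))
```

Since each function only depends on the output format of the previous one, any step could be replaced by a different implementation, or by a remote service, without touching the others.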

This ‘microservice model’, as exemplified by CollateX, has another important advantage: it fits well with current directions of thought on the form and function of future digital scholarly editions. Rather than simple and static online republication of books, it is likely that future editions will be dynamic open-ended research environments composed of smaller scholarly components from many different sources. [Boot, P. and Van Zundert, J.: “The Digital Edition 2.0 and The Digital Library: Services, not resources.” In: Knoche, M., Mittler, E. et al. (eds): Bibliothek und Wissenschaft. Wiesbaden, Germany. (Forthcoming); Robinson, P.: http://computerphilologie.uni-muenchen.de/jg03/robinson.html, accessed 30 October 2010.] It is an ultimate goal of Interedition to leverage this microservice architecture to facilitate such ‘distributed editions’.

Poster Abstract
This poster presentation will demonstrate in detail the technical aspects of Interedition’s ‘microservices model’ for interoperability. The poster will serve as the technical annex to the short paper on Interedition and its development principles, and will describe in detail the working of the various components that make up CollateX, Interedition’s foremost proof-of-concept implementation. At the poster presentation CollateX will be running on a laptop to demonstrate its capabilities.

We will also showcase some digital humanities projects that are already benefiting from Interedition’s approach to tool building. Examples include TILE and T-PEN, both of which focus on the common task of transcribing images of text and aligning this transcription with the associated regions in the image. When their commonalities are examined, a similar set of core services might be deduced and a robust set of modular services designed to improve upon the advances these tools have already made. We will take a more in-depth look at the Beckett Digital Manuscript Project, on which the Centre for Manuscript Genetics (University of Antwerp) is collaborating with the Huygens ING. Within the framework of Interedition, this project raises the question of how the architecture of a digital archive containing modern manuscripts can be designed in such a way that users can autonomously collate textual units of their choice with the help of Interedition’s collation web service, and thus decide for themselves how this digital architecture functions: as an archive, as a genetic dossier, or as an edition.

As a consequence of developments in digital scholarly editing, the strict boundary between digital archives and electronic editions is becoming increasingly permeable, resulting in a continuum rather than a dichotomy. Usually, archives are distinguished from editions because the latter offer a critical apparatus. Of all the interoperable tools developed within the Interedition framework, the collation module has the special merit that it can enable any user to transform a digital archive into an electronic edition.

From the vantage point of editorial theory, this development has interesting consequences for the role of the scholarly editor, whose focus may shift from collation to a more interpretive function. In this way, the integration of a collation tool may be consequential in terms of bridging the gap between genetic criticism and textual scholarship. From the perspective of editorial practice, the application of CollateX is still at an experimental stage, but it already shows that the modular approach used by Interedition has the potential to be useful both to the specialized field of digital scholarly editing and to a more general audience.

Conference Info

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None