The CWRC-Writer Bridge: from Coder to Writer, XML to RDF, DH to Mainstream

poster / demo / art installation
Authorship
  1. Susan Brown

    English and Theatre Studies - University of Guelph

  2. Michael Brundin

    Canadian Writing Research Collaboratory - University of Alberta

  3. James Chartrand

    Open Sky Solutions

  4. Ruth Knechtel

    Canadian Writing Research Collaboratory - University of Alberta

  5. Andrew MacDonald

    Open Sky Solutions

  6. Geoffrey Rockwell

    Humanities Computing - University of Alberta

  7. Megan Sellmer

    Humanities Computing - University of Alberta

Work text
Matt Kirschenbaum argues that “the story of writing in the digital age is every bit as messy as the ink-stained rags that would have littered Gutenberg’s print shop or the hot molten lead of the Linotype machine” (Schuessler 2011), but the digital humanities community has paid surprisingly little attention to the interfaces that have impacted digital writing, particularly with respect to the tools that enable text encoding or markup. The field has been founded on an understanding of the value of text markup for encoding, preservation, and interoperability. DH projects employ markup consistently in major text-oriented DH applications ranging from linguistics through the production of critical editions to the production of born-digital scholarship and scholarly journals. Yet DH scholars tend to use alternative tools such as mainstream word processors for extensive scholarly writing. Even for submissions to our annual conference, we do not require our community to submit in the form of TEI-encoded texts, even though months are invested annually preparing the book of abstracts for submission. By contrast, submissions to most conferences in the scientific community are routinely produced by the submitting scholars in LaTeX, minimizing the production costs of publishing the submissions (Gaudeul 2006). Furthermore, there has been little extension of awareness into the broader scholarly community of the value of XML in general and the TEI in particular, and consequently little uptake beyond the DH community itself.

This is not because humanities scholars cannot use markup. The pervasive use of wikis, blogging software, and social networking sites that allow limited markup indicates that the principles of markup, at least as they relate to on-screen rendering in HTML, are easily acquired. However, such contexts are conducive neither to an understanding of the nuances and procedural affordances of complex markup nor of how it can structure an entire document; the markup is simply an isolated means to a presentational end. So there is still a considerable gulf between the knowledge base of the mainstream scholarly community and that of the DH community. This situation is exacerbated by the fact that using XML still demands that users acquire and install editing software that is divorced from the usual contexts of scholarly production, and that such packages require a daunting degree of setup. Furthermore, the interfaces of XML editing programs tend to be far removed from those with which mainstream scholars are familiar. That of the oXygen XML editing package, for instance, is closer to the look and feel of a Java editor, a programming environment, than it is to a WYSIWYG writing environment. Such applications are not conducive to use by humanities scholars, whose main activities are writing and editing. Members of the DH community have vocally opposed the concept of WYSIWYG XML editors on the grounds that they would undermine a writer’s understanding of the function of markup, although this is not a technical consideration: oXygen’s libraries have been designed to support the production of interfaces that can allow “non technical users to encode information in XML without actually knowing anything about the underlying XML format” (Bina 2013).
There thus exists a tension between the requirements of technically adept super-users, who want power from their tools, and the basic usability of editing interfaces with respect to writing, one that has been there from the early days of markup editors (Karney 1995).

The CWRC-Writer is the centerpiece of an online research environment, the Canadian Writing Research Collaboratory (CWRC), designed to support research on writing in and about Canada. It aims at a user base of mainstream literary scholars, and recognizes the gap in knowledge between that community and digital humanists with respect to best practices for digital resource production. The CWRC-Writer is meant to work with the Collaboratory’s other online tools to help bridge that gap by facilitating the production of born-digital scholarship and primary text editions in XML. CWRC will provide an online repository to house digital objects for members of its research community. These will range from bibliographical records, granular chronology entries, profiles of authors and other historical persons, prosopographic data and other entity records, and transcriptions of primary texts to other document types, images, audio, and video, although the emphasis remains on writing. CWRC encourages collaboration, whether or not scholars work together formally on a specific project. The CWRC infrastructure supports sharing, reuse, and continual enhancement of scholarly materials, as well as open-access dissemination. Thus one of the most common use cases we anticipate is that someone who is using CWRC for their research sees an error or an opportunity for enhancement within an existing object. We want to make it as easy as possible for that scholar to correct the OCR error, add missing bibliographical information, clarify an ambiguity, or smooth out some infelicitous language. The scholar should not need to download specialized software, but should rather move easily from a reading interface to a production interface within the same browsing environment, adding further value to the scholarly resource quickly and easily, in keeping with the realization that we need to overcome siloage between applications (Bradley and Hill 2011).

This is our main use case. Nevertheless, we do have amongst our user community serious textual editors who plan to create digital editions. Our assumption has always been that for heavy-duty markup or transformations one would need to go outside the CWRC-Writer environment to a full-featured XML editor. These advanced users are understandably testing the CWRC-Writer interface from the perspective of their full range of needs. Moreover, since we began publicizing development of the CWRC-Writer, we have received expressions of interest from members of the DH community who want to consider it for use in TEI editing projects, as a component of library-based DH tool suites, or for teaching XML. The possible use cases for CWRC-Writer are thus situated along a spectrum ranging from the production of born-digital content or primary text editions from scratch, in which case many of the technical demands of markup application need to be met within CWRC-Writer itself, through to the quick fixing of errors or editorial revision of existing documents. This spectrum can be elaborated, so that the poles of these two use cases are aligned with the degree of complexity of the interface and indeed of functionality of the editor itself, and the level of expertise expected of the user.

Interface Complexity <-------------------> Interface Simplicity
Expert users <-------------------> Novice users
Production from scratch <-------------------> Quick edits/fixes

Simplicity is also relative and contextual. While the CWRC-Writer does not support advanced XML features, its interface has considerable complexity as a result of its integration with other aspects of the Collaboratory. It is more complicated, for instance, than the newly launched DHwriter designed to facilitate the production of abstracts for the DH conference. Because CWRC aims to support interoperability and discoverability by using Linked Open Data entities for authority control, the interface is complicated by the fact that the editor combines the application of XML markup with the annotation of documents with RDF entities. To make this blended approach as seamless as possible, we have mapped our RDF specifications for named entities onto the equivalent tags within supported XML schemas. Thus users identifying a person’s name within a text are simultaneously applying a <persName> tag, if the document is using a TEI schema, as well as creating an RDF Open Annotation object. This increases the challenge of interface design in a number of ways. For instance, there are some basic conceptual confusions with respect to terminology, since there are two types of “tagging” available within the editor, sometimes operating in tandem and sometimes not. Tagging means different things within technical vocabularies and mainstream folksonomic contexts, so it is a balancing act to bridge from popular literacies to the CWRC-Writer environment while also retaining sufficient accuracy of terminology to help develop digital humanities literacies. The resulting confusion has emerged as a theme in our user testing to date and brings home the extent to which the CWRC-Writer emerges from an expanded understanding of annotations and their potential to support new paradigms of interoperative and interactive digital scholarly environments (Bradley 2012; Grassi 2013).
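
The dual effect of a single tagging action can be illustrated with a minimal sketch. It is not the CWRC-Writer implementation: the function name `tag_entity`, the `ENTITY_TAGS` lookup table, the use of a TEI `ref` attribute to carry the entity URI, the character-offset selector, and the `example.org` URIs are all assumptions made for illustration. Only the `<persName>` tag and the pairing of XML markup with an Open Annotation object come from the description above; the `oa:` terms are taken from the Open Annotation vocabulary.

```python
import json
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lookup from entity type to the equivalent tag in each schema.
# Only the TEI persName mapping is named in the text; the rest is illustrative.
ENTITY_TAGS = {
    "tei": {"person": "persName", "place": "placeName", "organization": "orgName"},
}


def tag_entity(text, start, end, entity_type, entity_uri, doc_uri, schema="tei"):
    """Return (XML-tagged text, Open Annotation dict) for one entity mention.

    start and end are character offsets into the plain text (an assumption;
    a real editor would more likely select within the XML tree).
    """
    tag = ENTITY_TAGS[schema][entity_type]

    # 1. XML markup: wrap the selected span in the schema's entity tag.
    marked_up = (
        escape(text[:start])
        + f"<{tag} ref={quoteattr(entity_uri)}>"
        + escape(text[start:end])
        + f"</{tag}>"
        + escape(text[end:])
    )

    # 2. RDF annotation: an Open Annotation whose body is the linked-data
    #    entity and whose target is the same span of the same document.
    annotation = {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:identifying"},
        "oa:hasBody": {"@id": entity_uri},
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": doc_uri},
            "oa:hasSelector": {
                "@type": "oa:TextPositionSelector",
                "oa:start": start,
                "oa:end": end,
            },
        },
    }
    return marked_up, annotation


if __name__ == "__main__":
    xml, anno = tag_entity(
        "Margaret Atwood published Surfacing in 1972.",
        0,
        15,
        "person",
        "http://example.org/entity/atwood",  # placeholder entity URI
        "http://example.org/doc/1",          # placeholder document URI
    )
    print(xml)
    print(json.dumps(anno, indent=2))
```

Keeping both outputs in one function mirrors the "in tandem" case described above; the case where the two kinds of tagging operate separately would correspond to emitting only one of the two results.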

We last reported on the CWRC-Writer editor in a pre-alpha state (Rockwell et al. 2012); since then it has undergone numerous development iterations and substantial user testing. By DH2014 we will have conducted our most extensive user testing on the beta version, with users ranging from novices to DH experts. Our aim is not only to improve the functionality of the system as an editor, but also to understand how the interface works as a writing and encoding environment for various types of users, including by asking respondents to compare the interface experience to other writing environments, both Web- and PC-based.

The poster will thus do the following:

Provide two computers on which people can test the CWRC-Writer;
Summarize the major affordances of the editor and the concepts behind it;
Summarize the results of the user testing.
The testing results will allow us to situate the CWRC-Writer as a tool within the current editing landscape, along the spectrum outlined above, and to evaluate the use cases for which the CWRC-Writer is best suited. The poster will facilitate dialogue regarding the relationship amongst text encoding, Semantic Web technologies, and mainstream scholarly writing processes. The testing results will provide insights into the tensions in interface design between expert vocabularies and best practices, on the one hand, and mainstream vocabularies and scholarly pragmatics on the other. We hope to make a contribution to “ways of seeing” (Kirschenbaum 2004) markup environments and their relation to digital scholarly production on the Web.

References
Bina, George (2013). “Customizing a General Purpose XML Editor: oXygen's Authoring Environment.” Proceedings of the International Symposium on Native XML User Interfaces, 2013. http://www.balisage.net/Proceedings/vol11/html/Bina01/BalisageVol11-Bina01.html

Bradley, John (2012). “Towards a Richer Sense of Digital Annotation: Moving Beyond a ‘Media’ Orientation of the Annotation of Digital Objects.” DHQ 6.2 (2012).

Bradley, John and Timothy Hill (2011). “When WordHoard Met Pliny: Breaking Down of Interaction Silos Between Applications.” Digital Humanities 2011, Stanford University, June 19–22, 2011. http://pliny.cch.kcl.ac.uk/docs/Stanford-Poster.pdf

Brown, Susan (n.d.). “Scaling Up Collaboration Online: Towards a Collaboratory for Research on Canadian Writing.” International Journal of Canadian Studies. Forthcoming.

Canadian Writing Research Collaboratory. www.cwrc.ca

CWRC-Writer. https://github.com/cwrc/CWRC-Writer

DHwriter. dhwriter.org

Gaudeul, Alexia (2006). “Do Open Source Developers Respond to Competition?: The (La)TeX Case Study.” (March 27, 2006). Available at SSRN: http://ssrn.com/abstract=908946 or http://dx.doi.org/10.2139/ssrn.908946

Grassi, Marco, Simone Fonda, and Francesco Piazza (2013). “Pundit: augmenting web contents with semantics.” Literary and Linguistic Computing 28.4 (2013).

Karney, James (1995). “Author/Editor.” PC Magazine 7 February 1995. 153ff.

Kirschenbaum, Matthew G. (2004). “‘So the Colors Cover the Wires’: Interface, Aesthetics, and Usability.” A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell. http://www.digitalhumanities.org/companion/

Rockwell, Geoffrey, Susan Brown, James Chartrand, and Susan Hesemeier (2012). “CWRC-Writer: An In-Browser XML Editor.” Digital Humanities 2012 Conference, Hamburg, July 16–22. Digital Humanities 2012 Conference Abstracts. Hamburg University.

Schuessler, Jennifer (2011). “The Muses of Insert, Delete and Execute.” New York Times, 25 December 2011.

Conference Info

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO