GRADE: a GRAmmar Development Engine

Authorship
  1. 1. Harry Schmidt

    Princeton University

  2. 2. Helma Dik

    University of Chicago

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

The Grade project allows authors of complex reference works
to encode documents using familiar features from print
reference works, but, crucially, adds the possibility for the
author to mark where For_‘Dummies’-level information stops
and For_The_Dozen_Living_Experts-level information starts,
plus the relevant stages in between. In turn, readers of Grade
documents are able to switch from ‘Dummies’ mode to Expert
mode at the click of a button. The back-end to all of this is
provided by an XML database. Open-source release of Grade
1.0 is planned to coincide with Digital Humanities 2007.
Rationale
Existing grammars of any language are not easy to use and
typically envisage only one kind of user at a time. But a
user may vary in her preferences from moment to moment.
Does she want 'just the rule' right now? Does she want to see
examples, cited in the original language and in translation, and
perhaps print them out? But tomorrow, perhaps, she also wants
the evidence for that rule, or discussion of its problematic status
as a rule?
Any printed reference grammar, even one for a homogeneous
audience, is inevitably a product of compromises. Space devoted
to discussion of different schools of thought detracts from
clarity; space devoted to exceptions detracts from main rules;
space devoted to examples and translations clutters the page,
etcetera. Paradoxically, then, every good grammar is a bad
grammar precisely for the reasons that it is good. A solid
number of sensibly discussed examples coupled with a fair
representation of scholarly opinion on an issue will never fail
to obscure what should be the statement of a simple rule; a
simple statement of a rule, however, will never do justice to
the complexity of the language and needs, at a minimum,
illustrations from actual texts.
An electronic reference grammar, on the other hand, does not
have to make any of these painful choices. What it can offer
instead is the capability for any end user to select, with the click of a button, for any part of the syntax, the level of detail he or
she wishes to see at that particular moment, and to switch at
any time to further detail, extra examples, or, conversely, back
to only the main rule, with no exceptions, no examples, and no
translations of examples. If desired, users can choose for
themselves what section to print, and in what format (through
application of XSLT).
Current collections of thematically unified materials, such as
The Perseus project (perseus.tufts.edu) offer user configuration
in the form of cookies, which set a user's default language
(original texts or translations), default font, and the like.
However, Perseus and collections like it derive most of their
power not primarily from this user configurability but from the
intelligent linking of resources that were previously separate:
Words in the texts can be analysed by the morphological parser,
which in turn provides links to the lexicon, which in turn allows
for further word searches, etcetera.
The Perseus texts, then, can be said to 'talk to each other' and
enhance the usability of resources compared to their print
versions: The system is aware, for instance, that a user is reading
Thucydides when she clicks on a word to get a parse or a
meaning; accordingly, the system will highlight Thucydides
citations in any dictionary entry called up. GRADE takes this
configurability one step further in allowing users to retrieve
information customized not just for the text that they are reading
but for the level of complexity that they select.
While traditional rich encoding following, e.g., TEI guidelines
has typically aimed for faithful transcription of an original print
publication, or generally, has aimed to serve the intents of an
author/publisher, the current project uses its XML schema to
allow the end user this flexibility at all times, so that this one
grammar can become, in effect, all grammars to all people. At
the same time, GRADE empowers the author in all the ways
that electronic publishing has started to empower authors, and
then some: a GUI for authoring, the possibility for piecemeal
and immediate publication, as found in blogs and wikis, but
paired with a capacity for richly structured data and full editorial
oversight.
GRADE finds its origin in the second author's need for a
grammar authoring tool, but we submit that the system can be
re-used for many projects that share the characteristics of
complexity and an intended audience with widely varying goals
and levels of expertise. Accordingly, in this demonstration of
the beta version, a second humanities application is offered in
the form of a Latin learners’ vocabulary which is organized in
sets of basic vocabulary and author-specific vocabulary, and
in which database translations from Latin into several languages
can be stored. Users can opt for display of the vocabulary in
Latin and any of these languages, can select particular authors
or genres, can opt for display in flashcard format, etc.
Design Features
GRADE is written in Perl, chosen for its robustness and
because it is a well-supported Web-centric application
programming framework. GRADE is written in such a way
that even a novice computer programmer can write extensions
that increase its functionality. Two templates (grammars and
vocabularies) will be available in June 2007; tutorials on
extensions are foreseen for Summer 2007.
GUI for XML database
The project has a graphical user interface for its XML
database, which allows for entry, update and management
of the objects (grammar sections) and their hierarchical
organization (grammar chapters and smaller sections) in the
database and for their encoding as specific types of information:
text can be marked as examples, translations, notes, levels of
complexity, tables, and lists. Authors can enter text in a standard
browser (Firefox) without having to add in XML tags by hand
(figure 1:editor screen). This makes for convenience for the
authors but also avoids validation problems. The native XML
database backend is based on Sleepycat Software's open-source
Berkeley DB XML 2, provides full support for Unicode UTF-8
and allows queries via XPath 2.0 and XQuery 1.0. Standard
open-source tools are used throughout, including Apache and
Perl.
Rendering Engine
The rendering engine gets objects from the database
(sections of the document) and renders them with XSLT.
Users (that is, readers of the grammar) can browse through
different levels of complexity, choosing to display more or less
information, and what types of information to display, with a
click on 'collapse boxes' or by adjusting their preferences (figure
2: Rendering options).
GRADE supports common Web formats and protocols for
information exchange (XML, XMLRPC, and JSON), so that
another project could potentially even integrate a real-time data
feed (that is to say, up-to-date data rather than a particular
revision which may become obsolete). Alternately, "calling"
programs can also request a specific revision of a document to
ensure its stability.
User registration and privileges
Users on the authoring side need administrative privileges
to edit the database (see further below). For non-author users, cookies will retain preferences (for display level,
translation display, etc.) during one session. Users may choose
to register in order for preference to be retained beyond sessions.
The user database recognizes different levels of privileges for
users. It permits those with administrative privileges to make
changes to the database at will, subject to hierarchical controls
that prevent changes to sections controlled by higher-order
administrators. A versioning system, at the bottom of the editing
screen, guarantees the integrity of all documents in the system
and permits users to compare prior edits of a given section. In
its final state, GRADE will provide the kind of precise revision
tracking necessary for a rigorous academic project, including
one feature that all existing Wiki software (including
MediaWiki, the software used by Wikipedia) lacks -- the ability
to see, from within a document, who wrote what portions of it
and when.
What GRADE is not
GRADE is emphatically not a pedagogy application or
learning management system, a la SCORM from ADL.
It empowers both authors, by allowing them to present richly
coded complex content, and end users, by giving them a flexible
document to use and explore, not by feeding them content
piecemeal or by tracking them. An internet-savvy audience, we
submit, now long-accustomed to browsing hypertexts, should
not be forced into SCORM’s ‘sequencing,’ which aims to have
users experience content in previously specified order.
Conclusion
GRADE enhances electronic publishing formats in
important ways. It offers an important toolkit for authors
in the Humanities and beyond that will allow them to empower
the readers of their texts and thereby reach wider audiences.
GRADE allows for co-authoring without jettisoning editorial
oversight or losing track of who wrote what; it provides a
powerful tool both for authors in fields in which a rich tradition
of grammatical description already exists, such as many
European languages, as well as for those in need of a tool for
writing the first reference grammar of a language ever to be
published.
Figure 1: Editor screen. Note ‘depth level’ 5 next to ‘paragraph’.
Figure 2: Rendering options

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO - 2007

Hosted at University of Illinois, Urbana-Champaign

Urbana-Champaign, Illinois, United States

June 2, 2007 - June 8, 2007

106 works by 213 authors indexed

Series: ADHO (2)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None