Department of Classics - University of Toronto
This paper presents some Java-based solutions to the problems of manipulating ancient Greek text. Ancient Greek employs a system of vowel accentuation which includes three tone accents, two breathing markers, and an 'iota subscript', all of which can appear in combination. Several mutually incompatible computational encoding schemes for ancient Greek are currently in use. Some mark accents by composition; others use 'precomposed' characters, with a different character code for each combination of accent(s) and vowel. The most thorough scheme is the BetaCode notation, a compositional approach which uses only ASCII characters and can therefore be used on any platform. BetaCode is, however, difficult to read. More legible schemes, such as SuperGreek, WinGreek, SMK GreekKeys, or even the MacOS Greek system, are in use, but these are not as complete as BetaCode, and most of them are restricted to a single operating system.
The Unicode Standard 2.0 promises to provide a complete and system-independent encoding for the storage and transmission of ancient Greek. However, its representation of Greek is more complex than that of its predecessors, because it includes both a compositional notation (based on ISO 8859-7 and using combining diacritical marks) and a precomposed notation. An application will need to deal sensibly with the 'canonical equivalents' that arise from these two different code spaces.
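To make the notion of canonical equivalents concrete, the following is a minimal sketch rather than code from the package described here. It assumes a modern Java runtime: `java.text.Normalizer` was added to the standard library in Java 6, well after this paper was written. The class name `CanonicalEquivalence` and the helper `canonicallyEqual` are hypothetical names chosen for illustration.

```java
import java.text.Normalizer;

public class CanonicalEquivalence {
    // Precomposed notation: GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA (U+1F04).
    static final String PRECOMPOSED = "\u1F04";

    // Compositional notation: alpha (U+03B1), then COMBINING COMMA ABOVE (U+0313,
    // the smooth breathing), then COMBINING ACUTE ACCENT (U+0301).
    static final String COMBINING = "\u03B1\u0313\u0301";

    // Two strings are canonically equivalent when their fully decomposed
    // (NFD) forms are identical.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        // A naive comparison treats the two spellings as different text.
        System.out.println("String.equals:     " + PRECOMPOSED.equals(COMBINING));
        // Comparison after normalization treats them as the same character.
        System.out.println("canonically equal: " + canonicallyEqual(PRECOMPOSED, COMBINING));
    }
}
```

An application that stores or searches Greek text would typically normalize all input to one form on the way in, so that later comparisons do not depend on which notation the source happened to use.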
The Java programming language offers the opportunity to build a cross-platform approach to ancient Greek encoded in Unicode and to translate Greek texts from other encoding schemes into Unicode. In response, I produced a simple object-oriented representation of a Greek character which includes such fields as character, accent, breathing and others related to the reading of the character. Methods were added to set the values of these fields, to allow error-checking, and to build these characters into texts. The result is a representation of ancient Greek that is far closer to the way a scholar thinks about the language. The programmer can, for instance, 'ask' each vowel in a word to give its length. To translate standard encodings into and out of the abstract representation and amongst each other, four translators have been built: one for plain-text output and one each for the BetaCode, GreekKeys, and Unicode notations.
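The design described above might be sketched as follows. This is an illustration under stated assumptions, not the package's actual API: every name (`GreekSketch`, `GreekChar`, `fromBetaCode`, `toUnicode`, the enums) is hypothetical, only lowercase BetaCode letters and the diacritics `)`, `(`, `/`, `\`, `=`, `|` are handled, input is a single word, and the vowel-length rule is deliberately crude. It also uses `java.text.Normalizer`, which postdates the paper.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class GreekSketch {
    enum Accent { NONE, ACUTE, GRAVE, CIRCUMFLEX }
    enum Breathing { NONE, SMOOTH, ROUGH }
    enum Length { SHORT, LONG, AMBIGUOUS, NOT_A_VOWEL }

    // BetaCode letters and the Unicode letters alpha..omega they stand for (assumed subset).
    static final String BETA = "abgdezhqiklmncoprstufxyw";
    static final String GREEK = "\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7\u03B8\u03B9\u03BA\u03BB\u03BC"
            + "\u03BD\u03BE\u03BF\u03C0\u03C1\u03C3\u03C4\u03C5\u03C6\u03C7\u03C8\u03C9";
    static final String VOWELS = "\u03B1\u03B5\u03B7\u03B9\u03BF\u03C5\u03C9";

    // Abstract character: a base letter plus the marks a scholar would name.
    static final class GreekChar {
        final char base;
        Accent accent = Accent.NONE;
        Breathing breathing = Breathing.NONE;
        boolean iotaSubscript = false;

        GreekChar(char base) { this.base = base; }

        boolean isVowel() { return VOWELS.indexOf(base) >= 0; }

        // Simple error check: only vowels may carry an accent.
        void setAccent(Accent a) {
            if (!isVowel()) throw new IllegalArgumentException("accent on a consonant");
            accent = a;
        }

        // Crude length rule: eta, omega, iota subscript and circumflex imply a long vowel.
        Length length() {
            if (!isVowel()) return Length.NOT_A_VOWEL;
            if (base == '\u03B7' || base == '\u03C9' || iotaSubscript
                    || accent == Accent.CIRCUMFLEX) return Length.LONG;
            if (base == '\u03B5' || base == '\u03BF') return Length.SHORT;
            return Length.AMBIGUOUS;
        }

        // Base letter followed by combining marks: breathing, accent, iota subscript.
        String toDecomposed() {
            StringBuilder sb = new StringBuilder().append(base);
            if (breathing == Breathing.SMOOTH) sb.append('\u0313');
            if (breathing == Breathing.ROUGH) sb.append('\u0314');
            if (accent == Accent.ACUTE) sb.append('\u0301');
            if (accent == Accent.GRAVE) sb.append('\u0300');
            if (accent == Accent.CIRCUMFLEX) sb.append('\u0342');
            if (iotaSubscript) sb.append('\u0345');
            return sb.toString();
        }
    }

    // Translator in: BetaCode word to abstract characters.
    static List<GreekChar> fromBetaCode(String beta) {
        List<GreekChar> out = new ArrayList<>();
        for (char c : beta.toCharArray()) {
            int idx = BETA.indexOf(c);
            if (idx >= 0) { out.add(new GreekChar(GREEK.charAt(idx))); continue; }
            if (out.isEmpty()) throw new IllegalArgumentException("mark before any letter: " + c);
            GreekChar last = out.get(out.size() - 1);
            switch (c) {
                case ')': last.breathing = Breathing.SMOOTH; break;
                case '(': last.breathing = Breathing.ROUGH; break;
                case '/': last.setAccent(Accent.ACUTE); break;
                case '\\': last.setAccent(Accent.GRAVE); break;
                case '=': last.setAccent(Accent.CIRCUMFLEX); break;
                case '|': last.iotaSubscript = true; break;
                default: throw new IllegalArgumentException("unsupported symbol: " + c);
            }
        }
        return out;
    }

    // Translator out: abstract characters to precomposed (NFC) Unicode.
    static String toUnicode(List<GreekChar> word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.size(); i++) {
            GreekChar gc = word.get(i);
            boolean finalSigma = gc.base == '\u03C3' && i == word.size() - 1;
            sb.append(finalSigma ? "\u03C2" : gc.toDecomposed());
        }
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        List<GreekChar> word = fromBetaCode("a)/nqrwpos");
        System.out.println(toUnicode(word));
        for (GreekChar gc : word) {
            if (gc.isVowel()) System.out.println(gc.base + " " + gc.length());
        }
    }
}
```

Because every translator reads from or writes to the same abstract characters, adding a new notation in this arrangement means writing one translator rather than one converter per pair of encodings.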
This package of classes has proved a flexible and convenient application programming interface (API) for two distinct applications. The first of these is JAGSort, a CGI application which sorts lines of text according to the first Greek word
<http://smaug.java.utoronto.ca/~brucerob/JAGSort/>.
The object-oriented approach of the package made it possible to write simple and fast comparison methods. Ideal for indexes or for providing databases with an 'alphabetical order' field, JAGSort is, to my knowledge, the first publicly available application for the sorting of ancient Greek text. Furthermore, because it operates on the internal representation, JAGSort can sort texts in any notation for which a translator has been written. Second, the Java and Ancient Greek package forms a good basis for computational pedagogy. As an example of its potential, I developed the accentquiz applet
<http://smaug.java.utoronto.ca/~brucerob/Greek/accentquiz.html>.
The quiz provides a series of unaccented words for the student to accentuate correctly. Incorrect characters are bordered with a red rectangle, correct characters with a green one. Having observed that new students of Greek find keyboard encodings of accentuation most difficult, I used a drag-and-drop metaphor for adding accents to Greek words.
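The kind of comparison method that JAGSort relies on can be approximated as below. This is a sketch under assumptions, not JAGSort's own code: it works directly on Unicode strings rather than on the package's internal representation, it uses `java.text.Normalizer` (Java 6 and later), and the names `GreekSortSketch`, `sortKey`, and `GREEK_ORDER` are hypothetical. The idea is to order lines by the bare letters of their first word, ignoring accents and breathings, and to fall back on the raw text only to break ties.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Comparator;

public class GreekSortSketch {
    // Builds a key from the first whitespace-delimited word of a line:
    // decompose, drop combining marks, lowercase, and fold final sigma into sigma.
    static String sortKey(String line) {
        String firstWord = line.trim().split("\\s+")[0];
        String decomposed = Normalizer.normalize(firstWord, Normalizer.Form.NFD);
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) continue;
            c = Character.toLowerCase(c);
            if (c == '\u03C2') c = '\u03C3'; // final sigma sorts as sigma
            key.append(c);
        }
        return key.toString();
    }

    // Primary order: bare letters. Tie-break: the original text.
    static final Comparator<String> GREEK_ORDER = (a, b) -> {
        int byLetters = sortKey(a).compareTo(sortKey(b));
        return byLetters != 0 ? byLetters : a.compareTo(b);
    };

    public static void main(String[] args) {
        String[] lines = {
            "\u1F65\u03C1\u03B1",                               // hora
            "\u1F04\u03BD\u03B8\u03C1\u03C9\u03C0\u03BF\u03C2", // anthropos
            "\u03BB\u03CC\u03B3\u03BF\u03C2",                   // logos
            "\u1F00\u03B3\u03B1\u03B8\u03CC\u03C2"              // agathos
        };
        Arrays.sort(lines, GREEK_ORDER);
        for (String line : lines) System.out.println(line);
    }
}
```

Sorting the same array by raw code points would place the unaccented-initial word first, since the precomposed accented vowels live in a separate, higher block of Unicode; stripping the marks before comparing avoids that.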
There are many potential further applications of the API. On the one hand, the existing applications could be more tightly integrated into existing computational tools. For instance, using JavaBean component technology, the JAGSort methods could be provided as a plug-in module for off-the-shelf databases or word-processors. Using the translation methods, editors could automatically standardize submissions to their encoding scheme of choice and, more importantly, all Hellenists could migrate more smoothly to the Unicode standard. My final goal, though, is an algorithm for producing grammatically correct forms of Greek words. This is the crucial first step in producing a responsive and flexible system for Greek pedagogy.
Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)
Debrecen, Hungary
July 5, 1998 - July 10, 1998
Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/
References: http://web.archive.org/web/19990225164509/http://lingua.arts.klte.hu/allcach98/abst/jegyzek.htm
Attendance: ~60 (https://web.archive.org/web/19990128030244/http://lingua.arts.klte.hu/allcach98/listpar3.htm)