Stanford University
This paper discusses the goals, architecture, and usability of Kirrkirr, a Java-based visualization tool for XML dictionaries, currently being used with a dictionary for Warlpiri, an Australian Aboriginal language.
While dictionaries on computers are now common, there has been surprisingly little work on innovative ways of utilising the capabilities of computers for visualization, hypertext linking and multimedia in order to provide a richer experience of dictionary content. Most electronic dictionaries present the search-dominated interface of classic information retrieval (IR) systems, which are only effective when the user has a clearly specified information need and a good understanding of the content being searched. The ability to browse often makes paper dictionaries easier and more pleasant to use than such electronic dictionaries. Search interfaces are ineffective for information needs such as exploring a concept. Some work in IR has emphasised the need for new methods of information access and visualization for browsing document collections (Pirolli et al. 1996), and we wish to extend such ideas into the domain of dictionaries, in part because indications are that current interfaces are unlikely to have much direct educational benefit for students (Kegl 1995).
Our goal has been to provide a fun dictionary tool that is effective for browsing and incidental language learning. In particular we attempt to address Sharpe's (1995) "distinction between information gained and knowledge sought". The speed of information retrieval that e-dictionaries deliver, and the focused decontextualized search results they provide, can frequently lead to loss of the memory retention benefits and chances for random learning that manually searching through paper dictionaries provides.
Within the Australian context, indigenous dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met. Another major goal has been to design an interface usable by, and interesting to, young users and other language learners. From this viewpoint, the low level of literacy in the region and the inherently captivating nature of computers suggest that an e-dictionary is potentially more useful than a paper edition. Among other benefits, we can provide an interface less dependent on good knowledge of spelling and alphabetical order.
Our dictionary interface initially targeted Warlpiri, a language of Central Australia, for which there has been an extensive ongoing project for the compilation of semantically rich lexical materials (Laughren and Nash 1983, Hale and Laughren [to appear]). We converted this data from a non-standard format into a richly structured XML version (XML 1998). The current version uses ad hoc indexing of this textual version for efficient access, but we expect to move to XQL as this standard matures. Our system is written in Java, using the Swing API, and runs on all major platforms (Windows, Mac, Unix).
For dictionaries with plain textual content behind them, there is little that they can provide in the way of output but an on-line reflection of a printed page. In contrast, XML allows definition of the precise semantics of the dictionary content, while leaving unspecified its form of presentation to the user. We exploit this flexibility in our application, by having the program mediate between the lexical data and the user. The interface can select from and choose how to present information, in ways customised to a user's preferences and abilities.
Beyond the definitions of words, users frequently want to know their relationships to other words, and the patterning in these relationships. Kirrkirr provides a color-coded network display of semantic links between words, which can be explored, manipulated and customised interactively by the user (Jansz et al. 1999), using the animated graph-drawing techniques of Eades et al. (1998) and Huang et al. (1998). In their spring algorithm, the words of a network become nodes which are held apart by gravitational repulsion, but kept from drifting too far apart by springs which have a natural length. This graph algorithm differs from most others by providing iterative updating of the graph layout, which means that users can drag nodes across the screen, and the algorithm will cause other nodes to flee out of the way, while words related to the dragged word are pulled along with it. The detailed semantic markup of the dictionary, with many kinds of semantic links (such as synonyms, antonyms, hyponyms, and other forms of relationships), allows us to provide a rich browsing experience. For example, the ability to display different link types graphically as different colors solves one of the recurring problems of the present web, with its one type of link: users have some idea of what type of relationship there is to another word before clicking. Thinking of the lexicon as a semantic network with various kinds of links was a leading idea of the WordNet project (Miller et al. 1993), but the simple text-based computer interface they provide fails to do justice to the richness of the underlying data. Others have attempted to remedy this lack (e.g., Plumbdesign 1998), but we feel that our work is better aimed at providing the kind of simple network display suitable for our users.
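The spring-and-repulsion idea described above can be illustrated with a small Java sketch. This is not the algorithm of Eades et al. or Kirrkirr's code: the class name, the inverse-square repulsion, the linear spring, and every constant are assumptions chosen only to show how an iteratively updated layout lets a dragged node pull its linked neighbours along.

```java
import java.util.ArrayList;
import java.util.List;

/** Force-directed layout sketch: pairwise repulsion plus springs with a natural length. */
class SpringLayoutSketch {
    // All constants are illustrative assumptions, not values taken from Kirrkirr.
    static final double REPULSION = 2000.0;     // strength of node-node repulsion
    static final double STIFFNESS = 0.05;       // spring constant of a semantic link
    static final double NATURAL_LENGTH = 80.0;  // rest length of a link, in pixels
    static final double STEP = 0.5;             // fraction of the net force applied per update

    final double[] x;
    final double[] y;
    final boolean[] pinned;                     // true while the user is dragging a node
    final List<int[]> links = new ArrayList<>();

    SpringLayoutSketch(int nodeCount) {
        x = new double[nodeCount];
        y = new double[nodeCount];
        pinned = new boolean[nodeCount];
    }

    void place(int node, double px, double py) { x[node] = px; y[node] = py; }

    void link(int a, int b) { links.add(new int[] {a, b}); }

    double distance(int a, int b) { return Math.hypot(x[a] - x[b], y[a] - y[b]); }

    /** One incremental update; calling this repeatedly animates the layout. */
    void step() {
        int n = x.length;
        double[] fx = new double[n];
        double[] fy = new double[n];
        // Every pair of nodes repels, with a force that falls off with squared distance.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = x[i] - x[j];
                double dy = y[i] - y[j];
                double dist = Math.max(Math.hypot(dx, dy), 0.01);
                double f = REPULSION / (dist * dist);
                fx[i] += f * dx / dist;
                fy[i] += f * dy / dist;
                fx[j] -= f * dx / dist;
                fy[j] -= f * dy / dist;
            }
        }
        // Linked nodes are pulled together or pushed apart toward the natural length.
        for (int[] link : links) {
            int a = link[0];
            int b = link[1];
            double dx = x[b] - x[a];
            double dy = y[b] - y[a];
            double dist = Math.max(Math.hypot(dx, dy), 0.01);
            double f = STIFFNESS * (dist - NATURAL_LENGTH);
            fx[a] += f * dx / dist;
            fy[a] += f * dy / dist;
            fx[b] -= f * dx / dist;
            fy[b] -= f * dy / dist;
        }
        // Pinned (dragged) nodes stay where the user put them; all others move.
        for (int i = 0; i < n; i++) {
            if (!pinned[i]) {
                x[i] += STEP * fx[i];
                y[i] += STEP * fy[i];
            }
        }
    }

    public static void main(String[] args) {
        SpringLayoutSketch g = new SpringLayoutSketch(2);
        g.place(0, 0, 0);
        g.place(1, 10, 0);
        g.link(0, 1);
        for (int i = 0; i < 500; i++) {
            g.step();
        }
        System.out.println("settled distance: " + g.distance(0, 1));
    }
}
```

Because each call to `step()` starts from the current positions, pinning one node, moving it, and continuing to call `step()` is enough to make linked nodes follow and unlinked nodes move away, which is the interactive behaviour the paragraph describes.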
To augment traditional semantic relations in the dictionary, we also provide linkages derived automatically from collocational analysis (of the limited amount of online Warlpiri text), and present an interface derived from semantic domains. These interfaces both address the notion of "terminology sets" - words that belong together, a notion which seems particularly salient for native speakers (Goddard and Thieberger 1997). We discuss the determination of collocational bonds, using the method of Dunning (1993), and the limitations of what we can do with the data available.
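As an illustration of the kind of score involved, the following Java sketch computes the log-likelihood ratio statistic for a 2x2 table of co-occurrence counts, in the usual G² form (twice the sum of observed times the log of observed over expected). The class and method names are hypothetical, and how Kirrkirr counts co-occurrences or thresholds the score is not specified here.

```java
/** Log-likelihood ratio (G^2) for a 2x2 contingency table of co-occurrence counts. */
class CollocationScoreSketch {

    /**
     * @param k11 number of times word A occurs together with word B
     * @param k12 number of times word A occurs without word B
     * @param k21 number of times word B occurs without word A
     * @param k22 number of contexts containing neither word
     * @return G^2; 0 when the counts are exactly what independence predicts
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double n = k11 + k12 + k21 + k22;
        double row1 = k11 + k12;
        double row2 = k21 + k22;
        double col1 = k11 + k21;
        double col2 = k12 + k22;
        // Expected count of each cell under independence is rowTotal * colTotal / n.
        return 2.0 * (term(k11, row1 * col1 / n)
                    + term(k12, row1 * col2 / n)
                    + term(k21, row2 * col1 / n)
                    + term(k22, row2 * col2 / n));
    }

    /** observed * ln(observed / expected), with 0 * ln(0) taken as 0. */
    private static double term(long observed, double expected) {
        return observed == 0 ? 0.0 : observed * Math.log(observed / expected);
    }

    public static void main(String[] args) {
        // Illustrative counts only, not taken from the Warlpiri corpus.
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));
        System.out.println(logLikelihoodRatio(10, 0, 0, 10));
    }
}
```

A higher score indicates that two words co-occur more (or less) often than chance would predict. With a small corpus the counts are small, which is one reason a likelihood-based statistic is preferred over measures that assume normally distributed counts.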
Formatted dictionary entries, displayed using HTML, are produced from the underlying XML by the use of XSL stylesheets (XSL 1998). These provide conventional hypertext for navigating between entries, in particular providing a color-coding of different kinds of semantic relationships between words which is consistent with that in the network display. A variety of XSL stylesheets are provided, which can give different formatting to the dictionary content appropriate to different users. For instance, abbreviations for parts of speech, other grammatical notes, and detailed decompositional definitions can be confusing for most Aboriginal users (Corris et al. 1999), and style sheets can provide just the desired information in large easy-to-read type.
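A minimal Java sketch of this stylesheet-driven formatting follows. It uses XSLT 1.0 through the standard `javax.xml.transform` API rather than the 1998 working-draft syntax the system was built on, and the element names (`ENTRY`, `HW`, `POS`, `GLOSS`, `SYN`), the `showGrammar` parameter, and the sample entry are all illustrative assumptions, not the Warlpiri dictionary's actual markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Renders one dictionary entry to HTML, optionally hiding grammatical notes. */
class EntryFormatterSketch {
    // Hypothetical entry markup, for illustration only.
    static final String ENTRY =
        "<ENTRY><HW>jarntu</HW><POS>N</POS><GLOSS>dog</GLOSS><SYN>maliki</SYN></ENTRY>";

    // One stylesheet with a parameter stands in for several user-specific stylesheets.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:param name='showGrammar' select=\"'yes'\"/>"
      + "<xsl:template match='/ENTRY'>"
      + "<div><b><xsl:value-of select='HW'/></b>"
      + "<xsl:if test=\"$showGrammar = 'yes'\"><i><xsl:value-of select='POS'/></i></xsl:if>"
      + "<span><xsl:value-of select='GLOSS'/></span>"
      + "<xsl:apply-templates select='SYN'/></div>"
      + "</xsl:template>"
      // The class attribute lets a CSS rule color each link type consistently.
      + "<xsl:template match='SYN'>"
      + "<a class='synonym' href='#{.}'><xsl:value-of select='.'/></a>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String entryXml, boolean showGrammar) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setParameter("showGrammar", showGrammar ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(entryXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not format entry", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(ENTRY, true));   // view with part-of-speech label
        System.out.println(render(ENTRY, false));  // simplified view without it
    }
}
```

The same XML entry yields two different presentations without any change to the data, which is the separation of content from presentation that the stylesheets provide.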
In addition to the above, the dictionary incorporates multimedia - the user can hear words and see appropriate pictures - and a conventional search interface. The dictionary provides a user-friendly console where search results can be sorted and manipulated. As well as standard keyword search, which can optionally be restricted to text appearing within a specified XML element, the system provides two features targeted towards two principal groups of users. Linguists often want to search for particular sound patterns (such as certain types of consonant clusters), and so the system allows regular expression matching for such expert users. On the other hand, the limited literacy level of many potential users means that they often have difficulty looking up words. In part this is because the phonetic orthography of Warlpiri does not correspond closely to the (rather arcane) spelling rules of English, on which their literacy skills are usually based. To alleviate this problem, we have implemented a "fuzzy spelling" algorithm which attempts to find the intended word by using rules which capture common mistakes, sound confusions and alternative spellings.
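The two search features can be sketched in Java as follows. The respelling rules shown are hypothetical examples of the kind of rule such an algorithm might use (for instance, treating b/p, d/t and g/k as interchangeable, and mapping English-style "oo" to "u"); the actual rule set in Kirrkirr is not given here, and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Rule-based fuzzy headword lookup plus regular-expression search. */
class HeadwordSearchSketch {
    // Hypothetical respelling rules {regex, replacement}, applied in order.
    private static final String[][] RULES = {
        {"oo", "u"},        // English-style spelling of the vowel written u
        {"ee", "i"},        // English-style spelling of the vowel written i
        {"b", "p"},         // assumed sound confusions between stop letters
        {"d", "t"},
        {"g", "k"},
        {"(.)\\1+", "$1"},  // collapse doubled letters
    };

    /** Reduces a spelling to a canonical key; words with equal keys are treated as matches. */
    static String key(String word) {
        String k = word.toLowerCase(Locale.ROOT);
        for (String[] rule : RULES) {
            k = k.replaceAll(rule[0], rule[1]);
        }
        return k;
    }

    /** Returns headwords whose key equals the key of the (possibly misspelled) query. */
    static List<String> fuzzyLookup(String query, List<String> headwords) {
        String wanted = key(query);
        List<String> hits = new ArrayList<>();
        for (String headword : headwords) {
            if (key(headword).equals(wanted)) {
                hits.add(headword);
            }
        }
        return hits;
    }

    /** Returns headwords containing a match for the given regular expression. */
    static List<String> regexSearch(String regex, List<String> headwords) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String headword : headwords) {
            if (pattern.matcher(headword).find()) {
                hits.add(headword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative headword list, not an extract from the dictionary.
        List<String> headwords = Arrays.asList("jarntu", "maliki");
        System.out.println(fuzzyLookup("jarndoo", headwords));
        System.out.println(regexSearch("rn[tk]", headwords));
    }
}
```

Applying the same rules to both the query and the stored headwords means a rule never has to be "correct" spelling advice; it only has to map likely misspellings and the intended word to the same key. A fuller version would presumably rank near matches rather than require identical keys.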
We have performed some preliminary trialling of the dictionary through visits by Mim Corris to Yuendumu and Willowra, and Jane Simpson to Lajamanu. This has involved dictionary tasks and observed use with primary and lower secondary students and trainee Warlpiri literacy workers, together with comments from teachers and other adults. In general, reactions have been quite enthusiastic, and the dictionary does appear to succeed in creating and maintaining interest. We have received suggestions on how to make it a better basis for classroom activities, which we hope to incorporate in future versions.
The diversity of areas researched in this work is rare relative to past work in electronic dictionaries, which often addresses the problems of storage, processing and visualisation/teaching as unrelated. Despite some significant research into the construction of lexical databases that go beyond the confined dimensions of their paper ancestors, there has been little attempt at seeing this work through to benefiting people such as language learners, who could truly gain from a better interface to dictionary information. Additionally, the range of potential users here is considerably more diverse than encountered in typical studies of dictionary usability (e.g., Atkins and Varantola 1997). For instance, issues such as low levels of literacy are rarely touched on. Our system has attempted to reduce the importance of knowing the written form of a word before the application can be used, while still offering ample opportunities to learn written forms. Features such as an animated, clearly laid out network of words and their relationships, multimedia and hypertext aim at making the system interesting and enjoyable to use. At the same time, features such as advanced search capabilities and note-taking make the system practical as a reference tool. The system was designed to be highly customisable by the user, and it is also highly extensible, allowing new modules to be incorporated with relative ease. We thus think that it is a good foundation for an electronic dictionary, and while the focus of this research has been on Warlpiri, the approach (and the software constructed) can be easily applied to other languages.
References
Atkins, B.T.S., and Varantola, K. (1997). Monitoring dictionary use. International Journal of Lexicography 10(1):1-45.
Corris, M., Manning, C., Poetsch, S., and Simpson, J. (1999). Using dictionaries of Australian Aboriginal languages. Paper presented at the Applied Linguistics Association of Australia Annual Congress, Perth.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61-74.
Eades, P., Huang, M., and Wang, J. (1998). Online Animated Graph Drawing using a Modified Spring Algorithm. Proceedings of the 21st Australian Computer Science Conference, pp. 17-28.
Goddard, C. and Thieberger, N. (1997). Lexicographic Research on Australian Languages 1968-1993. In M. J. Walsh and D. Tryon (eds) Boundary Rider: Essays in Honour of Geoffrey O'Grady, pp. 175-208. Pacific Linguistics, Canberra.
Hale, K. L. and Laughren, M. (to appear). The Warlpiri Dictionary.
Huang, M. L., Eades, P., and Cohen, R. F. (1998). WebOFDAV: Navigating and visualizing the Web on-line with animated context swapping. Proceedings of the 7th International World Wide Web Conference, pp. 638-642.
Jansz, K., Manning, C. D., and Indurkhya, N. (1999). Kirrkirr: Interactive Visualisation And Multimedia From A Structured Warlpiri Dictionary. Proceedings of AusWeb99, the Fifth Australian World Wide Web Conference, pp. 302-316.
Kegl, J. (1995). Machine-Readable Dictionaries and Education. In D. Walker, A. Zampolli and N. Calzolari (eds) Automating the Lexicon: Research and Practice in a Multilingual Environment. Oxford University Press, Clarendon.
Laughren, M. and Nash, D. G. (1983). Warlpiri Dictionary Project: Aims, method, organisation and problems of definition. Papers in Australian Linguistics No. 15: Australian Aboriginal Lexicography, pp. 109-133. Pacific Linguistics, Canberra.
Miller, G., Beckwith, R., Fellbaum, C., Gross, R. and Miller, K. (1993). Introduction to WordNet: An On-line Lexical Database. In C. Fellbaum (ed) (1998). WordNet: An electronic lexical database. MIT Press.
Pirolli, P., Schank, P., Hearst, M. A., and Diehl, C. (1996). Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI '96).
Plumbdesign (1998). Visual Thesaurus Java applet <http://www.plumbdesign.com/thesaurus>
Sharpe, P. (1995). Electronic Dictionaries with Particular Reference to an Electronic Bilingual Dictionary for English-speaking Learners of Japanese. International Journal of Lexicography 8(1):39-54.
XML (1998). Extensible Markup Language (XML) 1.0 W3C Recommendation 10-February-1998. In T. Bray, J. Paoli, and C. M. Sperberg-McQueen (eds) <http://www.w3.org/TR/1998/REC-xml-19980210>.
XSL (1998). Extensible Stylesheet Language (XSL) Version 1.0 Working Draft. In J. Clark and S. Deach (eds) <http://www.w3.org/TR/WD-xsl>.
Hosted at University of Glasgow
Glasgow, Scotland, United Kingdom
July 21, 2000 - July 25, 2000
Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/