LEXUS 3 — a collaborative environment for multimedia lexica

paper, specified "short paper"
Authorship
  1. Shakila Shayan

    Max Planck Institute for Psycholinguistics - University of Nijmegen

  2. André Moreira

    Max Planck Institute for Psycholinguistics - University of Nijmegen

  3. Menzo Windhouwer

    Max Planck Institute for Psycholinguistics - University of Nijmegen

  4. Alexander König

    Max Planck Institute for Psycholinguistics - University of Nijmegen

  5. Sebastian Drude

    Max Planck Institute for Psycholinguistics - University of Nijmegen

Work text

LEXUS (The Language Archive 2013; Ringersma and Kemps-Snijders 2007) is a flexible web-based lexicon tool that was initially developed between 2006 and 2010 at the Max Planck Institute for Psycholinguistics in Nijmegen within the program “Documentation of Endangered Languages” (DOBES), funded by the Volkswagen Foundation (cf. Volkswagen Foundation 2013). It is specifically tailored for linguists whose research involves collecting and documenting a broad range of spoken data. Such data consists mainly of audio and video recordings and often depicts a language that has never been documented before and is in danger of becoming extinct. Projects supported within the DOBES domain place specific requirements on a lexicon tool. In particular, the tool has to be highly flexible with respect to lexicon structure, so that it can accommodate all possible structures, even those that have not been seen in the description of other languages. Also, given the audio-visual nature of the data, there need to be proper facilities for presenting multimedia within the lexicon. In what follows we give a brief overview of how LEXUS is fine-tuned to meet most of the demands of this domain, and how it is a useful tool for documenting endangered languages while at the same time being suitable as a lexicographic tool for any other language.

LEXUS is not the only lexicographic tool with endangered languages as its key area of application. Other examples are the Kirrkirr software (Manning et al. 2001), the IDD (Indiana Dictionary Database, cf. De Korne, et al. 2009), and the Toolbox, Lexique Pro, WeSay and FLEx programs (SIL 2013). Of these, the SIL tools are the most comparable to LEXUS. Toolbox is a tool for data management and analysis which includes lexical data and a parsing and glossing engine. However, the outdated data model of Toolbox seriously impedes its sustainability and its interoperability with other tools. Lexique Pro is suitable for visualizing Toolbox/Shoebox data. FLEx, on the other hand, is a powerful lexicon tool with advanced parsing and analytical functionalities, and other SIL tools provide sophisticated options to present and publish FLEx data. In terms of features and functionality for lexical databases by themselves, LEXUS stands somewhere in the middle, with a set of features that aim at user-friendliness. Crucially, it offers online and shared access to the lexicon. It also allows for interoperability and for customizable, styled HTML views of the data without requiring any knowledge of markup languages.

LEXUS makes use of schema structure trees for representing lexical entry elements. However, it does not enforce any predefined schema structure for the lexicon of a given language. Instead, upon creating a new lexicon, LEXUS provides the user with a small collection of schema templates to choose from. These templates can be further developed into more complex schemas to build a particular lexicon. Some of the proposed templates are based on standard frameworks such as ISO LMF (ISO 24613, 2008) and promote the usage of concept names and conventions that are proposed by the ISO data categories in ISOcat (Kemps-Snijders et al. 2009). Other templates have been suggested and fine-tuned based on the experience of field linguists who have been involved in documenting endangered languages.

Having a flexible structure as a template is expected to be a helpful starting point for researchers who are in the initial stages of the documentation process, especially when it comes to a language that has never been studied before. To comply more fully with the ISO LMF structure, LEXUS needs to support not only a simple tree structure but also the circularities of a graph-type structure. LEXUS 3 therefore makes it possible to create cross-references between corresponding elements of two different lexical entries of a lexicon. The template collection together with cross-reference linking makes LEXUS quite flexible in creating a wide range of structures. This combination makes LEXUS a suitable lexicon tool not only for under-described languages but also for the most studied and well-documented languages such as English and German.
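To make the combination of a schema tree and cross-references concrete, the following is a minimal sketch in Python. It is not the LEXUS data model: the class names `SchemaNode` and `Element`, the field names, and the sample entries are all assumptions made for illustration. It shows a template as a tree of element names, and lexical entries whose elements can point at elements of other entries, which turns the overall structure into a graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaNode:
    """One node of a lexicon schema tree (hypothetical, for illustration)."""
    name: str
    children: List["SchemaNode"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> List[str]:
        """Return every root-to-node path, e.g. 'LexicalEntry/Sense/Gloss'."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result


@dataclass
class Element:
    """A data element of a lexical entry; `ref` holds the id of another element."""
    id: str
    category: str
    value: str = ""
    ref: Optional[str] = None
    children: List["Element"] = field(default_factory=list)


def index_elements(entries: List[Element]) -> Dict[str, Element]:
    """Map element ids to elements across all entries of a lexicon."""
    index: Dict[str, Element] = {}
    stack = list(entries)
    while stack:
        element = stack.pop()
        index[element.id] = element
        stack.extend(element.children)
    return index


def resolve(element: Element, index: Dict[str, Element]) -> Optional[Element]:
    """Follow a cross-reference; None if there is no reference or it dangles."""
    return index.get(element.ref) if element.ref else None


# A small template that a user could extend into a richer schema.
template = SchemaNode("LexicalEntry", [
    SchemaNode("Lemma"),
    SchemaNode("Sense", [SchemaNode("Gloss"), SchemaNode("Synonym")]),
])

# Two entries that refer to each other: a cycle that a plain tree cannot express.
big = Element("e1", "LexicalEntry", children=[
    Element("e1.lemma", "Lemma", "big"),
    Element("e1.sense", "Sense", children=[
        Element("e1.syn", "Synonym", ref="e2.lemma"),
    ]),
])
large = Element("e2", "LexicalEntry", children=[
    Element("e2.lemma", "Lemma", "large"),
    Element("e2.sense", "Sense", children=[
        Element("e2.syn", "Synonym", ref="e1.lemma"),
    ]),
])

index = index_elements([big, large])
target = resolve(index["e1.syn"], index)
```

Storing references as ids rather than as direct object links keeps each entry a tree that is easy to serialize, while the index lookup supplies the graph edges on demand.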

Previous versions of LEXUS laid the groundwork that enables projects within the DOBES program to create and view multimedia lexica as a team. LEXUS 3 stabilizes these core features and offers new functions which broaden the scope to the wider linguistic community. One example is the use of the templates as the basis for flexible import and export functionality, as fleshed out in the RELISH project (Aristar-Dry et al. 2012). This in turn provides better support for standardized lexicon formats and makes it possible, for example, to export a LEXUS 3 lexicon to the LEGO repository (LinguistList 2013) and to import a LEGO lexicon. This feature is based on the new RELISH-LMF serialization (Windhouwer, et al. 2013; RELISH 2013), which is extensible using RELAX NG (ISO 19757-2, 2008) and ISO/TEI feature structures (ISO 24610-1, 2008). LEXUS 3 is also being integrated into linguistic infrastructures like CLARIN (CLARIN 2013), which opens it up to an expanding user base.
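As a rough illustration of what a round trip through an LMF-style XML serialization involves, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, and `feat` with `att`/`val` attributes) follow common LMF conventions, but the exact shape of RELISH-LMF is not reproduced here. The functions `export_lexicon` and `import_lexicon` and the dictionary layout of an entry are hypothetical.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an attribute-value pair as an LMF-style <feat> element."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def export_lexicon(name, entries):
    """Serialize a list of entry dicts to an LMF-like XML string (assumed layout)."""
    root = ET.Element("LexicalResource")
    lexicon = ET.SubElement(root, "Lexicon")
    feat(lexicon, "name", name)
    for entry in entries:
        lexical_entry = ET.SubElement(lexicon, "LexicalEntry", {"id": entry["id"]})
        lemma = ET.SubElement(lexical_entry, "Lemma")
        feat(lemma, "writtenForm", entry["lemma"])
        for gloss in entry.get("glosses", []):
            sense = ET.SubElement(lexical_entry, "Sense")
            feat(sense, "gloss", gloss)
    return ET.tostring(root, encoding="unicode")


def import_lexicon(xml_text):
    """Read the same layout back into a list of entry dicts."""
    root = ET.fromstring(xml_text)
    entries = []
    for lexical_entry in root.iter("LexicalEntry"):
        lemma = lexical_entry.find("Lemma/feat[@att='writtenForm']")
        glosses = [
            f.get("val")
            for f in lexical_entry.findall("Sense/feat[@att='gloss']")
        ]
        entries.append({
            "id": lexical_entry.get("id"),
            "lemma": lemma.get("val"),
            "glosses": glosses,
        })
    return entries


sample = [
    {"id": "e1", "lemma": "big", "glosses": ["of considerable size"]},
    {"id": "e2", "lemma": "large", "glosses": ["big", "extensive"]},
]
xml_text = export_lexicon("demo", sample)
round_trip = import_lexicon(xml_text)
```

A real exchange format would additionally be validated against a schema (RELAX NG in the case of RELISH-LMF) before import; that step is omitted in this sketch.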

Digital lexicography is most helpful when the content is properly visualized, and it becomes even more worthwhile when it is integrated with multimedia. One of LEXUS’ distinctive features is the range of possibilities it offers for visualizing the lexicon, which is often enriched with multimedia. Audio-video recordings and images are critical aspects of the semantic knowledge. Data of this nature is often the only available resource for studying the kind of languages that LEXUS is designed for. The written form, together with grammatical, morphological and phonological descriptions of words, completes the semantic knowledge. A lexicon tool should therefore facilitate a unified way of presenting text and multimedia together, so that form and meaning can be placed next to each other in one picture. With version 3, LEXUS introduces a single unified environment in which users can describe what their lexica should look like, e.g. in an HTML view (Moreira, et al. 2013). In doing so, users are not required to have any HTML knowledge. Instead, LEXUS offers a graphical tree tool, which mimics the hierarchical structure behind markup-based technologies such as HTML, and which specifies the layout and style of the lexical views. The same tool can be used to create a formatted PDF view of the full lexicon, which is available for export and print. Figure 1 shows an example of a lexical entry, with an image and different styles for various elements of the entry. The style and the content of the list of entries shown on the left side are customized with the same tool as those of the selected lexical entry shown on the right side.

Figure 1: An example of a customized list view and lexical entry view using the same styling tool.
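The idea of describing a view as a tree and generating the markup from it can be sketched as follows. This is an illustrative Python example under stated assumptions, not the LEXUS implementation: the `ViewNode` class, its fields, and the `render` function are hypothetical, and the entry is a plain dictionary. Each node names the entry field it displays and carries style properties, so the user never writes HTML directly.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List


@dataclass
class ViewNode:
    """One node of a view-definition tree (hypothetical, for illustration)."""
    tag: str = "div"
    style: Dict[str, str] = field(default_factory=dict)
    source: str = ""      # key of the entry field shown by this node
    media: bool = False   # if True, the field value is used as an image URL
    children: List["ViewNode"] = field(default_factory=list)


def render(node: ViewNode, entry: Dict[str, str]) -> str:
    """Turn a view tree plus one lexical entry into an HTML fragment."""
    css = "; ".join(f"{key}: {val}" for key, val in node.style.items())
    attrs = f' style="{escape(css)}"' if css else ""
    value = entry.get(node.source, "") if node.source else ""
    if node.media:
        return f'<img src="{escape(value)}"{attrs}>'
    inner = escape(value) + "".join(render(child, entry) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


# A view with a bold lemma, an italic gloss and an image.
view = ViewNode("div", children=[
    ViewNode("span", {"font-weight": "bold"}, source="lemma"),
    ViewNode("span", {"font-style": "italic"}, source="gloss"),
    ViewNode(media=True, source="image"),
])
entry = {"lemma": "example", "gloss": "<a sample gloss>", "image": "example.jpg"}
html_fragment = render(view, entry)
```

Because the view tree is independent of the data, the same definition can be applied to every entry in a list view or handed to a different renderer, for instance one that produces print output.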

Finally, and possibly most importantly, LEXUS allows shared access to a given lexicon. Owners can easily share their lexica by dragging and dropping other users from a list of all registered LEXUS users onto the list of readers or writers of an individual lexicon. When a lexicon is shared, it becomes available in the target user’s workspace. This feature paves the way for better collaboration among researchers and facilitates simultaneous work on a given language, even from different places in the world. Because LEXUS is an online resource, any archived lexicon will in the future be available as an addressable resource in the CLARIN infrastructure. Its online basis also allows annotated multimedia sessions from the relevant language archive (e.g. sessions depicting particular cultural uses or social practices) to be linked to any part of an entry. The web-based accessibility of LEXUS was initially designed to allow speech community members to get involved in the lexical documentation process. Such collaboration would allow for continuous and faster growth of linguistic information. However, this appealing potential turned out to be more challenging in practice, mainly due to unforeseen obstacles within the social dynamics of the community (Cablitz 2011, 240).

In the future, special focus will be on expanding LEXUS’ functionality to match the requirements of a lexicon tool for sign language documentation, a domain for which neither a suitable tool nor a unified lexicographic schema exists.

With its online accessibility, together with its archive-linking capacity, its multimedia visualization features and its interoperability capacities, we offer LEXUS as an advanced resource and research tool for the scientific community.

References
Aristar-Dry, H., S. Drude, J. Gippert, I. Nevskaya, and M. Windhouwer (2012). Rendering Endangered Lexicons Interoperable through Standards Harmonization: the RELISH project. In European Language Resources Association (ed.), Proceedings of the Eighth International Conference on Language Resources and Evaluation, held 23-25 May 2012 in Istanbul, Turkey.
Cablitz, G. (2011). The Making of a multimedia encyclopedic lexicon for and in endangered speech communities. In Haig, G. L. J., N. Nau, S. Schnell, and C. Wegener (eds.), Documenting endangered languages: Achievements and perspectives. Berlin: De Gruyter, 223-261.
CLARIN (2013). Common Language Resource and Technology Infrastructure. http://www.clarin.eu/ (accessed 14 March 2013).
De Korne, H., and the Burt Lake Band of Ottawa and Chippewa Indians (2009). The Pedagogical Potential of Multimedia Dictionaries: Lessons from a Community Dictionary Project. In Reyhner, J. and Lockard, L. (eds), Indigenous Language Revitalization: Encouragement, Guidance and Lessons Learned. Flagstaff, AZ: Northern Arizona University, 141-153.
DOBES (2013). Documentation of Endangered Languages. http://www.mpi.nl/dobes/ (accessed 14 March 2013).
ISO 24613 (2008). Language resource management - Lexical markup framework (LMF). International Organization for Standardization. http://www.lexicalmarkupframework.org/ (accessed 14 March 2013).
ISO 19757-2 (2008). Information technology - Document Schema Definition Language (DSDL) - Part 2: Regular-grammar-based validation - RELAX NG. International Organization for Standardization.
ISO 24610-1 (2008). Language resource management - Feature structures - Part 1: Feature structure representation. International Organization for Standardization.
Kemps-Snijders, M., M. A. Windhouwer, P. Wittenburg, and S. E. Wright. (2009). ISOcat: Remodeling Metadata for Language Resources. In Open Forum on Metadata Registries of the International Journal of Metadata, Semantics and Ontologies (IJMSO), 4:4: 261-276.
LinguistList. (2013). Lexicon Enhancement via the GOLD Ontology. http://lego.linguistlist.org/ (accessed 14 March 2013).
Manning, C. D., K. Jansz, and N. Indurkhya (2001). Kirrkirr: Software for browsing and visual exploration of a structured Warlpiri dictionary. Literary and Linguistic Computing, 16: 123–139.
Moreira, A., M. Windhouwer, A. König, and S. Shayan (2013). LEXUS 3: Uniform Presentation Methodology for Lexica. International Conference on Language Documentation & Conservation (ICLDC), held 28 February-3 March 2013 in Hawaii.
RELISH (2013). RELISH-LMF. http://tla.mpi.nl/relish/lmf/ (accessed 14 March 2013).
Ringersma, J., and M. Kemps-Snijders (2007). Creating multimedia dictionaries of endangered languages using LEXUS. In Van Hamme, H. and van Son, R. (eds), Proceedings of Interspeech 2007. Baixas, France: ISCA-Int. Speech Communication Assoc, 65-68.
SIL (2013). http://www.sil.org/resources/software (accessed 14 March 2013).
The Language Archive (2013). LEXUS: A web based lexicon tool. http://tla.mpi.nl/tools/tla-tools/lexus/ (accessed 14 March 2013).
Volkswagen Foundation. (2013). http://www.volkswagenstiftung.de/ (accessed 14 March 2013).
Windhouwer, M., J. Petro, I. Nevskaya, S. Drude, H. Aristar-Dry, and J. Gippert (2013). Creating a serialization of LMF: the experience of the RELISH project. In G. Francopoulo (ed.), LMF: Lexical Markup Framework, theory and practice. ISTE/Wiley.
