Word Formation in Pedagogical Lexicography: Linguistic and Technical Aspects

  1. Andrea Abel

    European Academy Bozen-Bolzano (EURAC)

  2. Judith Knapp

    European Academy Bozen-Bolzano (EURAC)

  3. Pius ten Hacken

    University of Swansea

Word Formation and Second Language Acquisition

Word formation is one of the most important ways of extending the lexicon. Unlike borrowing and semantic extension, the other two main sources of lexical extension, word formation is largely governed by rules. These rules are used not only for producing new forms but also for structuring the lexicon. Second language learners can therefore benefit from knowing the mechanisms of word formation in two ways: in understanding new, unknown words and in perceiving the relationships between words they already know. As shown by ten Hacken (1998), these advantages do not arise automatically from the use of electronic dictionaries; they require an appropriate structure. In this paper we show how this problem was approached in the context of the ELDIT project.

Word Formation in ELDIT

At the European Academy of Bolzano/Bozen, an interdisciplinary team of linguists and computer scientists is developing ELDIT, an electronic learner's dictionary for Italian and German. ELDIT is a basic dictionary containing about 3,500 lemmas for each language and is freely accessible on the web (http://www.eurac.edu/eldit). It is aimed at a well-defined target group, namely Italian-speaking learners of German and German-speaking learners of Italian from beginner to intermediate level (Abel & Weber 2000, Gamper & Knapp 2002b).

Each entry of the dictionary contains a wide range of information relevant to the learner, stored in different modules: one or more definitions together with examples and translations; free combinations and collocations; idiomatic expressions, proverbs and fixed phrases; word fields; word families; usage remarks; grammar; and pronunciation. ELDIT is currently being extended into a complete language learning system intended to promote bilingualism in multilingual regions. Since the system could be extended to other languages, we propose it as a model for multilingual language learning environments.

In this paper we report on one specific aspect of the dictionary, namely word formation. Starting from a single entry word (lemma), the user can click on the "word family" tab to access a list of compounds and derivatives with corresponding explanations.

Word formation is represented in ELDIT in the following way (see figure 1):

  1. Frequently used derivatives and compounds are listed together. In future, we will also add the head word of the family, i.e. a simplex word that cannot be decomposed further (Augst 1998).
  2. The basic part of each element is emphasized. As the basic part we take the stem of a derivative (e.g. "Haus" -> "Behausung", "Häuschen", "häuslich", ...) or, for a compound, the constituent that appears as the headword of the entry under which it is listed (e.g. "Haus" -> "Bauernhaus", "Hausnummer", "Haustür", ...).
  3. Hyperlinks lead the user to further information:
     - derivatives and compounds that belong to the basic vocabulary contained in ELDIT are linked to the relevant entry;
     - prefixes and suffixes of derivatives are linked to explanations of their meaning, use and particularities, with further examples;
     - the so-called "linking elements" (Fugenelemente) in German compounds are also linked to dedicated explanations.
  4. Furthermore, the user can find one or more translations for each element.
  5. Where necessary, notes on the word itself are given (for instance, if a word is used only colloquially).
  6. By clicking on the triangle next to each word, the user can retrieve lexicographic examples from ELDIT's large example database (this feature is already implemented but not yet enabled in the online version).

Figure 1: Derivatives and compound words in ELDIT
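The structure of a word family described above can be modelled with a small record per element. The following Python sketch is illustrative only: the class `FamilyMember`, its fields and the bracket notation are hypothetical and do not reproduce ELDIT's internal schema. It shows how the basic part of each element can be located and emphasized, matching case- and umlaut-insensitively so that "Häuschen" is still related to "Haus".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Umlaut folding keeps the string length unchanged, so indices stay valid.
_FOLD = str.maketrans("äöüÄÖÜ", "aouAOU")

@dataclass
class FamilyMember:
    """One derivative or compound in a word family (hypothetical model, not ELDIT's schema)."""
    word: str                              # e.g. "Hausnummer"
    kind: str                              # "derivative" or "compound"
    basic_part: str                        # part to emphasize, e.g. "Haus"
    affix: Optional[str] = None            # prefix/suffix linked to an explanation
    linking_element: Optional[str] = None  # German Fugenelement, e.g. "s"
    translations: List[str] = field(default_factory=list)

def emphasize(member: FamilyMember) -> str:
    """Bracket the basic part of the word, ignoring case and umlaut (Haus ~ häus)."""
    folded = member.word.translate(_FOLD).lower()
    target = member.basic_part.translate(_FOLD).lower()
    i = folded.find(target)
    if i < 0:
        return member.word  # basic part not found: leave the word unmarked
    j = i + len(target)
    return f"{member.word[:i]}[{member.word[i:j]}]{member.word[j:]}"

family = [
    FamilyMember("Behausung", "derivative", "Haus", affix="be-", translations=["dimora"]),
    FamilyMember("Häuschen", "derivative", "Haus", affix="-chen"),
    FamilyMember("Hausnummer", "compound", "Haus"),
]
for m in family:
    print(emphasize(m), m.kind, ", ".join(m.translations))
```

Keeping derivatives and compounds in one list, distinguished only by `kind`, mirrors the decision to present them together in a single word family.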

In the simplest case, the word formation module helps with decoding and encoding. Its main objective, however, is to convey the complex system of word formation in a simple and transparent way. Learners should realize that formally related words are often semantically related as well. As a result, they should be able to analyze the structure of words, to use the L2 more creatively and productively on their own (which is particularly important for highly productive processes such as German compounding and Italian evaluative suffixation), and to infer the meaning of unknown words in receptive situations.
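To illustrate the kind of analysis learners are meant to perform on unknown compounds, the following Python sketch splits a German compound into two known constituents, optionally joined by a linking element. The mini-lexicon `LEXICON`, the list `LINKING_ELEMENTS` and the function `split_compound` are assumptions made for this example; the brute-force search is a simplification and not the analysis performed by ELDIT or Word Manager.

```python
from typing import List, Set, Tuple

# Hypothetical mini-lexicon of known simplex words (lowercased).
LEXICON: Set[str] = {"haus", "tür", "nummer", "bauer", "arbeit", "zimmer", "kind"}

# A selection of German linking elements (Fugenelemente); "" means none.
LINKING_ELEMENTS: List[str] = ["", "s", "es", "n", "en", "er", "e"]

def split_compound(word: str, lexicon: Set[str] = LEXICON) -> List[Tuple[str, str, str]]:
    """Return all (modifier, linking element, head) analyses with known constituents."""
    w = word.lower()
    analyses = []
    # Try every split that leaves at least two characters on each side.
    for i in range(2, len(w) - 1):
        left, right = w[:i], w[i:]
        if right not in lexicon:
            continue  # the head (right-hand constituent) must be a known word
        for link in LINKING_ELEMENTS:
            if link and not left.endswith(link):
                continue
            stem = left[: len(left) - len(link)]
            if stem in lexicon:
                analyses.append((stem, link, right))
    return analyses

for compound in ["Haustür", "Bauernhaus", "Arbeitszimmer", "Kinderzimmer"]:
    print(compound, split_compound(compound))
```

A real analyzer would also have to handle compounds with more than two constituents and stem alternations such as the dropped final "-e" in "Schulbuch"; competing analyses are simply returned side by side here.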


Implementing the ideas described above in the conventional way would require manually entering a huge amount of data and encoding it in great detail. The authors of the dictionary judged such extensive manual encoding not to be feasible.

We therefore supported the authors in the encoding process by automatically converting a handmade, semistructured version of the data into the fully elaborated version required by the system. In addition, computational linguistic resources were exploited to add as much linguistic information as possible automatically. XML serves as the uniform formalism for data and knowledge representation (for more details see Gamper & Knapp, 2002). The authoring process can be described as follows:

In the first step, the authors manually elaborate the language learning content and enter and save it with an editor, where possible already in a semistructured form. Figure 2 shows a derivation in ELDIT as it was entered at this stage with a very simple XML editor.

Figure 2: Semistructured data encoded in XML

In the second step, a transfer tool converts the manually elaborated data into the special format required by the system. Figure 3 shows the same part of the ELDIT data as Figure 2, but in its fully encoded form.

Figure 3: Fully structured data encoded in XML

A lot of information is added automatically in this step. Every XML element receives a unique ID (see, for instance, id="de.n.haus.1.deriv2" in the element <derivation>). Words are equipped with their citation form and part of speech (see the attributes base="Behausung" and ctag="N" in the element <pattern>, or the attributes base="dimora" and ctag="N" in the element <w> within <translation>). Wherever possible, a reference to the corresponding dictionary entry is added (see, for instance, the attribute lexref="it.n.dimora.1.sense2" in the element <w> of <translation>). References to explanatory sections are added as well (see the attribute explref="de.prae.h.be", which points to a section explaining derivation with the prefix "be-").
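Since Figures 2 and 3 are not reproduced here, the following Python sketch works on a hypothetical, simplified semistructured fragment. It shows two of the enrichment steps described above, assigning IDs and adding `explref` references, using the standard `xml.etree.ElementTree` module. The element layout, the placement of `explref` on `<pattern>`, the table `PREFIX_SECTIONS` and the function `enrich` are assumptions; only the ID format and the section ID "de.prae.h.be" are taken from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of prefix explanation sections; only "de.prae.h.be" appears in the text.
PREFIX_SECTIONS = {"be": "de.prae.h.be"}

def enrich(family: ET.Element, entry_id: str) -> ET.Element:
    """Add unique IDs and prefix references to a semistructured word family (illustrative)."""
    for n, deriv in enumerate(family.iter("derivation"), start=1):
        # ID modelled on id="de.n.haus.1.deriv2": entry ID plus a running number.
        deriv.set("id", f"{entry_id}.deriv{n}")
        pattern = deriv.find("pattern")
        if pattern is None or not pattern.text:
            continue
        word = pattern.text.strip().lower()
        for prefix, section in PREFIX_SECTIONS.items():
            # Naive string test; a real tool needs morphological segmentation
            # ("Besen" begins with "be" but is not a be- derivative).
            if word.startswith(prefix):
                pattern.set("explref", section)
    return family

semi = ET.fromstring(
    "<family>"
    "<derivation><pattern>Behausung</pattern></derivation>"
    "<derivation><pattern>häuslich</pattern></derivation>"
    "</family>"
)
print(ET.tostring(enrich(semi, "de.n.haus.1"), encoding="unicode"))
```

Deriving IDs from the entry ID, rather than from a global counter, keeps them stable and readable when an entry is edited in isolation.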

Thanks to a collaboration between the European Academy of Bolzano and the Scuola Universitaria Professionale della Svizzera Italiana (funded by the European Union, Interreg IIIA, Italia-Svizzera) involving Word Manager (WM) and ELDIT, we can add morphological information to each word. WM is a system for reusable morphological dictionaries; ten Hacken & Domenig (1996) describe its general architecture and the rule types involved. Lexical tools derived from WM databases have been integrated into ELDIT; they transform a word form into its citation form and at the same time deliver its category (Pedrazzini, 1999). Hence, we can equip each word with its citation form (attribute "base"), its category (attribute "ctag") and a pointer to the full meaning description of the word in the dictionary (attribute "lexref").
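The annotation of words with `base`, `ctag` and `lexref` can be sketched as follows. The lookup table `ANALYSES` is a hypothetical stand-in for the WM-derived lexical tools, whose actual interface is not described here, and `ENTRY_IDS` is an invented index of dictionary entries. Choosing a particular sense (as in lexref="it.n.dimora.1.sense2") is outside the scope of this sketch, which links only to the entry level.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the WM-derived lexical tools:
# word form -> (citation form, category). Not WM's real interface.
ANALYSES: Dict[str, Tuple[str, str]] = {
    "dimora": ("dimora", "N"),
    "dimore": ("dimora", "N"),
    "behausungen": ("Behausung", "N"),
}

# Hypothetical index of dictionary entries, keyed by (citation form, category).
ENTRY_IDS: Dict[Tuple[str, str], str] = {("dimora", "N"): "it.n.dimora.1"}

def analyze(form: str) -> Optional[Tuple[str, str]]:
    """Return (citation form, category) for a word form, or None if unknown."""
    return ANALYSES.get(form.strip().lower())

def annotate(root: ET.Element) -> None:
    """Set base and ctag on every <w>, plus lexref where a dictionary entry exists."""
    for w in root.iter("w"):
        result = analyze(w.text or "")
        if result is None:
            continue  # unknown form: leave it for manual treatment
        base, ctag = result
        w.set("base", base)
        w.set("ctag", ctag)
        entry = ENTRY_IDS.get((base, ctag))
        if entry is not None:
            w.set("lexref", entry)

translation = ET.fromstring("<translation><w>dimore</w></translation>")
annotate(translation)
print(ET.tostring(translation, encoding="unicode"))
```

Unknown forms are skipped rather than guessed, so the authors can see which words still need manual annotation.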

WM can also suggest new derivations, together with information about their segmentation, derivational degree, frequency within ELDIT, category, etc. These capabilities are currently being exploited in an authoring tool for efficiently extending existing word families with new elements and further information for the learner, thus providing rapid access to a rich source of high-quality information.
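The kind of output described above (segmentation, derivational degree, category, frequency) can be illustrated with a simplified breadth-first search over affixations that keeps only candidates attested in a frequency list. This is a swapped-in technique for illustration, not WM's rule-based mechanism: the affix inventory, the category assignments, the function `suggest` and the frequency counts are all hypothetical, and the sketch ignores umlaut and other stem alternations.

```python
from collections import deque
from typing import Dict, List, NamedTuple

# Hypothetical affix inventory: suffix -> category of the derived word.
SUFFIXES = {"lich": "ADJ", "schaft": "N", "keit": "N"}
PREFIXES = ["un"]  # assumed to preserve the category of its base

class Suggestion(NamedTuple):
    word: str
    segmentation: str
    degree: int      # number of derivational steps from the base
    category: str
    frequency: int   # frequency in the (hypothetical) reference corpus

def suggest(base: str, base_cat: str, freq: Dict[str, int], max_degree: int = 2) -> List[Suggestion]:
    """Breadth-first search over affixations, keeping only attested candidates."""
    start = base.lower()
    queue = deque([(start, start, 0, base_cat)])
    seen = {start}
    results: List[Suggestion] = []
    while queue:
        word, seg, degree, cat = queue.popleft()
        if degree == max_degree:
            continue
        candidates = [(word + s, f"{seg}+{s}", c) for s, c in SUFFIXES.items()]
        candidates += [(p + word, f"{p}+{seg}", cat) for p in PREFIXES]
        for cand, cand_seg, cand_cat in candidates:
            if cand in freq and cand not in seen:
                seen.add(cand)
                results.append(Suggestion(cand, cand_seg, degree + 1, cand_cat, freq[cand]))
                queue.append((cand, cand_seg, degree + 1, cand_cat))
    return results

# Invented counts, for illustration only.
freq = {"freundlich": 120, "freundschaft": 80, "freundlichkeit": 30, "unfreundlich": 15}
for s in suggest("Freund", "N", freq):
    print(s)
```

Because the search is breadth-first, the derivational degree of each suggestion is the length of its shortest derivation path, e.g. "unfreundlich" is reached via "freundlich" and receives degree 2.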


By exploiting an appropriate representation of word formation rules and their products, we have created an enhanced language learning environment. The potential of the WM system in this respect was pointed out by ten Hacken & Tschichold (2001). In realizing this potential, the way in which the different properties of ELDIT and WM complement and reinforce each other demonstrates the flexibility of the former as well as the practical reusability of the latter.


