ELECTRONIC DICTIONARIES AND METALEXICOGRAPHY: THE DIGITAL VERSION OF THE DEUTSCHE WÖRTERBUCH BY JACOB AND WILHELM GRIMM AS A BASIS FOR METALEXICOGRAPHICAL RESEARCH.

paper
Authorship
  1. Thomas Schares

    Universität Hamburg (University of Hamburg), Universität Trier


The electronic version of the Deutsche Wörterbuch (=DWB) by Jacob and Wilhelm Grimm (33 vols., 1854-1971), the largest dictionary of the German language, has been available on the Internet for over a year; an offline version will also be available soon. In recent years, electronic dictionaries have become a familiar tool for scholarly work. Access to dictionaries becomes much easier and more convenient once the huge quantities of lexical information hitherto stored in weighty multi-volume works have been transformed into immaterial data streams within a computer. So far, the electronic version has proved ideal for users of such reference works. But the electronic form can do much more: it allows us to pursue forms of metalexicographical research into the structure and history of the DWB that were not possible before. The DWB, like the OED, is based on scholarly principles and reflects the history of philology and linguistics over a long period. It is thus an ideal object for studying the changing methods and predilections of lexicography and lexicographical research. Until recently, scholars studying how a dictionary was compiled often had to rely on dubious methods. To determine how frequently a given author is quoted, for example, usually only small “representative” portions of the dictionary were searched, and the results were then extrapolated to the whole work (SCHLAEFER 1999 and SCHULZ 1999, for instance, both take a single entry as their starting point). Factors such as the inconsistency or the historicity of the dictionary were often left aside. Similarly, before the electronic version of the DWB was available, the exact number of entries was unknown: a book published in 2001 estimates it at about 400,000 to 500,000 entries (HAß-ZUMKEHR 2001).
Now that the dictionary is available in electronic form, we know that it comprises exactly 249,545 main entries, or 320,505 entries if subentries are included; these figures show how far previous estimates were off the mark. In this paper, I would like to show how the electronic DWB and its TEI-compliant structural markup open up more reliable methods of metalexicographical research. It is already a commonplace that it takes no more than a few seconds to find out how often Shakespeare is quoted in the OED, or how often Goethe appears in the DWB. But electronic dictionaries lend themselves to more than such comparatively simple full-text searches. The underlying text databases contain the dictionary's content enriched with markup that represents the structure of the dictionary. This SGML/XML-encoded structure permits the complex and sophisticated searches that studies in metalexicography require. The basic structural unit of a dictionary is the entry: one could call a dictionary a (usually) alphabetical collection of entries or articles. All information on a given headword is collected in the corresponding entry, where it is organized and presented in a particular way. Metalexicography is interested in how this information is organized and presented, that is, in how it is structured. In order to gather information about the macro- and microstructure of the DWB, I chose a new approach: sorting all entries according to their length. Each entry was assigned a value, namely its number of lines. These values were retrieved automatically from the SGML-tagged data of the DWB using TUSTEP scripts.
[Table: DWB entries by length in lines, per letter]
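The length-based sorting described above can be sketched in Python. This is a minimal illustration, not the TUSTEP scripts used in the study: it assumes that each entry has already been reduced to a hypothetical (headword, line count) pair, groups entries naively by the first character of the headword, and computes the share of entries shorter than six lines, the threshold used in this paper. The line counts in the sample are invented and are not DWB figures.

```python
from collections import defaultdict

# Hypothetical record format: (headword, entry length in printed lines).
# The study derived these lengths from the SGML data with TUSTEP scripts;
# this sketch simply assumes they are already available.
Entry = tuple[str, int]


def sort_by_length(entries: list[Entry]) -> list[Entry]:
    """Sort entries from longest to shortest."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def share_short_by_letter(entries: list[Entry], limit: int = 6) -> dict[str, float]:
    """Percentage of entries shorter than `limit` lines, per initial letter."""
    totals: dict[str, int] = defaultdict(int)
    short: dict[str, int] = defaultdict(int)
    for headword, lines in entries:
        letter = headword[0].upper()  # naive: takes the first character as is
        totals[letter] += 1
        if lines < limit:
            short[letter] += 1
    return {letter: 100 * short[letter] / totals[letter] for letter in sorted(totals)}


def overall_share_short(entries: list[Entry], limit: int = 6) -> float:
    """Percentage of all entries shorter than `limit` lines."""
    if not entries:
        return 0.0
    return 100 * sum(1 for _, lines in entries if lines < limit) / len(entries)


# Toy input: headwords with invented line counts, for illustration only.
sample = [("ABEND", 40), ("ACKER", 3), ("BLONDHEIT", 1), ("BLAU", 12), ("GEIST", 900)]
print(sort_by_length(sample)[0])       # ('GEIST', 900)
print(share_short_by_letter(sample))   # {'A': 50.0, 'B': 50.0, 'G': 0.0}
print(overall_share_short(sample))     # 40.0
```

Comparing the per-letter values with the overall share is what makes deviations between older and younger parts of the dictionary visible.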
The table gives an exact account of entry length. On average, 63.4 per cent of the entries in the dictionary are shorter than six lines, but the percentages for some letters deviate considerably from this average. The older parts of the dictionary have higher percentages of such short entries than the younger parts. This suggests that the average entry is longer in the younger parts of the DWB, which supports the assumption that the intensified collection of slips after the turn of the century affected the structure of entries: more slips led to longer entries. Entry length can also form the basis for a structural analysis of dictionary entries. Dictionaries like the DWB are famous for very long entries such as GEIST or GEWALT, and entry analysis in metalexicography usually focuses on such long entries with complex structures and rich semantics (e. g. SCHMIDT 1986). However, given that almost two thirds of the entries of the DWB are shorter than six lines, I consider it essential to take a closer look at these short entries. Doing so allows general statements about entry structure, but at the same time it brings out peculiarities in the work of individual lexicographers. In the parts prepared by Jacob Grimm, for instance, a specific type of entry appears that consists of a lemma, a grammatical designation, a definition and a Dutch equivalent:
BLONDHEIT, f. color flavus, nnl. blondheit. (DWB 2, 143)
In Jacob Grimm's parts of the DWB (letters A-C and E-FRUCHT), this type of one-line entry occurs 137 times; in all the other parts it occurs only five times. A further query of the electronic DWB shows that the dictionary contains a total of 4,300 Dutch equivalents and examples, 3,400 of which are, again, found in the parts written by Jacob Grimm. In the D section, by contrast, which was prepared by Jacob's brother Wilhelm, there are fewer than ten Dutch examples. Jacob Grimm is the only lexicographer of the DWB to give such a large number of Dutch equivalents. Such peculiarities of individual lexicographers can be traced with great precision: some lexicographers quote extensively from German dialects, for instance, while others show no interest in dialects at all. These features of the DWB reflect the heterogeneous methods and attitudes of its lexicographers in their respective periods. They can only be revealed by using dictionary data with structural markup, which proves to be a reliable basis for metalexicographical research.
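Counts such as these depend on structural queries rather than plain full-text search. The following Python sketch illustrates the idea on a tiny, invented fragment; the element and attribute names (section, entry, form, gram, def, foreign, lang="nnl", author) are assumptions made for illustration and do not reproduce the actual DWB markup. It counts Dutch equivalents per lexicographer and matches entries that consist exactly of lemma, grammatical designation, definition and Dutch equivalent, like the BLONDHEIT example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified markup; element and attribute names are
# assumptions for illustration, not the actual DWB encoding.
SAMPLE = """<dwb>
  <section letter="B" author="Jacob Grimm">
    <entry><form>BLONDHEIT</form><gram>f.</gram><def>color flavus</def><foreign lang="nnl">blondheit</foreign></entry>
    <entry><form>BLAU</form><gram>adj.</gram><def>(definition)</def><cit>(quotation)</cit></entry>
  </section>
  <section letter="D" author="Wilhelm Grimm">
    <entry><form>DANK</form><gram>m.</gram><def>(definition)</def></entry>
  </section>
</dwb>"""

# Entry type: lemma, grammatical designation, definition, Dutch equivalent.
DUTCH_TYPE = ["form", "gram", "def", "foreign"]


def dutch_by_author(root: ET.Element) -> Counter:
    """Count Dutch (lang="nnl") elements in each lexicographer's sections."""
    counts: Counter = Counter()
    for section in root.iter("section"):
        counts[section.get("author")] += sum(
            1 for f in section.iter("foreign") if f.get("lang") == "nnl"
        )
    return counts


def is_dutch_type(entry: ET.Element) -> bool:
    """True if the entry has exactly the children lemma, grammar, definition, Dutch equivalent."""
    children = list(entry)
    return [c.tag for c in children] == DUTCH_TYPE and children[-1].get("lang") == "nnl"


def dutch_type_by_author(root: ET.Element) -> Counter:
    """Count entries of the Dutch-equivalent type per lexicographer."""
    counts: Counter = Counter()
    for section in root.iter("section"):
        counts[section.get("author")] += sum(
            1 for e in section.iter("entry") if is_dutch_type(e)
        )
    return counts


root = ET.fromstring(SAMPLE)
print(dutch_by_author(root))
print(dutch_type_by_author(root))
```

Attributing sections to lexicographers through an `author` attribute is a simplification made for this sketch; a real query would restrict the pattern to one-line entries as well.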
REFERENCES
Haß-Zumkehr, Ulrike: Deutsche Wörterbücher im Brennpunkt von Sprach- und Kulturgeschichte. Berlin/New York 2001.
Schlaefer, Michael: Zur Darstellung wortgeschichtlicher Zusammenhänge des 17.–20. Jahrhunderts in historischen Wörterbüchern. In: Sprachwissenschaft 24.2 (1999), 195–220.
Schmidt, Hartmut: Wörterbuchprobleme. Untersuchungen zu konzeptionellen Fragen der historischen Lexikographie. Tübingen 1986.
Schulz, Matthias: Der lexikographische Informationsgehalt in älteren Bedeutungswörterbüchern: Zugleich Überlegungen zum Nutzen einer Retrodigitalisierung älterer Wörterbücher. In: Sprachwissenschaft 24.1 (1999), 47–73.


Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2003
"Web X: A Decade of the World Wide Web"

Hosted at University of Georgia

Athens, Georgia, United States

May 29, 2003 - June 2, 2003


Conference website: http://web.archive.org/web/20071113184133/http://www.english.uga.edu/webx/

Series: ACH/ICCH (23), ALLC/EADH (30), ACH/ALLC (15)

Organizers: ACH, ALLC

Tags
  • Language: English