Books into Bytes: The "Deutsches Wörterbuch" on CD-ROM and on the Internet

poster / demo / art installation
Authorship
  1. Ruth Christmann

    Universität Trier

Work text

I. Starting position and aims:

Jacob and Wilhelm Grimm's "Deutsches Wörterbuch" (DWB) is the most extensive documentation of the German language. Its outstanding position is borne out by its history: over more than one hundred years (the longest publication period of any German dictionary), generations of lexicographers contributed about 350,000 entries to the DWB, which is divided into 16 volumes (bound as 32) comprising a total of 67,744 columns.

The DWB reflects more than one hundred years of political, cultural, and institutional history. It also shows the influence of the changing preferences of the many philologists involved with regard to practical lexicography, as well as of shifting insights in philology and linguistics.

Digitizing the DWB not only preserves an outstanding achievement of German lexicography but also opens up new possibilities for using the rich dictionary material. Since November 1998, a project at the University of Trier has been creating a computerized version of the DWB, to be published on CD-ROM and also made available via the Internet. The aim is to provide user-friendly search and display software offering the best possible options for data retrieval, so that the dictionary will appeal to anyone interested in the German language. In this way, the situation of German electronic dictionaries, which compares poorly with that of other languages internationally, will be decisively improved.

II. Technical issues:

Taking into account the development of international standards for text encoding, we use the TEI Guidelines for the structured markup of the dictionary. This prepares the way for producing the CD-ROM version with specialized SGML tools. The starting point for applying CoST (Copenhagen SGML Tool) is our pool of SGML-encoded files, which have been created from the TUSTEP (TUebingen System of Text Processing Programs) data and validated with an SGML parser.

CoST is a general-purpose SGML post-processing tool. It is a structure-controlled SGML application; that is, it operates on the element structure information set (ESIS) representation of SGML documents. CoST provides a flexible set of low-level primitives on which sophisticated applications can be built. These include a powerful query language for navigating the document tree and extracting ESIS information, an event-driven programming interface, and a specification mechanism that binds properties to nodes based on queries.

On the one hand, CoST generates a set of HTML pages for displaying the dictionary in conventional web browsers; on the other hand, it transforms the SGML data into Tcl/Tk command scripts for the graphical user interface of the CD-ROM. Tcl, the Tool Command Language, is a very simple programming language. It provides basic language features such as variables, procedures, and control structures, and runs on almost any modern operating system, including Unix, Macintosh, and Windows 95/98/NT. Tk is a Tcl extension, written in C, that gives the user a relatively high-level interface to the windowing environment.

Finally, CoST is used to set up a database containing all the information from the dictionary entries that is needed to query the different components of interest to anyone studying the German language from its earliest stages, such as etymology, language, and quotations, as well as, of course, traditional full-text retrieval.
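The event-driven conversion into HTML and Tcl/Tk described above can be sketched in Python. This is neither CoST nor the project's code: it is a minimal sketch, assuming an XML rendering of a TEI-style entry (the sample entry, the element-to-style table `STYLE`, and the Tk widget name `.t` are all hypothetical). A SAX handler receives start, end, and text events in document order, much as a CoST program consumes an ESIS event stream, and writes the same content twice: as HTML fragments for browsers and as Tk `text` widget `insert` commands for a Tcl/Tk interface.

```python
import xml.sax
from html import escape

# Invented sample entry. Element names follow TEI dictionary conventions,
# but this is NOT the project's actual DTD or real DWB content.
SAMPLE = """<entry id="apfel">
<form><orth>APFEL</orth></form>
<gramGrp><gen>m.</gen></gramGrp>
<sense>definition text <cit><quote>example quotation</quote> <bibl>Author</bibl></cit></sense>
</entry>"""

# Hypothetical mapping: element name -> (HTML tag, Tk text-widget tag).
STYLE = {"orth": ("b", "headword"), "gen": ("i", "gramm"),
         "quote": ("q", "quotation"), "bibl": ("cite", "source")}


def tcl_quote(s):
    """Quote s as a double-quoted Tcl word by backslash-escaping special characters."""
    for ch in '\\"[]${}':  # backslash first, so added escapes are not re-escaped
        s = s.replace(ch, "\\" + ch)
    return f'"{s}"'


class EntryHandler(xml.sax.ContentHandler):
    """Receives start/end/text events in document order, much like an ESIS stream."""

    def __init__(self):
        super().__init__()
        self.html = []   # fragments for the web (HTML) version
        self.tcl = []    # Tk text-widget commands for the CD-ROM GUI
        self.stack = []  # names of currently open elements
        self.buf = []    # character data not yet emitted

    def _flush(self):
        text = "".join(self.buf)
        self.buf.clear()
        if not text:
            return
        self.html.append(escape(text))
        if text.strip():
            # The innermost styled ancestor decides the Tk display tag.
            tag = next((STYLE[n][1] for n in reversed(self.stack) if n in STYLE), "body")
            self.tcl.append(f".t insert end {tcl_quote(' '.join(text.split()) + ' ')} {tag}")

    def startElement(self, name, attrs):
        self._flush()  # pending text belongs to the parent element
        self.stack.append(name)
        if name in STYLE:
            self.html.append(f"<{STYLE[name][0]}>")

    def endElement(self, name):
        self._flush()  # pending text belongs to this element
        self.stack.pop()
        if name in STYLE:
            self.html.append(f"</{STYLE[name][0]}>")

    def characters(self, content):
        self.buf.append(content)  # may arrive in several chunks; joined on flush


handler = EntryHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
html_out = "".join(handler.html)
tcl_out = "\n".join(handler.tcl)
print(html_out)
print(tcl_out)
```

Emitting both outputs from a single pass keeps the web and CD-ROM renderings in step; a real converter would also have to handle attributes, cross-references, and character entities.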
The database is accessible from both platforms: the web browsers connect to it via CGI scripting, and the CD-ROM GUI uses an integrated Tcl interface.
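One way to give both front ends the same answers is a single query layer that each of them calls. The following is a minimal sketch, assuming SQLite in place of whatever database the project actually uses and a hypothetical two-table schema (`entry`, `quotation`) filled from invented TEI-style entries. `cgi_response` shows how a CGI script could wrap one query for the web, taking the string a server would pass in `QUERY_STRING`; the Tcl interface of the CD-ROM GUI is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import parse_qs

# Invented placeholder entries in TEI-style markup (not real DWB data).
ENTRIES = [
    "<entry><form><orth>APFEL</orth></form><gramGrp><gen>m.</gen></gramGrp>"
    "<sense><cit><quote>quotation one</quote><bibl>Author A</bibl></cit></sense></entry>",
    "<entry><form><orth>BIRNE</orth></form><gramGrp><gen>f.</gen></gramGrp>"
    "<sense><cit><quote>quotation two</quote><bibl>Author B</bibl></cit></sense></entry>",
]


def build_db(entries):
    """Load headword, gender, and quotations from each entry into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headword TEXT, gender TEXT)")
    conn.execute("CREATE TABLE quotation (entry_id INTEGER, author TEXT, text TEXT)")
    for xml_text in entries:
        root = ET.fromstring(xml_text)
        cur = conn.execute(
            "INSERT INTO entry (headword, gender) VALUES (?, ?)",
            (root.findtext("form/orth"), root.findtext("gramGrp/gen")),
        )
        for cit in root.iter("cit"):
            conn.execute(
                "INSERT INTO quotation VALUES (?, ?, ?)",
                (cur.lastrowid, cit.findtext("bibl"), cit.findtext("quote")),
            )
    conn.commit()
    return conn


def headwords_by_gender(conn, gender):
    """Shared query: headwords with a given gender label, e.g. 'f.'."""
    rows = conn.execute(
        "SELECT headword FROM entry WHERE gender = ? ORDER BY headword", (gender,))
    return [row[0] for row in rows]


def headwords_quoting(conn, author):
    """Shared query: headwords whose quotations cite a given source."""
    rows = conn.execute(
        "SELECT DISTINCT e.headword FROM entry e JOIN quotation q ON q.entry_id = e.id "
        "WHERE q.author = ? ORDER BY e.headword", (author,))
    return [row[0] for row in rows]


def cgi_response(query_string, conn):
    """Web front end: answer a CGI query string such as 'gender=f.' with HTML."""
    gender = parse_qs(query_string).get("gender", [""])[0]
    items = "".join(f"<li>{escape(h)}</li>" for h in headwords_by_gender(conn, gender))
    return "Content-Type: text/html\r\n\r\n<ul>" + items + "</ul>"


conn = build_db(ENTRIES)
print(headwords_by_gender(conn, "m."))
print(cgi_response("gender=f.", conn))
```

Keeping the SQL in shared functions means a query about gender or quoted authors behaves identically whether it arrives from a browser or from the CD-ROM interface.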

III. Software demonstration:

The software demonstration will show how valuable information can be extracted from an electronic version of the DWB using full-text retrieval, links, and a database that supports complex queries.

A range of retrieval options enables users of the CD-ROM or Internet version of the DWB to search for particular phenomena in any or all of the 33 volumes, independently of headwords. Information hidden within the entries is made explicit in several ways: hyperlinks will connect the source references in the dictionary with the index volume, so that the information needed to cite the DWB's sources can easily be displayed in pop-up windows. As is common in electronic dictionaries, a list of headwords will let users look up any headword simply by following the corresponding link; moreover, the original organization of the entries will give separate access to individual parts of the longer articles. Specific information, such as the grammatical gender of headwords or sublemmata, or the occurrence of certain words in quotations from particular authors or works, will be available from the database generated from the TEI-compliant markup described above.
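Headword-independent full-text search of this kind is commonly built on an inverted index. The following Python sketch assumes that each entry has already been split into labelled sections (the invented `SECTIONS` texts, the section labels, and the anchor naming in `headword_list` are all hypothetical); it is an illustration of the technique, not the project's retrieval engine.

```python
import re
from collections import defaultdict

# Invented placeholder texts keyed by (headword, section label); not DWB data.
SECTIONS = {
    ("APFEL", "1"): "text of the first sense, mentioning baum",
    ("APFEL", "2"): "text of the second sense",
    ("BAUM", "1"): "a sense that also mentions apfel",
}


def tokenize(text):
    """Lower-case word tokens; the regex word class is Unicode-aware (keeps umlauts)."""
    return re.findall(r"\w+", text.lower())


def build_index(sections):
    """Inverted index: token -> set of (headword, section) keys containing it."""
    index = defaultdict(set)
    for key, text in sections.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, *words):
    """Return the sorted (headword, section) keys that contain all given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


def headword_list(sections):
    """Sorted headword links; the '#anchor' naming scheme is hypothetical."""
    heads = sorted({head for head, _ in sections})
    return [f'<a href="#{h.lower()}">{h}</a>' for h in heads]


index = build_index(SECTIONS)
print(search(index, "apfel"))
print(search(index, "sense", "baum"))
print(headword_list(SECTIONS))
```

Here `search(index, "apfel")` returns `[("BAUM", "1")]`: the word is found inside another entry, which a lookup by headword alone would miss. Keying the index by section rather than by whole entry is what allows parts of long articles to be reached separately.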

One of the project's major aims, however, is not only to show users different ways of using the DWB for lexicographical, historical, or linguistic studies and to present it attractively, but also to increase the use of the DWB in general, that is, as a multi-volume work to be read by anyone interested in the German language and the history of German words. For such purposes, users must have easy access both to the electronic version of the DWB, including the information stored within its entries, and to the printed version as reproduced on screen via PostScript files. Encoding the entries according to the TEI Guidelines is as important as rendering the characters of the various languages exactly as they appear in print.

The digitization of the DWB is closely linked to another retrodigitization project at the University of Trier, "Digital Middle High German Dictionaries Interlinked". Since the Middle High German dictionaries are cited very often in the DWB, we also intend to include links to the electronic versions of these dictionaries. Their CD-ROM is intended as a prototype for the retrodigitization of other historical dictionaries, so a presentation of the Dictionaries Interlinked may conclude the software demonstration, showing what the DWB will look like in its final state.

By the time of the conference, at least two major volumes of the DWB, including the index volume, will have been fully encoded, converted to CD-ROM, and made accessible for searches. Questions for discussion when presenting the prospective digital version of the DWB might include the application of SGML/TEI to a dictionary as heterogeneously structured as the DWB, the need to develop new entities for character representation, and the importance of a digital DWB for future research.
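The character-representation question can be illustrated with a small entity-resolution step. This sketch assumes entity references of the form `&name;` in the encoded text; `auml`, `ouml`, `szlig`, and `ecirc` are standard ISO 8879 Latin-1 entity names, while `uosup` (u with a small o above, built from the combining character U+0366) is a hypothetical example of a project-specific entity. Unknown names are left untouched and reported, so that they can be reviewed rather than silently lost.

```python
import re

# Standard ISO 8879 (ISOlat1) entity names plus one HYPOTHETICAL project entity.
ENTITIES = {
    "auml": "\u00e4",    # a with diaeresis
    "ouml": "\u00f6",    # o with diaeresis
    "szlig": "\u00df",   # sharp s
    "ecirc": "\u00ea",   # e with circumflex, as in normalized Middle High German
    "uosup": "u\u0366",  # hypothetical: u followed by COMBINING LATIN SMALL LETTER O
}

# SGML-style names: a letter, then letters, digits, periods, or hyphens.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def resolve(text, table=ENTITIES):
    """Replace known entity references; leave unknown ones untouched for review."""
    return ENTITY_REF.sub(lambda m: table.get(m.group(1), m.group(0)), text)


def unknown_entities(text, table=ENTITIES):
    """List entity names in text that still lack a character mapping."""
    return sorted({m.group(1) for m in ENTITY_REF.finditer(text)} - set(table))


print(resolve("W&ouml;rterbuch"))
print(unknown_entities("&foo; &auml;"))
```

Separating the mapping table from the encoded text means a new entity can be defined once and every HTML page or Tcl script regenerated, instead of patching characters in the output.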



Conference Info


ACH/ALLC / ACH/ICCH / ALLC/EADH - 2000

Hosted at University of Glasgow

Glasgow, Scotland, United Kingdom

July 21, 2000 - July 25, 2000



Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/

Series: ALLC/EADH (27), ACH/ICCH (20), ACH/ALLC (12)

Organizers: ACH, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None