I. Topic:

Since September 1997, a small team of lexicographers and computer scientists at the University of Trier (Germany) have been developing an integrated electronic dictionary of Middle High German applying TEI Guidelines. The resulting integrated digital dictionary is expected to be finished by August 2000 and be published on CD-ROM as well as on the Internet. It is not only meant to facilitate the simultaneous use of the dictionaries concerned, but also to offer advanced query options to provide essentially new insights for those involved in vocabulary studies, metalexicography, and the composition of a new MHG dictionary.

II. Digitization as a necessity:

The most important dictionaries of the MHG language were written in the 19th century and urgently need to be replaced by a new major work. This necessity arises not only from the enormous increase in the number of editions of MHG texts since the end of the 19th century, but also from changed insights into the structure of vocabulary and new ways of describing word usage. Consequently, five years ago, two teams of lexicographers at the Universities of Trier and Goettingen started to lay the foundations for a new MHG Dictionary by creating an electronic archive of texts and quotations. It will probably take up to 25 years, however, for the whole dictionary to be finished. Scholars of all disciplines who deal with MHG sources will therefore have to use the older dictionaries for quite a while yet.

The dictionaries that already exist, i.e. the "Mittelhochdeutsches Woerterbuch" by Georg Friedrich Benecke/Wilhelm Mueller/Friedrich Zarncke (1854-1866), the "Mittelhochdeutsches Handwoerterbuch" with its supplement, the "Nachtraege", by Matthias Lexer (1872-1878), and the "Findebuch zum mittelhochdeutschen Wortschatz" by Kurt Gaertner et al. (1992), are very closely interconnected and can only be used simultaneously, because each of them is, briefly speaking, a supplement to its predecessors: a series of supplements to supplements to supplements. They were therefore ideal candidates for the composition of an integrated digital dictionary. One of the major aims of the digitization is to make the lexicographical information of the dictionary entries accessible via a database and thus to enable sophisticated searches over all four dictionaries independently of headwords. Applying the TEI Guidelines to machine-readable versions of the printed dictionaries seemed the easiest and fastest way of creating the digital "compound dictionary".

III. (Semi-)Automatically generated markup according to TEI Guidelines:

The MHG dictionaries consist of eight volumes with about 1,100 printed pages, containing more than 80,000 headwords. TEI-compliant markup of the dictionary entries therefore had to be generated automatically as far as possible. For the encoding we used TUSTEP, the Tuebingen System of Text Processing Programs, whose variety of parameter-controlled functions for user-defined text data processing facilitates structured entry input.

Some parts of the TEI design scheme were especially relevant for the dictionary encoding. Several advantages and problems of applying TEI need to be discussed in detail, such as the hierarchical embedding of elements within the articles, the use of global attributes for the markup of a wide range of lexicographical information, and the recoverability of articles. It should also be mentioned that the TEI Guidelines could be improved with regard to the encoding of dictionaries of older stages of a language, since the description of such languages poses some problems seldom encountered when describing modern languages.

It is apparent, however, that most problems which arose when using TEI did not stem from the application of TEI Guidelines as such, but were primarily due to the fact that the dictionary entries often appeared to lack clear structure and were rather discursive in style. This has often made automatic SGML encoding a difficult task. In many cases only manual markup led to TEI compliant documents. Nevertheless, the results achieved so far fully justify the decision in favour of TEI Guidelines.
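
The hierarchical embedding of elements within an article can be illustrated with a small sketch. It assumes element names in the style of the TEI module for print dictionaries (entry, form, orth, gramGrp, pos, sense, def, cit, bibl, xr, ref); the sample entry, its identifiers, and the source siglum "SRC1" are invented for illustration and are not taken from the project's actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written entry; not a real article from the dictionaries.
ENTRY_XML = """
<entry id="demo.minne">
  <form><orth>minne</orth></form>
  <gramGrp><pos>stf.</pos></gramGrp>
  <sense n="1">
    <def>love, affection</def>
    <cit><quote>(quotation text)</quote><bibl n="SRC1">source siglum</bibl></cit>
  </sense>
  <xr type="see"><ref target="demo.minnen">minnen</ref></xr>
</entry>
"""


def summarize(entry_xml):
    """Extract selected lexicographical information from one encoded entry."""
    entry = ET.fromstring(entry_xml)
    return {
        "headword": entry.findtext("form/orth"),
        "pos": entry.findtext("gramGrp/pos"),
        # iter() descends through nested senses, so embedded elements are found
        "definitions": [d.text for d in entry.iter("def")],
        "sources": [b.get("n") for b in entry.iter("bibl")],
        "cross_refs": [r.get("target") for r in entry.iter("ref")],
    }


summary = summarize(ENTRY_XML)
print(summary)
```

Once an entry has this kind of explicit structure, each piece of information can be addressed separately, which is what discursive, weakly structured articles make hard to achieve automatically.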

IV. New ways of using dictionaries:

In their electronic version, the MHG dictionaries can be used much more easily and comfortably: hyperlinks connect all corresponding headwords, so that following a cross-reference takes only a mouse click; pop-up menus contain the relevant information about all sources of citation; bookmarks and notes can be created easily. PostScript files of all dictionary pages are interlinked with the electronic articles so that the compound digital dictionary can be used and cited as a work of reference in exactly the same way as its printed precursors.
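
How corresponding headwords might be linked across dictionaries can be sketched as follows. This is only an illustration under stated assumptions: the dictionary labels ("BMZ", "Lexer", "Findebuch"), the toy headword list, and the normalization rule (lowercasing and dropping diacritics such as the circumflex) are all hypothetical and do not describe the project's actual linking procedure.

```python
import unicodedata
from collections import defaultdict


def normalize(headword):
    """Hypothetical normalization: lowercase and strip combining diacritics."""
    decomposed = unicodedata.normalize("NFD", headword.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def link_headwords(entries):
    """Group (dictionary, headword) pairs that share a normalized form.

    Only groups with more than one member are returned, since a link
    needs at least two corresponding articles.
    """
    index = defaultdict(list)
    for dictionary, headword in entries:
        index[normalize(headword)].append((dictionary, headword))
    return {key: refs for key, refs in index.items() if len(refs) > 1}


# Invented sample data for illustration only.
ENTRIES = [
    ("BMZ", "mâze"),
    ("Lexer", "mâze"),
    ("Findebuch", "maze"),
    ("Lexer", "minne"),
]

links = link_headwords(ENTRIES)
print(links)
```

In practice, spelling variation between the dictionaries would probably require a more careful matching rule than this one, possibly with manual checking.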

Far more important is the access to a database containing the relevant information for the entire contents of the four dictionaries within the composite whole. Access via that database offers not only full-text retrieval but also retrieval of selected information, e.g. of parts of speech, of word forms in MHG quotations, of definitions, or of strings in the etymology sections of dictionary entries. Highly important for advanced and complex query options is the linking of a list of all dictionary sources with the electronic dictionary itself: all sources have been carefully classified according to geographical provenance, chronology and genre, and these categories can be used to limit database queries to a small, self-defined corpus of texts cited within the entries. Which words were directly borrowed from Italian, but not through Latin or French? Which words are only quoted from sources concerning legal issues? Which MHG words denote the same concepts? These are some of the questions that can now be answered without great expense of time. Moreover, the integrated electronic dictionary is especially important for the lexicographers involved in the creation of the new MHG dictionary, where the older dictionaries are used as pointers to words for which references rarely exist.
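
A query of the kind "Which words are only quoted from sources concerning legal issues?" can be sketched as follows. The record layout, field names, sigla, genre labels, and sample data are all hypothetical; the sketch only shows the principle of joining citations to a classified source list, not the project's actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    siglum: str
    region: str   # geographical provenance (assumed label)
    century: int  # chronology (assumed granularity)
    genre: str    # genre (assumed label)


@dataclass(frozen=True)
class Citation:
    headword: str
    siglum: str


# Invented sample data for illustration only.
SOURCES = {
    "S1": Source("S1", "region_a", 13, "legal"),
    "S2": Source("S2", "region_b", 13, "courtly epic"),
    "S3": Source("S3", "region_a", 14, "legal"),
}

CITATIONS = [
    Citation("wort_a", "S1"),
    Citation("wort_a", "S3"),
    Citation("wort_b", "S1"),
    Citation("wort_b", "S2"),
    Citation("wort_c", "S2"),
]


def headwords_only_in_genre(citations, sources, genre):
    """Return headwords whose citations all come from sources of one genre."""
    genres_by_word = {}
    for citation in citations:
        source_genre = sources[citation.siglum].genre
        genres_by_word.setdefault(citation.headword, set()).add(source_genre)
    return sorted(w for w, genres in genres_by_word.items() if genres == {genre})


legal_only = headwords_only_in_genre(CITATIONS, SOURCES, "legal")
print(legal_only)
```

The same join could be restricted by region or century instead of genre, which corresponds to limiting a query to a self-defined corpus of cited texts.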

V. Institutional frame:

Some years ago, the Deutsche Forschungsgemeinschaft (DFG = German Research Council) initiated a program for the so-called "Retrospective Digitization of Library Materials". The main goal of the program is to facilitate access to library holdings that are rare or highly important for scholarly interests by providing electronic versions of these holdings. From the beginning, the program encouraged the use of SGML for full-text encoding.

Since September 1997, the DFG has been funding the creation of an integrated digital dictionary of Middle High German to be published on CD-ROM as well as on the Internet. It is intended to serve as a prototype for the digitization of other historical dictionaries, including the famous "Deutsches Woerterbuch" of Jacob and Wilhelm Grimm.



