Instituut voor Nederlandse Lexicologie
The Dutch language has been described extensively in the comprehensive historical dictionaries of the Institute for Dutch Lexicology. These dictionaries (Oudnederlands Woordenboek, Dictionary of Old Dutch, ca. 500-1200; Vroegmiddelnederlands Woordenboek, Dictionary of Early Middle Dutch, 1200-1300; Middelnederlandsch Woordenboek, MNW, Dictionary of Middle Dutch, ca. 1250-1550; Woordenboek der Nederlandsche Taal, WNT, Dictionary of the Dutch Language, 1500-1976) cover over 15 centuries of Dutch and are as such a perfect guide to understanding historical language. The dictionaries also provide the core material for the diachronic computational lexicon of Dutch (GiGaNT), which can be used to support search in historical texts by users without (expert) knowledge of historical spelling variation: when searching for slager (‘butcher’), the user also gets morphological and spelling variants such as slagers, slagher(s), slaeger(s), slaegher(s) or slegher(s). However, when a user wants to study the history of the butcher’s trade, it is not immediately obvious from the way these traditional dictionaries are structured that one also has to look for vleeschhouwer, beenhouwer or beenhakker. And it is only after reading the complete articles that a user learns that vleeschhouwer can also mean ‘executioner’, and slager ‘a person who slays someone’. The direction of derivation differs, however: in the case of vleeschhouwer the meaning ‘executioner’ is derived from vleeschhouwer ‘butcher’, while slager in its contemporary meaning ‘butcher’ is derived from the meaning ‘a person who slays someone’.
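The variant-aware search described above can be illustrated with a minimal sketch. It assumes a lexicon that maps a modern lemma to its attested forms; the dictionary layout and the function names expand_query and search are hypothetical and do not reflect the actual GiGaNT interface. The forms listed are those given in the text.

```python
# Toy variant lexicon: modern lemma -> attested word forms.
# The forms come from the example in the text; the data layout is an assumption.
VARIANTS = {
    "slager": [
        "slager", "slagers", "slagher", "slaghers", "slaeger", "slaegers",
        "slaegher", "slaeghers", "slegher", "sleghers",
    ],
}


def expand_query(lemma):
    """Return all known forms of a lemma, or just the lemma if it is unknown."""
    return VARIANTS.get(lemma, [lemma])


def search(tokens, lemma):
    """Return the positions of tokens that match any known form of the lemma."""
    forms = set(expand_query(lemma))
    return [i for i, token in enumerate(tokens) if token.lower() in forms]


hits = search(["De", "slaghers", "van", "Gent"], "slager")
print(hits)
```

Such a lookup only covers spelling and morphological variation; it does not find other names for the same concept, such as vleeschhouwer or beenhouwer.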
In this contribution we describe the first results of our work on the development of a diachronic semantic lexicon of Dutch. The lexicon aims to enhance text accessibility and to foster research into the development of concepts, by interrelating attested word forms and semantic units (concepts) and by tracing semantic developments in time. In the lexicon, the diachronic onomasiology, i.e. the change in the naming of concepts, and the diachronic semasiology, i.e. the change in the meaning of words, will be recorded in a way suitable for use by humans and computers. The onomasiological part of the lexicon is meant to enhance recall in text retrieval by providing different verbal expressions of a concept or of related concepts (slager → beenhouwer, beenhakker, vleeshouwer; boer → landman). The diachronic semasiological component, which charts semantic change, aims to enhance precision by enabling the user to take semantic change into account; the oldest meaning of appel, for example, is ‘a fruit’ (so appel is also used for pears, plums etc.).
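The two lookup directions can be sketched as follows. This is a minimal illustration under stated assumptions, not the lexicon's actual data model: the Sense record, the concept labels and the function names are hypothetical, and all year ranges are invented placeholders rather than dictionary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    word: str
    concept: str     # hypothetical concept label
    first_year: int  # placeholder value, not an attested date
    last_year: int   # placeholder value, not an attested date


# Toy sense inventory; words and meanings follow the text, years are invented.
SENSES = [
    Sense("slager", "SLAYER", 1300, 1700),
    Sense("slager", "BUTCHER", 1600, 2000),
    Sense("vleeschhouwer", "BUTCHER", 1300, 1900),
    Sense("beenhouwer", "BUTCHER", 1300, 2000),
    Sense("appel", "FRUIT", 1200, 1500),
    Sense("appel", "APPLE", 1300, 2000),
]


def words_for(concept, year):
    """Onomasiological lookup: which words name this concept in a given year?"""
    return sorted({s.word for s in SENSES
                   if s.concept == concept and s.first_year <= year <= s.last_year})


def senses_of(word, year):
    """Semasiological lookup: which concepts can this word express in a given year?"""
    return sorted({s.concept for s in SENSES
                   if s.word == word and s.first_year <= year <= s.last_year})


print(words_for("BUTCHER", 1400))
print(senses_of("slager", 1650))
```

The first function serves recall (more words for one concept), the second serves precision (only the senses that were current in the period of the text).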
We describe the structure of the diachronic semantic lexicon and procedures for the acquisition and aggregation of content. The INL historical dictionaries will be the main source of the lexicon, as these dictionaries describe the Dutch lexicon from the 6th to the 20th century and cover most of the basic vocabulary of this period. Word sense descriptions are illustrated by dated quotations, which constitute a first step towards dating a concept. The temporal distribution of quotations pertaining to different senses gives a first picture of the diachronic development of the sense inventory of a headword. The fact that many words in the historical dictionaries are defined (partly) by synonym definitions and contemporary semantic (near-)equivalents enables us to extract an initial set of semantic relations.
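How dated quotations yield a first temporal profile per sense can be sketched as below. The input format (pairs of a sense identifier and a quotation year), the identifiers and the years are all assumptions made for illustration; they are not taken from the dictionaries.

```python
from collections import Counter, defaultdict


def century(year):
    """Map a year to its century number, e.g. 1350 -> 14 and 1500 -> 15."""
    return (year - 1) // 100 + 1


def sense_profiles(quotations):
    """Summarise quotation dates per sense: first and last year, counts per century."""
    years_by_sense = defaultdict(list)
    for sense_id, year in quotations:
        years_by_sense[sense_id].append(year)
    profiles = {}
    for sense_id, years in years_by_sense.items():
        counts = Counter(century(y) for y in years)
        profiles[sense_id] = {
            "first": min(years),
            "last": max(years),
            "per_century": dict(sorted(counts.items())),
        }
    return profiles


# Invented quotation dates for two hypothetical senses of one headword.
toy_quotations = [
    ("slager.1", 1350), ("slager.1", 1420),
    ("slager.2", 1610), ("slager.2", 1688), ("slager.2", 1850),
]
profiles = sense_profiles(toy_quotations)
print(profiles["slager.2"])
```

The first and last quotation years are only a lower bound on the life span of a sense, which is one reason the quotation evidence is treated as a first step towards dating.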
Information from other sources is not disregarded. For contemporary Dutch, several lexical resources cataloguing semantic relationships are available. These include traditional synonym dictionaries such as Brouwers’ “Het juiste woord” and more recent initiatives such as Open Dutch Wordnet (Vossen). For some specific domains, thesauri with a diachronic component are in development (e.g. HISCO, http://historyofwork.iisg.nl/index.php).
Besides lexical sources, diachronic corpus material (the DBNL, the digital library of Dutch literature, http://www.dbnl.nl, and the digitized newspaper collections and other digitized collections of the Dutch Royal Library, http://www.delpher.nl) and corpus-based methods are no less essential to the development of the lexicon and to verifying the relevance of its content. These include: (i) corpus-based analysis of semantic change at the “type” level, using distributional methods, where the set of quotation dates per word sense gives us an interesting starting point; and (ii) research into the application of token-based distributional methods to the interlinking of historical corpora and lexical resources.
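A type-level distributional comparison can be sketched in a few lines. This is a generic illustration, not the method used in the project: it builds a co-occurrence vector for a target word from each of two periods and compares the two with cosine similarity. The window size, the function names and the toy tokens are assumptions.

```python
import math
from collections import Counter


def cooccurrence_vector(sentences, target, window=2):
    """Count the words occurring within `window` positions of each target token."""
    vector = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vector[tokens[j]] += 1
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder token sequences standing in for two periods of a corpus.
period_1 = [["a", "x", "b"], ["a", "x", "c"]]
period_2 = [["d", "x", "e"]]
early = cooccurrence_vector(period_1, "x", window=1)
late = cooccurrence_vector(period_2, "x", window=1)
print(cosine(early, late))
```

A low similarity between the vectors of two periods is a signal that the contexts of a word have shifted, which can then be checked against the quotation dates per word sense.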
Bibliography
Fellbaum, C. ed. (1999). WordNet. An Electronic Lexical Database. London: The MIT Press.
Geeraerts, D., et al. (1994). The Structure of Lexical Variation. Meaning, Naming, and Context. Berlin/New York: Mouton de Gruyter.
Geeraerts, D. (1997). Diachronic Prototype Semantics. A Contribution to Historical Lexicology. Oxford: Clarendon Press.
Geeraerts, D. (2010). Theories of Lexical Semantics. Oxford/New York: Oxford University Press.
Gulordava, K. and Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus. Proceedings of the EMNLP 2011 Geometrical Models for Natural Language Semantics (GEMS 2011) Workshop, pp. 67-71.
Heylen, K., et al. (2015). Monitoring polysemy: Word space models as a tool for large-scale lexical semantic analysis. Lingua, 157: 153-72.
Kay, C. J. and Chase, T. J. P. (1987). Constructing a Thesaurus database. Literary and Linguistic Computing, 2(3): 161-63.
Laurence, S. and Margolis, E. (1999). Concepts and Cognitive Science. In Margolis, E. and Laurence, S., Concepts. Core Readings. Cambridge (US)/London: The MIT Press, pp. 3-81.
Sijs, N. van der (2001). Etymologie in het digitale tijdperk. Een chronologisch woordenboek als praktijkvoorbeeld. Ph.D. thesis, Universiteit Leiden.
Vanhove, M. ed. (2008). From Polysemy to Semantic Change. Towards a typology of lexical semantic associations. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Vossen, P. ed. (1998). EuroWordNet: A multilingual database with lexical semantic networks. Reprinted from Computers and the Humanities, Vol. 32, Nos. 2-3, 1998. Dordrecht/Boston/London: Kluwer Academic Publishers.
Hosted at Jagiellonian University, Pedagogical University of Krakow
Kraków, Poland
July 11, 2016 - July 16, 2016
454 works by 1072 authors indexed
Conference website: https://dh2016.adho.org/
Series: ADHO (11)
Organizers: ADHO