A User-Centered Digital Edition of Vuk Stefanović Karadžić's Lexicon Serbico-Germanico-Latinum

  1. Toma Tasovac

    Belgrade Center for Digital Humanities

  2. Natalia Ermolaev

    Belgrade Center for Digital Humanities

Tasovac, Toma, Center for Digital Humanities (Belgrade, Serbia), ttasovac@humanistika.org
Ermolaev, Natalia, Center for Digital Humanities (Belgrade, Serbia), ne99@columbia.edu
Dictionaries lie at the core of the human ability to conceptualize, systematize and convey meaning. But a dictionary (whether print or digital) is many things at once: a text, a tool, a model of language, and a cultural object deeply embedded in the historical moment of its production (Tasovac, 2010). While it is true that we now live in the age of the electronic dictionary (de Schryver, 2003), dictionaries have always played an important role in the interplay between production technology and knowledge taxonomies (McArthur, 1986; Hüllen and Schulze, 1988; Hüllen, 1999). In this respect, historical dictionaries remain particularly valuable objects of study because they illustrate sociolinguistic perceptions and reveal culturally shaded conceptualizations of lexical knowledge of a particular epoch, often in stark contrast to our contemporary attitudes and values. Moreover, they pose a veritable challenge for text encoding, semantic markup and database modeling (Fomin and Toner, 2006; Nyhan, 2006; Nyhan, 2008; Mooijaart and van der Wal, 2009; Lemnitzer et al., 2010). This is why all dictionaries, including retrodigitized historical dictionaries, are important for digital humanities, and why DH, with its concern for the (abstract) modeling of knowledge and its (practical) implementation in humanities research, can integrate and propel different trains of lexicographic and metalexicographic thought at the intersection of language and technology.

Many DH research projects have aimed to produce electronic editions of printed lexicons (see for instance Morrissey, 1993; Lemberg et al., 1998; Christmann, 2001; Fournier, 2001). In such efforts, retrodigitization is usually based on one of two approaches: either the production of “faithful” digital copies (at the cost of reproducing factual or typographic errors), or the structural modeling of the content, which treats the print edition as a data source rather than as an immutable text to be reproduced in its entirety (Lobenstein-Reichmann, 2008). In either case, retrodigitization projects tend not to involve any re-editing or expansion of the actual content of historical dictionaries.

We agree with Kirkness (2008) that digitizing historical dictionaries can increase and optimize their use value, especially in global, networked environments. But we also feel that one central aspect is often overlooked in current studies of retrodigitized dictionaries: users interacting with a historical lexicon no longer necessarily have an active command of the text's primary language. Even when historical dictionaries are retrodigitized with the user's needs in mind, the focus is usually on easy-to-handle navigation, presentation layout and retrieval of elements from a full-text search (Christmann, 2003), or on the uniformization of existing data elements, such as dates (Kinable, 2006). These efforts are worthwhile and necessary because they contribute greatly to editions that are more usable and efficient than their hardcopy counterparts. Even so, electronic dictionaries remain in essence lookup tools (for words encountered in a given text) rather than exploratory tools (for unknown words or concepts). This, we believe, can reduce both their scholarly and popular appeal.

It may seem unlikely at first that historical dictionaries can generate non-academic interest, but experience has shown that there is a broad audience outside highly professionalized linguistic circles that is both curious and enthusiastic about exploring the historical and ethnographic fabric of a language (Kirkness, 2008). In our own web project, “Reklakaza.la” (Serbian for “hearsay”), we have been publishing selected entries from the classic 19th-century Serbian Dictionary by Vuk Stefanović Karadžić online and sharing them via the social networks Twitter (http://twitter.com/Vuk_Karadzic) and Facebook (http://facebook.com/reklakaza.la). The project has gained more than 24,000 fans on Facebook alone and has become a platform for bringing meaningful humanities inquiry into the public conversation. It fosters a sense of community, sharing and mutual learning that demonstrates the relevance of the humanities in today's world, despite academic budget cuts and declining job opportunities.

The success of our pilot project has strengthened our conviction that a modern, electronic edition of Karadžić's dictionary is long overdue. Vuk Stefanović Karadžić (1787-1864), the linguist, folklorist and reformer of the Serbian language, published his landmark Srpski rječnik, istumačen njemačkijem i latinskijem riječima — Lexicon Serbico-Germanico-Latinum in two editions (1818 and 1852). This first lexicon of the modern Serbian vernacular, rather than the Church-Slavic hybrid language used by the educated elites up to the 19th century, has a unique place in the history of not only the Serbian language, but the South Slavic diasystem in general (Дмитриев and Сафронов, 1984; Wilson, 1986; Стојановић, 1987; Eschker, 1988; Potthoff, 1990; Ивић, 1990; Vitalich, 2005; Кулаковский, 2005). The text is rich with ethnographic and anthropological material. Not only do many entries contain examples of Balkan folk storytelling, but some are themselves structured as historical, cultural and ethnographic narratives that offer informative sketches and sometimes even very detailed accounts of the myths and realities of the Balkan past (see, for instance, entries for кмет, отмица, мора, хајдук, etc.).

Though it was republished twice (in 1898 and 1935), the Lexicon has not been reprinted since (other than in facsimile editions). Meeting the needs of modern-day users, however, presents a host of editorial challenges. The entries in the Lexicon are written mainly in a dialect which is on the margins of contemporary standard Serbian. Thus, the lexicographic material is not always entirely understood by contemporary speakers, and can often appear obscure or unwieldy. It is hard for the average user to answer questions such as: What was the early 19th-century equivalent of a modern-day Serbian word? What household objects, for instance, are listed in Karadžić's dictionary? What words were difficult or impossible for Karadžić to translate into German or Latin?

Our “Annotated Digital Edition of Vuk Stefanović Karadžić’s Srpski rječnik” is therefore conceived as a resource that caters to the access needs and habits of modern scholars, teachers, students and, last but not least, general readers. The entries are marked up in XML, in compliance with the Guidelines of the Text Encoding Initiative (Burnard et al., 2006). In this initial phase of the project, we are focusing solely on text encoding, but with a view to its potential use in a database-driven web application at a later stage.

In addition to marking up existing structural elements of a dictionary entry (such as lemma, part of speech, senses, definitions, translation equivalents, examples etc.), our work supplies important additional information that will enhance the modern-day user's interaction with the dictionary, including:

standard Serbian equivalents to dialect word forms (e.g. бичевање vs. бичкарење, мешина vs. мљешина, енглески vs. инглешки);
Serbian ekavian word-forms to both standard and east-Herzegovina jekavian entries (e.g. терати vs. тјерати, терати vs. ћерати);
both the original 19th-century accentuation (e.g. кòчија̑шки) and its modern-day graphic equivalent (кòчија̄шкӣ);
indications when modern-day accentuation differs from the form found in the Lexicon (e.g. мо̑ре vs. мо̏ре, das Meer, mare);
an extension of the extant cross-reference system through linking synonymous and near-synonymous entries that have been overlooked by previous editors (e.g. жаба and напнигуша; обрљуга and неопера);
labeling of Turkisms overlooked by previous editors (e.g. була, инћар, џукела);
marking up persons, places and dates for easy indexing and analysis;
indications of word usage (e.g. ист. and ист. кр. encoded as <usg type="geo">East</usg>) for better statistical analysis, possible further processing and the creation of geo-spatial word maps, etc.;
marking up instances where Karadžić uses a first-person narrative to explain an entry;
indications of the edition in which entries appeared for the first time, etc.
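
As a rough illustration of how annotations of this kind might be consumed by later tools, the following Python sketch parses one TEI-style entry and pulls out its lemma, geographic usage labels and translation equivalents. Everything structural here is an assumption rather than the project's actual encoding: the TEI P5 namespace, the choice of form/orth, sense, usg and cit elements, and the helper name summarize_entry are all hypothetical. The glosses das Meer and mare are taken from the list above; attaching the label East to this headword is invented purely for the example, and the headword is given without accent marks for simplicity.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace; the project's actual schema is not given in the text.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical entry. The glosses "das Meer" and "mare" come from the text above;
# the geographic label on this headword is invented for illustration.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form type="lemma"><orth>море</orth></form>
  <sense>
    <usg type="geo">East</usg>
    <cit type="translation" xml:lang="de"><quote>das Meer</quote></cit>
    <cit type="translation" xml:lang="la"><quote>mare</quote></cit>
  </sense>
</entry>
"""


def summarize_entry(xml_text):
    """Return the lemma, geographic usage labels and translations of one entry."""
    entry = ET.fromstring(xml_text)
    lemma = entry.findtext("tei:form/tei:orth", namespaces=NS)
    # Collect every usage note explicitly typed as geographic.
    geo_labels = [usg.text for usg in entry.iterfind(".//tei:usg[@type='geo']", NS)]
    # Map each translation's language code to its quoted equivalent.
    translations = {
        cit.get(XML_LANG): cit.findtext("tei:quote", namespaces=NS)
        for cit in entry.iterfind(".//tei:cit[@type='translation']", NS)
    }
    return {"lemma": lemma, "geo": geo_labels, "translations": translations}


if __name__ == "__main__":
    print(summarize_entry(SAMPLE_ENTRY))
```

Because the usage label is stored as a normalized value rather than as the printed abbreviation, counting or mapping entries by region reduces to grouping on that value.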
Furthermore, we are assigning semantic domain labels to word senses in accordance with Magnini and Cavaglia (2000) and Bentivogli et al. (2004), cross-referencing senses with the Transpoetika Dictionary, a bilingualized, Wordnet-based Serbian-English dictionary (Tasovac, 2009), and providing English glosses in addition to the existing German and Latin. All of this will help us meet our goal of moving beyond the current paradigm, which limits retrodigital text editing to the creation of electronic replicas of hardcopy lexicons or semantically structured electronic representations of the original data source. We are interested in hybrid approaches that respect the integrity of the original text, but also take advantage of the digital medium to create modern, deeply encoded, user-centered editions of historical dictionaries, which can not only provide look-up mechanisms for particular words, but also function as exploratory tools for various types of knowledge discovery.

Some practical advantages of our edition of Vuk Karadžić's dictionary will include reverse look-ups, allowing a user to search for an English, German, or Latin word and find its Serbian equivalent in the Lexicon. The domain labels will provide researchers with valuable and, for the first time, measurable information about the clusters of paradigmatically related terms, as well as the extent of domain ambiguity and domain variability. Users will be able to treat semantic domains as thematic entry points into the dictionary, looking up, for instance, all entries that belong to AGRICULTURE, FOLKLORE, HISTORY or GASTRONOMY. Our logically and semantically consistent markup of Karadžić's own usage notes will, in turn, make it possible for users to easily explore the regional and dialectological distribution of entries in the Lexicon, offering a basis for subsequent work that could involve data visualization, statistical analysis, text mining, etc.
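
The reverse look-ups and thematic entry points described above could be sketched, under simplifying assumptions, as two small in-memory indexes. The Entry record, the function names and all sample data below are hypothetical: the German and Latin glosses for море come from the text, while the English glosses and the domain assignments are invented for illustration and are not taken from the edition.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    glosses: dict                               # language code -> list of equivalents
    domains: set = field(default_factory=set)   # semantic domain labels


# Illustrative data only; English glosses and domain labels are invented.
SAMPLE_ENTRIES = [
    Entry("море", {"de": ["das Meer"], "la": ["mare"], "en": ["sea"]}, {"GEOGRAPHY"}),
    Entry("хајдук", {"en": ["outlaw"]}, {"HISTORY"}),
]


def build_indexes(entries):
    """Build a reverse gloss index and a domain index over the entries."""
    reverse = defaultdict(set)    # (language, lowercased token) -> lemmas
    by_domain = defaultdict(set)  # domain label -> lemmas
    for entry in entries:
        for lang, equivalents in entry.glosses.items():
            for gloss in equivalents:
                # Index each token so that "Meer" finds the gloss "das Meer".
                for token in gloss.split():
                    reverse[(lang, token.casefold())].add(entry.lemma)
        for domain in entry.domains:
            by_domain[domain].add(entry.lemma)
    return reverse, by_domain


def reverse_lookup(reverse, lang, word):
    """Return the Serbian lemmas whose glosses in `lang` contain `word`."""
    return sorted(reverse.get((lang, word.casefold()), set()))


if __name__ == "__main__":
    reverse, by_domain = build_indexes(SAMPLE_ENTRIES)
    print(reverse_lookup(reverse, "de", "Meer"))
    print(sorted(by_domain["HISTORY"]))
```

Token-level indexing is a deliberately naive choice: it also indexes articles such as das, and a fuller implementation would presumably normalize or lemmatize the glosses first.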

Bentivogli, L., P. Forner, B. Magnini, and E. Pianta (2004). “Revising the Wordnet Domains Hierarchy: Semantics, Coverage and Balancing.” Proceedings of the Workshop on Multilingual Linguistic Resources, 101-108.

Burnard, Lou, Katherine O’Brien, and John Unsworth (2006). Electronic Textual Editing. New York: The Modern Language Association of America.

Christmann, R. (2001). “Books into bytes: Jacob and Wilhelm Grimm’s Deutsches Wörterbuch on CDROM and on the Internet.” Literary and Linguistic Computing, 16(2): 121-133.

Christmann, R. (2003). “Towards the User: The Digital Edition of the Deutsche Wörterbuch by Jacob and Wilhelm Grimm.” Literary and Linguistic Computing, 18(1): 11-22.

de Schryver, Gilles-Maurice (2003). “Lexicographer’s Dreams in the Electronic-Dictionary Age.” International Journal of Lexicography, 16(2): 143-199.

Eschker, Wolfgang, ed. (1988). Jacob Grimm und Vuk Karadžić: Zeugnisse einer Gelehrtenfreundschaft. Volkskundliche Schriften, Bd. 4. Kassel: E. Röth-Verlag.

Fomin, Maxim, and Gregory Toner (2006). “Digitizing a Dictionary of Medieval Irish: the eDIL Project.” Literary and Linguistic Computing, 21(1): 83.

Fournier, Johannes (2001). “New directions in Middle High German lexicography: dictionaries interlinked electronically.” Literary and Linguistic Computing, 16(1): 99-111.

Hüllen, Werner (1999). English Dictionaries, 800-1700: The Topical Tradition. Oxford [England]; New York: Clarendon Press; Oxford University Press.

Hüllen, Werner, and Rainer Schulze (1988). Understanding the Lexicon: Meaning, Sense, and World Knowledge in Lexical Semantics. Tübingen: M. Niemeyer.

Kinable, Dirk (2006). “Computerized Restoration of Historical Dictionaries: Uniformization and Date-assigning in Dictionary Quotations of the Woordenboek der Nederlandsche Taal.” Literary and Linguistic Computing, 21(3): 295-310.

Kirkness, Alan (2008). “Digitalisierung - Vernetzung - Europäisierung: Zur Zukunft der historischen Lexikographie des Deutschen.” Lexicographica, 2007: 7-38.

Lemberg, I., S. Petzold, and H. Speer (1998). “Der Weg des Deutschen Rechtswörterbuchs in das Internet.” Wörterbücher in der Diskussion III. Vorträge aus dem Heidelberger Lexikographischen Kolloquium. Tübingen: Niemeyer, 262-284.

Lemnitzer, Lothar, Laurent Romary, and Andreas Witt (2010). “Representing Human and Machine Dictionaries in Markup languages (SGML, XML).” (link)

Lobenstein-Reichmann, Anja (2008). “Allgemeine Überlegungen zur Retrodigitalisierung historischer Wörterbücher des Deutschen.” Lexicographica, 2007: 173-198.

Magnini, B., and G. Cavaglia (2000). “Integrating Subject Field Codes into WordNet.” Proceedings of LREC-2000, Second International Conference on Language Resources and Evaluation, 1413-1418.

McArthur, T. B. (1986). “Thematic Lexicography.” The History of Lexicography: Papers from the Dictionary Research Centre Seminar at Exeter, March 1986, R. R. K. Hartmann, 40. Amsterdam; Philadelphia: J. Benjamins.

Mooijaart, Marijke, and Marijke van der Wal (2009). Yesterday’s Words: Contemporary, Current and Future Lexicography. Historiographia Linguistica. Newcastle: Cambridge Scholars Publishing.

Morrissey, Robert (1993). “Texts and Contexts: The ARTFL Database in French Studies.” Profession, 27-33.

Nyhan, Julianne (2006). “The Application of XML to the historical lexicography of Old, Middle, and Early-Modern Irish: a Lexicon based analysis.” National University of Ireland Cork.

Nyhan, Julianne (2008). “Developing Integrated Editions of Minority Language Dictionaries: The Irish Example.” Literary and Linguistic Computing, 23(1): 3-12.

Potthoff, Wilfried (1990). Vuk Karadžić im europäischen Kontext. Beiträge des internationalen wissenschaftlichen Symposiums der Vuk-Karadzic-Jacob-Grimm-Gesellschaft am 19. und 20. November 1987. Heidelberg: Carl Winter Universitätsverlag.

Tasovac, Toma (2010). “Reimagining the Dictionary, or Why Lexicography Needs Digital Humanities.” Digital Humanities 2010. (link)

Tasovac, Toma (2009). “More or Less Than a Dictionary? Wordnet as a Model for Serbian L2 Lexicography.” Infotheca: Journal of Informatics and Librarianship, 10(1-2): 13a-22a.

Vitalich, Kristin Leigh (2005). “Lexicographical doxa: The writing of Slavic dictionaries in the nineteenth century (Samuel Bogumil Linde, Vuk Stefanović Karadžić, Vladimir Ivanovich Dal).” University of California at Los Angeles.

Wilson, D. (1986). The life and times of Vuk Stefanović Karadžić, 1787-1864: Literacy, Literature, and National Independence in Serbia. Oxford: Clarendon Press.

Дмитриев, Петр Андреевич, and Герман Иванович Сафронов (1984). Вук С. Караджич и его реформа сербскохорватского/хорватосербского литературного языка. Учеб. пособие. Ленинград: ЛГУ.

Ивић, Милка (1990). О језику Вуковом и вуковском. Нови Сад: Књиж. заједница Новог Сада.

Кулаковский, Платон Андреевич (2005). Вук Караджич. Его деятельность и значение в сербской литературе. Москва: УРСС.

Стојановић, Љубомир (1987). Живот и рад Вука Стеф. Караџића. Београд: Београдски издавачко-графички завод.

Conference Info


ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None