E-Lexicography Between Digital Humanities And Artificial Intelligence: Complexities In Data, Technologies, Communities

workshop / tutorial
  1. Tanja Wissik

    OEAW Österreichische Akademie der Wissenschaften / Austrian Academy of Sciences

  2. John P. McCrae

    National University of Ireland, Galway (NUI Galway)

  3. Paul Buitelaar

    National University of Ireland, Galway (NUI Galway)

  4. Toma Tasovac

    Belgrade Center for Digital Humanities

  5. Justin Tonra

    National University of Ireland, Galway (NUI Galway)

  6. Ksenia Zaytseva

    OEAW Österreichische Akademie der Wissenschaften / Austrian Academy of Sciences

Work text

Lexicography is undergoing rapid change as traditional methods of publishing dictionaries give way to the ubiquity of lexical information on the Web. Furthermore, the application of computational techniques is revolutionizing how dictionaries can be constructed. In this context, the recently established ELEXIS (European Lexicographic Infrastructure) project aims to develop an infrastructure for eLexicography across Europe that builds a virtuous cycle of lexicography: lexicographic resources are linked across languages in order to build improved natural language processing tools, which in turn aid the construction of novel resources and the retro-digitization of dictionaries, thus driving the cycle. The project is multilingual, covering 15 European countries, and has a strong interest in driving lexicography for under-resourced and minoritized languages.
The event will be the second iteration of a highly successful workshop first run before the EADH Conference in Galway on 6 December 2018. On that occasion, the workshop was the most heavily subscribed of the ten pre-conference workshops, with 33 regular registrations and approximately fifty attendees from a range of backgrounds in lexicography, computer science, and the humanities.

Specific emphasis in this edition of the workshop will be on complexity with regard to the data, technology and community aspects of lexicography. Complexity in data concerns access to lexicographic data across stakeholders (national institutes, research groups, individuals); representation formats; linguistic assumptions, underlying theories, and the scope of analysis and representation; legal restrictions and licenses; multilingual, cross-lingual, comparative and typological issues; and advanced aspects of multimodality such as audio-visual representation.
Complexity in technologies concerns the challenges of creating and expanding novel lexicographic resources, which requires a combination of many technologies, including natural language processing tools, machine learning approaches and AI methods in general, for data linking and data management, in order to identify and represent words, their senses and definitions.
Complexity in communities lies, in part, in differences in stakeholders’ type and status (national language institutes, standardization bodies, research groups, individuals), the status of the language in question (official, minority, regional, etc.), and involvement in networking activities. It also stems from the several academic disciplines involved in eLexicographic research, such as linguistics, natural language processing (NLP), digital humanities, artificial intelligence (AI), computational linguistics and computer science. This makes it challenging to provide professionals with training opportunities and to ensure knowledge exchange among all stakeholders.
The workshop has three main aims. Firstly, we will invite speakers from existing major lexicographic projects to give insights into complexity with regard to the data, technology and community aspects of lexicography. Secondly, we will provide a hands-on tutorial with the ELEXIS infrastructure to enable participants to become familiar with the technologies being developed in the project. Finally, we will issue an open call for posters, which will provide an overview of new projects in the area of electronic lexicography, with a special focus on contributions that tackle the topic of complexity. Poster presenters will give a five-minute lightning talk on their topic in addition to contributing to the poster session.
Topics of Interest

Lexicography and Digital Humanities
Complexity in Data for Lexicography
Complexity in Technologies for Lexicography
Complexity across Lexicography and Related Communities
Access and usage of dictionaries on the Web
Retro-digitization of lexicographic resources
Lexicography for language learning
Use and applications of NLP for lexicography
Lexicography for under-resourced languages
Lexicography for terminology and translation
Linked Data for lexical resources
AI for Lexicography

Summary of the Call
We welcome submissions of abstracts of up to 500 words that will be presented as posters at the workshop. Submissions should present methodologies, experiments, use cases, descriptions of ongoing or planned research projects, or position papers related to the topics of interest given above. We especially welcome papers that describe interdisciplinary research combining lexicography, linguistics, computer science and digital humanities approaches, and that give insights into complexity with regard to the data, technology and community aspects of lexicography.
Please submit abstracts by May 6th, 2019, in any language (including an English translation of the title for reviewing purposes). Submissions will be reviewed by at least three reviewers and will be made available online prior to the workshop. Papers should be submitted via EasyChair. Notifications will be sent by the end of May, and final versions of abstracts will be required by the end of June. More information on the workshop can be found on the workshop website.

Tentative Schedule

Invited talk: “Unity in Variety: Observation and Lexicographic Treatment of the German Standard Variety used in South Tyrol” Andrea Abel (Eurac Research, Italy)

Invited talk: “Framing in the Dutch Language: from structured data to text and back from text to structured data on situations” Piek Vossen (Vrije Universiteit Amsterdam, Netherlands)

Introduction to ELEXIS infrastructure and technology

Coffee break

Lightning Talks

Poster session

Organizing Committee

John P. McCrae is a lecturer above-the-bar at the National University of Ireland Galway in the school of information technology. His work has focussed on the application of linked data to language resources. In particular, he is the original developer of the lemon-OntoLex model, which has become a de facto standard for representing lexicons on the Web. In addition, he is a board member of the Global WordNet Association. He has also organized many events (e.g. Language Data and Knowledge Conferences, Linked Data in Linguistics Workshops, Summer Datathons/Summer Schools on Linguistic Linked Open Data, etc.).
Address: Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
Email: john.mccrae@insight-centre.org

Paul Buitelaar is a Senior Lecturer at the National University of Ireland Galway (NUIG), vice-director of the Insight Centre for Data Analytics at NUIG and head of the Insight Unit for Natural Language Processing. His main research interests are in the development and use of Natural Language Processing methods and solutions for semantic-based information access. He has been involved in a large number of nationally and internationally funded projects in this area. In recent years he has been involved in the development of the Saffron framework for knowledge extraction and in the definition and implementation of lemon, a vocabulary for Linguistic Linked Data.
Address: Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
Email: paul.buitelaar@insight-centre.org

Toma Tasovac is Director of the Belgrade Center for Digital Humanities (BCDH) and Director of the Digital Research Infrastructure for the Arts and Humanities (DARIAH). His areas of interest include lexicography, data modeling, TEI, digital editions and research infrastructures. Toma was previously a Steering Group member of the European Network for eLexicography (ENeL), and is currently also affiliated with the European Lexicographic Infrastructure (ELEXIS).
Address: Belgrade Center for Digital Humanities, Belgrade, Serbia
Email: ttasovac@humanistika.org

Justin Tonra is Lecturer in English (Digital Humanities) at the National University of Ireland Galway. His areas of research interest include digital approaches to literary studies, book history, textual studies and bibliography, scholarly editing, and literature of the Romantic period. He is currently joint National Coordinator for DARIAH Ireland, and a working-group leader for COST Action CA16204 Distant Reading for European Literary History.
Address: English Department, National University of Ireland Galway, Ireland
Email: justin.tonra@nuigalway.ie

Tanja Wissik is a senior researcher at the Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences and teaches at the University of Graz. She holds a PhD in translation studies from the University of Vienna, with a specialization in terminology and corpus linguistics. She has worked on numerous research projects related to language resources and language technologies and is involved in outreach and networking activities.
Address: Austrian Academy of Sciences, Austria
Email: Tanja.Wissik@oeaw.ac.at

Ksenia Zaytseva is a data analyst at the Austrian Centre for Digital Humanities (ACDH) of the Austrian Academy of Sciences. She is primarily involved in the development of tools and services for DH projects in the archaeological and linguistic domains. Her main research interests are Semantic Web technologies, Linked (Open) Data, controlled vocabularies and reference data services. She is also interested in scientific Python programming, machine learning and web application development.
Address: Austrian Academy of Sciences, Austria
Email: ksenia.zaytseva@oeaw.ac.at

The workshop is supported by the two EU projects ELEXIS (European Lexicographic Infrastructure) and Prêt-à-LLOD (Multilingual Linguistic Linked Data).

Programme Committee

Fahad Khan (ILC-CNR, Pisa)
Monica Monachini (ILC-CNR, Pisa)*
Rute Costa (New University of Lisbon)
Francesca Frontini (University Paul Valéry, Montpellier)
Andrea Bellandi (ILC-CNR, Pisa)
Christophe Roche (University of Savoie)
Bolette Sandford Pedersen (University of Copenhagen)
Christiane Fellbaum (Princeton University)*
Philipp Cimiano (Bielefeld University)
Simon Krek (Jožef Stefan Institute)
Vera Hildenbrandt (Trier Center for Digital Humanities)
Karlheinz Mörth (Austrian Academy of Sciences)*
Thierry Declerck (DFKI)
Katrien Depuydt (Institute of Dutch Language)*
Christian Chiarcos (Goethe-University Frankfurt)
Monika Rind-Pawlowski (Goethe-University Frankfurt)*
Alexander Geyken (Berlin-Brandenburg Academy of Sciences)*

* Awaiting confirmation

Conference Info

In review

ADHO - 2019

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

Series: ADHO (14)

Organizers: ADHO