From Digitization to Knowledge: Resources and Methods for Semantic Processing of Digital Works/Texts

Pierre Nugues; Lars Borin; Nathalie Fargier; Richard Johansson; Nils Reiter; Sara Tonelli

Authorship

1. Pierre Nugues

Lund University

2. Lars Borin

Göteborg University (Gothenburg)

3. Nathalie Fargier

CNRS (Centre national de la recherche scientifique), Ecole Normale Supérieure de Lyon (ENS de Lyon), Université de Lyon (University of Lyon)

4. Richard Johansson

Göteborg University (Gothenburg)

5. Nils Reiter

Universität Stuttgart

6. Sara Tonelli

Fondazione Bruno Kessler

Work text


The Internet is a revolution that will not stop “until everything is digitized”,
Louis Gerstner, former Chairman of IBM, quoted in
The Economist, June 4th 1998

Description
This workshop has two goals. The first is to provide a venue for researchers to describe and discuss practical methods and tools used in the construction of semantically annotated text collections, the raw material necessary to build knowledge-rich applications. We expect such tools to include lexical and semantic resources, with a focus on the interlinking of concepts and entities and their integration into corpora.
A second goal is to report on the ongoing development of new tools for providing access to the rich information contained in large text collections. Semantic tools and resources, notably, are reaching a quality that makes them fit for building practical applications. They include ontologies, framenets, syntactic parsers, semantic parsers, entity linkers, etc. We are interested in use cases that apply such advanced tools in the field of digital humanities and in their evaluation, with a specific interest in multilingual and cross-lingual aspects of semantic processing of text.

Topics of interest

Construction and use of ontologies for text collections
Entity nomenclatures and bridging
Integration of lexical knowledge in text collections
Visualization, user interfaces
Semantic repositories: Entities and propositions
Interlinking of concepts and entities in multilingual text
Representing inter-textual relations
Semantic search and information retrieval
Tools for semantic annotation
Timeline-based approaches such as "culturomics"
Technical infrastructures and standards
Quality evaluation
Applications in digital humanities

Invited speakers
The workshop will include one, possibly two, invited speakers of international reputation.

Motivation
One consequence of the digital revolution is the gradual but inexorable availability of all kinds of text in machine-readable form. Libraries around the world are scanning their collections. Newspapers offer their articles on the web. Governments put their archives and laws online. A large part of what the human mind has produced (literature, essays, encyclopedias, biographies, etc.) is, or will be, accessible in computerized form in a wide variety of languages. We can predict that, within a few years, (nearly) all text ever produced by humanity will be available in digital form, either born digital or digitized from books, newspapers, archives, etc.
While digitization is well underway, turning the information contained in these texts into exploitable knowledge has become both a major challenge and a major opportunity for the information society. IBM Watson and Google's Knowledge Graph are recent and spectacular achievements that show the significance of knowledge extraction from text. IBM Watson is a system that can answer questions on the US "Jeopardy!" quiz show better than any human being. One of its core components is the PRISMATIC knowledge base, which consists of one billion semantic propositions extracted from the English version of Wikipedia and the New York Times, while Google's Knowledge Graph is based on a systematic extraction of millions of entities from a variety of sources. Such technologies are defining the information age, and they have the potential to bring a much higher degree of sophistication to "distant-reading" methodology in digital humanities, enabling large-scale access to text content.

Audience
The target audience is a mix of users who would like to apply semantic processing techniques to text and researchers working in this area. Users could be interested, for instance, in the extraction of entities and their association with encyclopedic text, or in the extraction of relations from text, such as date and place of birth or death, profession, etc. Researchers would describe practical techniques and algorithms that could fit the needs of these users.

Organizers

Lars Borin, University of Gothenburg
Nathalie Fargier, Persée (Université de Lyon, ENS de Lyon, CNRS)
Richard Johansson, University of Gothenburg
Pierre Nugues, Lund University
Nils Reiter, Universität Stuttgart
Sara Tonelli, Fondazione Bruno Kessler

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

Complete

ADHO - 2016

"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

454 works by 1072 authors indexed

Conference website: https://dh2016.adho.org/

Series: ADHO (11)

Organizers: ADHO
