Workshop: From Manuscript to Text Analytics

Cornelis van Lit; Wido van Peursen; Mladen Popović; Pierre Van Hecke; Dirk Roorda; Hannes Vlaardingerbroek; Maruf Dhali; Mathias Coeckelbergs

Authorship

1. Cornelis van Lit

Utrecht University
2. Wido van Peursen

Vrije Universiteit (VU) Amsterdam (Free University)
3. Mladen Popović

Rijksuniversiteit Groningen (University of Groningen)
4. Pierre Van Hecke

Katholieke Universiteit (KU) Leuven (Catholic University of Louvain)
5. Dirk Roorda

Royal Netherlands Academy of Arts and Sciences (KNAW)
6. Hannes Vlaardingerbroek

Vrije Universiteit (VU) Amsterdam (Free University)
7. Maruf Dhali

Rijksuniversiteit Groningen (University of Groningen)
8. Mathias Coeckelbergs

Katholieke Universiteit (KU) Leuven (Catholic University of Louvain)

Work text


In one day, we take participants through the entire workflow, from holding real manuscripts in their hands to performing complex database computations on the texts those manuscripts contain. Examples are drawn from ancient Oriental manuscript cultures, because the particular complexities of these sources highlight the strengths of new computational technologies. The workshop focuses on the underlying questions raised by the use of digital techniques in the study of material culture, languages, texts and literature. It is therefore aimed at anyone who works with manuscripts and texts, from Akkadian cuneiform economic texts to manuscripts of Classical Greek and Latin authors, and from Hebrew and Aramaic Dead Sea Scrolls to Middle Dutch devotional literature.

The workshop will provide a bird’s-eye view of the entire workflow, starting from the concrete physical carriers of text and proceeding through character recognition, grammatical parsing and syntactic annotation up to advanced text-analytic methods such as topic modelling, linked data and stylometry.

The workshop is divided into four stages, separated by breaks. Because of its hands-on component, the workshop can accommodate up to 30 participants.

We require only standard technical support (e.g. a projector), and participants need to bring their own laptops. Instructions for the hands-on parts will be given at the beginning of the workshop (no pre-workshop installation is required).

Approximate Schedule:
09.00-09.30 Introduction, aim and agenda of the workshop

Stage 1: From physical manuscript to digital manuscript

09.30-10.00 Explanation of the variety of digitization technologies
10.00-10.30 Practicum on evaluating digitized manuscripts
10.30-11.00 Break

Stage 2: From digital manuscript to text extraction

11.00-11.30 Explanation of the possibilities for Handwriting Recognition
11.30-12.00 Explanation of pattern recognition and deep learning
12.00-12.30 Practicum on pattern recognition
12.30-13.30 Lunch

Stage 3: From text extraction to database

13.30-14.00 Explanation of how to prepare texts in a uniform manner
14.00-14.30 Practicum on preparing texts according to a schema
14.30-14.45 Break

Stage 4: From database to text analysis

14.45-15.15 Explanation of text-analysis methods suited to texts from manuscripts
15.15-15.45 Practicum on applying automated text analysis
15.45-16.00 Conclusion; sharing of contact information and planning for future events

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ADHO - 2019

"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019


Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html

References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html

Series: ADHO (14)

Organizers: ADHO