Workshop: From Manuscript to Text Analytics

workshop / tutorial
Authorship
  1. Cornelis van Lit

    Utrecht University

  2. Wido van Peursen

    Vrije Universiteit (VU) Amsterdam (Free University)

  3. Mladen Popović

    Rijksuniversiteit Groningen (University of Groningen)

  4. Pierre Van Hecke

    Katholieke Universiteit (KU) Leuven (Catholic University of Louvain)

  5. Dirk Roorda

    Royal Netherlands Academy of Arts and Sciences (KNAW)

  6. Hannes Vlaardingerbroek

    Vrije Universiteit (VU) Amsterdam (Free University)

  7. Maruf Dhali

    Rijksuniversiteit Groningen (University of Groningen)

  8. Mathias Coeckelbergs

    Katholieke Universiteit (KU) Leuven (Catholic University of Louvain)

Work text


In one day, we take participants through the entire workflow, from holding real manuscripts in their hands to performing complex database computations on the texts these manuscripts contain. Examples are drawn from ancient Oriental manuscript cultures, because the specific complexities of these sources highlight the strengths of applying new computer technologies. The workshop focuses on the underlying questions that arise when digital techniques are used to study material culture, languages, texts and literature. It is therefore aimed at anyone working with manuscripts and texts, from Akkadian cuneiform economic texts to manuscripts of Classical Greek and Latin authors, and from the Hebrew and Aramaic Dead Sea Scrolls to Middle Dutch devotional literature.

The workshop will provide a bird’s-eye view of the entire workflow, starting from the concrete physical carriers of text and moving through methods of character recognition, grammatical parsing and syntactic annotation up to advanced methods of text analytics, e.g. topic modelling, linked data, and stylometry.
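Stylometry is one of the text-analytics methods named above. As a rough illustration only, the sketch below computes Burrows's Delta, a widely used stylometric distance between texts; the workshop description does not say which stylometric method is taught, and the helper names (`burrows_delta`, `relative_frequencies`), the `n_mfw` parameter and the toy corpus are all hypothetical. It assumes each text has already been transcribed and split into tokens, the kind of output the earlier stages of the workflow aim to produce.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Map each token to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def burrows_delta(corpus, n_mfw=30):
    """Pairwise Burrows's Delta between texts (hypothetical helper).

    corpus: dict mapping a text label to its list of tokens.
    n_mfw:  how many of the corpus-wide most frequent words to use.
    Returns a dict {(label_a, label_b): delta}; smaller means more similar.
    """
    # 1. Select the most frequent words across the whole corpus.
    overall = Counter(tok for tokens in corpus.values() for tok in tokens)
    mfw = [word for word, _ in overall.most_common(n_mfw)]

    # 2. Relative frequency of every selected word in every text.
    freqs = {label: relative_frequencies(tokens) for label, tokens in corpus.items()}

    # 3. Mean and population standard deviation of each word's frequency.
    stats = {}
    for word in mfw:
        values = [freqs[label].get(word, 0.0) for label in corpus]
        stats[word] = (mean(values), pstdev(values))

    # Words whose frequency never varies carry no signal (and would divide by zero).
    features = [word for word in mfw if stats[word][1] > 0]
    if not features:
        raise ValueError("no word frequency varies across the corpus")

    # 4. Convert frequencies to z-scores.
    z = {
        label: [(freqs[label].get(w, 0.0) - stats[w][0]) / stats[w][1] for w in features]
        for label in corpus
    }

    # 5. Delta = mean absolute difference of z-scores for each pair of texts.
    labels = list(corpus)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


if __name__ == "__main__":
    # Invented toy token lists standing in for transcribed, tokenized manuscripts.
    corpus = {
        "A": "the king went to the city and the king spoke".split(),
        "B": "the king went to the city and the people spoke".split(),
        "C": "a scribe wrote a letter to a priest in a temple".split(),
    }
    for pair, delta in sorted(burrows_delta(corpus).items(), key=lambda kv: kv[1]):
        print(pair, round(delta, 3))
```

In the toy run, texts A and B share most of their vocabulary and so come out closer to each other than either does to C. With real manuscript material, decisions such as how to normalize spelling variants or treat damaged passages would have to be made before tokenizing; this sketch does not address them.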

The workshop is divided into four stages, separated by breaks. Because of the hands-on component of the workshop, we can accommodate up to 30 participants.

We require only standard technical support (e.g. a projector), and participants need to bring their own laptops. Instructions for the hands-on parts will be given at the beginning of the workshop (no pre-workshop installation is required).

Approximate Schedule:
09.00-09.30 Introduction, aim and agenda of the workshop

Stage 1: From physical manuscript to digital manuscript

09.30-10.00 Explanation of the variety of digitization technologies
10.00-10.30 Practicum on evaluating digitized manuscripts
10.30-11.00 Break

Stage 2: From digital manuscript to text extraction

11.00-11.30 Explanation of the possibilities for handwriting recognition
11.30-12.00 Explanation of pattern recognition and deep learning
12.00-12.30 Practicum on pattern recognition
12.30-13.30 Lunch

Stage 3: From text extraction to database

13.30-14.00 Explanation of how to prepare texts in a uniform manner
14.00-14.30 Practicum on preparing texts according to a schema
14.30-14.45 Break

Stage 4: From database to text analysis

14.45-15.15 Explanation of text-analysis methods suited to texts from manuscripts
15.15-15.45 Practicum on applying automated text analysis
15.45-16.00 Conclusion; sharing of contact information and planning for future events
