Navigating and Processing Data from the TEI with XSLT

Elisa Eileen Beshero-Bondar; Martina Scholger; Kiyonori Nagasaki

Authorship

1. Elisa Eileen Beshero-Bondar

Penn State Erie, The Behrend College, United States of America

2. Martina Scholger

Zentrum für Informationsmodellierung (ZIM) (Center for Information Modelling) - Karl-Franzens Universität Graz (University of Graz)

3. Kiyonori Nagasaki

International Institute for Digital Humanities

Work text

Knowing how to navigate and explore data in your encoding can be an important way to learn how to work with TEI and XML generally. This workshop is designed for people who have some experience with markup languages and seek to learn more about how to work with digital scholarly editions as a basis for analysis and research. We seek to raise awareness of the long-term sustainability of XML and TEI and the tool stack designed to process it. A little working knowledge of the query language XPath and the transformation language XSLT can help reduce reliance on software, packages, and plugins that may become obsolete without warning. Further, XSLT's functional programming can serve as a way of articulating research questions around a document data model expressed in XML.

The emphasis of our workshop is “pull-processing”, that is, extracting data and metadata from markup documents for analysis rather than providing the reading view of a digital scholarly edition. Markup in documents supplies structures and contexts that are particularly useful for processing data beyond what we can do with so-called “plain text”. Our workshop will teach the pull-processing of data from XML/TEI with simple, reusable XSLT templates that represent the data in TSV/CSV files and in HTML tables and charts.
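
As a minimal sketch of what pull-processing can look like, the stylesheet below pulls every persName element out of the text of a TEI document and writes one TSV row per distinct name, with a count of its occurrences. It is an illustration under stated assumptions rather than workshop material: the choice of persName, the column labels, and the use of grouping (which assumes a processor supporting XSLT 2.0 or later, such as Saxon) are all choices made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Plain-text output, because the result is TSV rather than XML -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- A single template on the document node pulls out only the data we want -->
  <xsl:template match="/">
    <!-- Header row: a tab between columns, a newline at the end of the row.
         The column labels are hypothetical. -->
    <xsl:text>name&#9;mentions&#10;</xsl:text>

    <!-- Assumption: names of interest are tagged persName inside the TEI text element.
         Group them by their whitespace-normalized string value. -->
    <xsl:for-each-group select="//text//persName" group-by="normalize-space(.)">
      <!-- Most frequently mentioned names first -->
      <xsl:sort select="count(current-group())" order="descending"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

The resulting text can be opened in a spreadsheet or statistics program. Changing the element name in the select attribute would adapt the same pattern to other kinds of encoded data.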

We will begin by sharing and reviewing together several markup-based projects that visualize data, and we will first study the encoding of these projects. Featured encoding will include projects developed by the workshop instructors and their students, drawn from student projects in the Newtfire organization (https://newtfire.org), from the East Asian/Japanese SIG of the TEI consortium (https://github.com/TEI-EAJ/) and the SAT Daizokyo Database project (https://21dzk.l.u-tokyo.ac.jp/SAT/index_en.html), and projects in the GAMS (https://gams.uni-graz.at) repository. A sampling of project markup will provide a basis for us to review XML elements and attributes and give participants an opportunity to refresh their understanding of the XML tree structure.

A particularly interesting challenge of our workshop is to share XML documents written in multiple languages. We will also work with source documents composed in languages unfamiliar to the European and North American members of the instructional team, with guidance from our Japanese colleague, in order to show that the code we write is transferable to multiple projects across linguistic and cultural borders. Workshop instructors will collaborate on preparing source materials to establish an international foundation for this workshop. This will help us explore how XSLT works in an international communication and processing context, provided we are clear about what we wish to process and the significance of the data.

We will demonstrate some basic navigation and calculation functions with XPath, and then show how XPath is applied in XSLT templates to address the specific nodes that hold data of interest. The structure of the output data is transferable to many statistical processing programs and to simple online calculation and visualization tools. During the workshop we will produce some structured documents: HTML lists and tables as well as plain-text tabulated data (CSV or TSV files). The XSLT that we supply and that we write together in the workshop will be carefully documented, with the aim of helping participants revise and adapt the code for their own projects. We also hope to process some participant-supplied XML before, during, and after the workshop.
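
To illustrate how XPath expressions sit inside XSLT templates, here is a hedged sketch that builds an HTML table with one row per top-level div in the body of a TEI document. The element names (div, head, placeName) are standard TEI, but that a given project encodes its sections and place names this way is an assumption, as are the table headings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <xsl:output method="html" indent="yes"/>

  <!-- The document-node template sets up the page and decides which nodes to process -->
  <xsl:template match="/">
    <html>
      <head><title>Place names per section</title></head>
      <body>
        <table>
          <tr><th>Section</th><th>Place names</th></tr>
          <!-- The XPath in the select attribute addresses only divs directly inside the TEI body -->
          <xsl:apply-templates select="//text/body/div"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- This template fires once for each div selected above -->
  <xsl:template match="div">
    <tr>
      <!-- Relative XPath: the first head child of the current div -->
      <td><xsl:value-of select="normalize-space(head[1])"/></td>
      <!-- count() over the descendant axis reaches place names at any depth -->
      <td><xsl:value-of select="count(descendant::placeName)"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Because xpath-default-namespace applies only to XPath expressions and match patterns, the unprefixed head in the select attribute refers to the TEI element, while the literal head and body elements in the first template are plain HTML output.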

In a half-day workshop we cannot teach everything that XPath and XSLT can do. However, we can share some commonly used methods for pull-processing: navigating with XPath axes (especially descendant::), simple arithmetic (count(), sum(), avg(), and the multiplication and division operators), and output to TSV or CSV text formats for further processing. Producing an HTML list and table is a further goal. If we have time, we will model simple transformations of calculated data from XML to SVG to draw a bar or line graph.
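
The sketch below combines several of these pieces in one place: the descendant:: axis, count(), sum(), avg(), and multiplication, with the results drawn as a simple SVG bar chart. It assumes a TEI document whose body is divided into div elements containing p elements, and an XSLT 2.0 or later processor (avg() and max() are not available in XPath 1.0). The pixel sizes, fill color, and variable names are hypothetical choices for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical drawing settings, all in pixels -->
  <xsl:variable name="barWidth" select="30"/>
  <xsl:variable name="gap" select="10"/>
  <!-- Bar height in pixels for each paragraph counted -->
  <xsl:variable name="unit" select="5"/>

  <xsl:template match="/">
    <xsl:variable name="divs" select="//text/body/div"/>
    <!-- One paragraph count per div, gathered with the descendant axis -->
    <xsl:variable name="counts"
        select="for $d in $divs return count($d/descendant::p)"/>
    <!-- Tallest bar sets the chart height; the extra 0 guards against an empty sequence -->
    <xsl:variable name="chartHeight" select="max(($counts, 0)) * $unit"/>

    <svg width="{count($divs) * ($barWidth + $gap) + $gap}"
         height="{$chartHeight + 40}">
      <!-- sum() and avg() operate on the whole sequence of counts -->
      <text x="{$gap}" y="15">
        <xsl:value-of select="concat('total: ', sum($counts),
            '; mean per div: ', format-number(avg($counts), '0.0'))"/>
      </text>
      <xsl:for-each select="$divs">
        <xsl:variable name="n" select="count(descendant::p)"/>
        <!-- The SVG y axis points downward, so the top edge of a bar
             is the chart bottom minus the bar height -->
        <rect x="{$gap + (position() - 1) * ($barWidth + $gap)}"
              y="{20 + $chartHeight - $n * $unit}"
              width="{$barWidth}" height="{$n * $unit}" fill="steelblue"/>
      </xsl:for-each>
    </svg>
  </xsl:template>
</xsl:stylesheet>
```

The same counts could just as easily be written out as TSV rows; only the output method and the literal result elements would change, while the XPath expressions stay the same.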

Outline

Review and refresh understanding of XML tree structures, working with XML and TEI projects in multiple languages that explore and analyze data pulled from structured documents.
Orientation to XPath, working with source documents provided by the instructors.
Teach basic XSLT to produce simple outputs that curate data and show how these can be used in simple charts and graphs.

Participation and requirements

If you wonder or worry about how to process or share data from your markup, this is a good workshop for you! Familiarity with the TEI in an introductory context is desirable, but participants may also be relatively new to markup technologies. No programming experience is necessary, but some experience with markup is useful. We will provide a brief introduction to the TEI (why it is useful for projects that care about data), but because we want to emphasize simple processing of existing data, we will not have time for a full introduction. Participants may find themselves gaining a fresh perspective on the TEI on the basis of what they learn in this workshop.

Our past experience as instructors has shown us that to guarantee a high-quality experience, the instructor team must take great care to monitor and assist our participants. To ensure the instructor team can provide assistance and work with everyone, this workshop is limited to 25 participants. There is no additional charge beyond conference expenses for workshop participants. Instructors will be in contact with participants well in advance of the workshop to provide software installation instructions and helpful resources before we meet, and we will provide means to remain in contact to assist participants after the workshop ends.

Workshop instructors

Elisa Beshero-Bondar, PhD

Program Chair of Digital Media, Arts, and Technology | Professor of Digital Humanities | Director of the Digital Humanities Lab at Penn State Erie, The Behrend College

An active member of the Text Encoding Initiative (TEI), Dr. Beshero-Bondar has been serving since 2016 on the TEI Technical Council, an eleven-member international committee that supervises amendments to the TEI Guidelines. She has been teaching since the 1990s, and began teaching markup languages and XML stack processing almost as soon as she began learning them in the 2010s. Before moving to direct the DIGIT program at Penn State Erie, she was Director of Pitt-Greensburg's Center for the Digital Text. Her research often involves working with archival letters and manuscripts, and has led her to study what early 19th-century poets and dramatists understood about human physiology and electricity, cultural first contact on Pacific islands, and the mutiny on the HMS Bounty. She is the architect of the Digital Mitford Project and other digital research projects involving TEI XML to build editions and prepare structured analyses of variants and collocations in texts. Find her on GitHub at https://github.com/ebeshero and her projects on the site named for her pet firebelly newt at https://newtfire.org.

Dr. Martina Scholger

Centre for Information Modelling - Austrian Centre for Digital Humanities, University of Graz

Martina Scholger has a PhD in Digital Humanities and holds a Senior Scientist position at the Centre for Information Modelling – Austrian Centre for Digital Humanities at the University of Graz. Her main research fields are digital scholarly editing, the application of digital methods and semantic technologies to humanities source material, and text mining. In addition to teaching data modelling, text encoding and X-technologies, her work at the centre involves the conceptual design, development and implementation of numerous cooperation projects in the field of digital humanities (see http://gams.uni-graz.at). She has been an elected member of the TEI Technical Council since 2016 and is currently its chair, and she has been a member of the Institute for Documentology and Scholarly Editing since 2014. She has taught at a number of summer schools and workshops on digital scholarly editing, e.g. at the Digital Humanities at Oxford Summer School and at schools organised by the Institute for Documentology and Scholarly Editing (IDE).

Kiyonori Nagasaki, Ph.D.

Senior Fellow at the International Institute for Digital Humanities in Tokyo.

His main research interest is in the development of digital frameworks for collaboration in Buddhist studies. He also investigates the significance of digital methodology in the humanities and promotes DH activities in Japan. He has participated in a number of Digital Humanities projects conducted at several institutions in Japan and abroad, such as the University of Tokyo, Kyoto University, Osaka University, the National Diet Library, the National Museum of Ethnology, the National Institute of Japanese Language and Linguistics, the University of Tsukuba and the University of Hamburg. His activities also include postgraduate education in DH at the University of Tokyo as well as administrative tasks at several scholarly societies, including the Japanese Association for Digital Humanities and the Japanese Association of Indian and Buddhist Studies.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

ADHO - 2022

"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO
