Hyper Audio Linking – Generating Hybrids of Text and Video Content for Digital Publishing

workshop / tutorial
  1. 1. Agnes Blaha

    Transforming Freedom

  2. 2. Andreas Leo Findeisen

    Transforming Freedom

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

The Digital Humanities as a field of research and publication face both new opportunities and challenges due to the increasing number of media sources relevant to researchers and students, no matter their age and origin. Aggregator platforms like YouTube or Vimeo let formerly unknown materials surface that are relevant for new research in the humanities. Meanwhile, large public and private collections held by historical organizations, broadcasting corporations and NGOs, or individuals like artists, scientists or politicians remain to be indexed, investigated and re-published (Sommer, 2016). At the same time, researchers who are experimenting with new, creative ways to combine curating, scholarship and presentation draw attention to the immersive and narrative potential of sound-based media (see e.g. Barber, 2017; Murray and Wiercinski, 2014; Cohen, Rakerd, Rehberger & Boyd 2012).
Some of the use cases relevant to the field of digital humanities that will probably become even more frequent over the next years include:

Support for oral history researchers who need to transcribe large amounts of recordings
Using transcripts for lectures and online learning tools, both for students and the public at large
Enhancing the usability of digital A/V archives via full text search
Facilitating community contribution and citizen science projects
Furthering the accessibility of A/V content (e.g. for deaf people)
Intertwining multimedia material and text for research papers published in online journals

Alternating between more theory-focused discussion and hands-on experience, we will different methods to integrate transcripts into multimedia formats that are useable for research and publications. After a brief introduction into the principles and current possibilities as well as the limitations of computerized speech to text transcription, particularly in comparison with tools for manual transcription such as WebAnno and OCTRA, we will introduce participants to a small number of different speech-to-text software packages that can be used to semi-automatically transcribe A/V content, while discussing their respective pros and cons.
We will then demonstrate how to use the open source package ffmpeg to extract and convert various types of A/V recordings to formats suitable for further processing and try out automatic speech to text conversion with OH portal, an online transcription tool developed and maintained by the phonetics research team at the university of Munich (
https://www.phonetik.uni-muenchen.de/apps/oh-portal/) and Watson, a powerful, trainable natural language processing tool developed by IBM.

For many in the DH community, creating a good transcript is just the start. Presenting research to fellow academics and the public at large, garnering interest for archives and projects, creating lively and easy to use learning material - all while preserving the non-verbal aspects of raw sources - are often just as important. This is probably even more true for novel, hybrid formats which have been theorized to be result of an amalgamation of analog and digital publishing (Ludovico, 2012).
As an exemplary solution, we will introduce Hyper Audio Linking, a technique for the presentation of digital content that allows to link transcripts to pre-defined jump marks in video or audio recordings via JavaScript. HAL augmentation deals with diverse content aspects of a source while letting the original untouched. It can be used for all kinds of sources, be it contemporary recordings of lectures and panel discussions, interviews, or historical footage, as well as some art genres like theatre plays, films, or audio dramas. By defining sections or “chapters” and linking them to timestamps, the transcript can be used to jump between those parts in the original media, allowing recipients to switch between reading or viewing/listening mode. The resulting multi-media hybrid can be further augmented by defining formatting options for different types of content and by including various metadata, images, annotations, keywords, and the like. The result is an elegant and easy to use frontend interface that allows for full-text search and easy navigation within A/V content.
A simplified workflow of an oral history project that uses HAL both as support for researchers and to publish research results and multimedia documents can be seen in figure 1. Importantly, HAL-augmented files can not only be used for the final step of publishing and presenting results. Depending on the configuration of jump marks, they can also support members of the research team in navigating recordings or in tracing observations, selected utterances and topics over multiple sources. Meanwhile, the method is flexible enough to be introduced on the fly at any point in time. If desired, its use may also be restricted to a certain part of collected sources, e.g. only those that are selected for public access.

Figure 1: Sample workflow with HAL integrated in an oral history project
Using an existing interface to a database of videos on history and politics of digital culture, participants will experiment with different ways to use HAL links within transcripts. We will discuss and compare our Drupal-based implementation with other existing tools such as ELAN, an annotation tool for A/V content developed at the Max Planck Institute for Psycholinguistics in Nijmegen. We will also talk about some of the key differences between software designed to work as a standalone tool for researchers versus a method developed for usage either as the interface of an online archive or as an enrichment or additional feature of web publishing formats.
To round off the workshop, we will start to build a web page featuring A/V content, a transcript, and additional material such as photographs and annotations from scratch. Using just a text editor and a few simple lines of Html and JavaScript code, participants will learn to apply HAL augmentation to their own publications. We will use a pre-configured installation of the open source content management system Drupal that can be customized to fit different use cases, including curated online collections, project documentation, event pages etc.
For this second practice part of the workshop, participants should ideally bring their own A/V sources and start working on their personal project. Those who don’t have a specific use case in mind (yet) will be able to choose among different sources provided by the workshop team instead.
The last workshop unit will be a discussion dealing with the editorial planning for publication. Whether you are an individual producer, a research team or an online cluster, you will have to make certain decisions in how to manage the publication: Who is the audience that will be primarily addressed? Which specific needs have to be taken into account with regards to usability, accessibility and technological literacy? How much (if any) pre-existing knowledge on the topic can be assumed, and how much introductory information should be included?

Target audience and requirements
Previous workshops and presentations of Hyper-Audio-Linking have been held at BASICS, transmediale Berlin, at the Ludwig-Boltzmann-Institute for Media.Art.Research, Linz, and most recently as part of the EADH conference Galway 2018. Based on these earlier workshops on similar topics, we expect to work with a group of between ten and fifteen participants, which seems ideal for a more hands-on experience.
There are no skill requirements for participation, all introductions into software usage and programming will be suitable for beginners but can be easily adapted to participants with more advanced levels of pre-existing knowledge.
Participants should bring their own laptop. Participants who use a laptop provided by their employer should make sure to either have all software pre-installed or know that they are authorized to install software on their computer.
All required software is either open access or free to use, there will be no additional costs.

Pre-conference support and provision of material
Participants who would like to familiarize themselves with HAL in advance will be given access to stubs of prepared A/V content on our platform Transforming Freedom.org four weeks before the conference start.
Download links for other software packages and instruction material on how to install and use the respective software will be made available to the participants about two weeks in advance of the workshop.

Intended length and format
The duration of the workshop will be a half day, that is, approximately four hours including breaks. As far as the format is concerned, we will alternate between theory-focused presentation, group discussion, and practical tasks that let participants try out different techniques supported by the instructors.


Barber, J. F. (2017). Radio Nouspace: Sound, Radio, Digital Humanities.
Digital Studies/Le champ numérique, 7(1), 1. DOI:
http://doi.org/10.16995/dscn.275 (accessed 05 May 2019).

Cohen, S., Rakerd, B., Rehberger, D., & Boyd, D. A. (2012). Oral history in the digital age: the imperative for rethinking best practices based on a survey of the field(s). Retrieved from
http://ohda.matrix.msu.edu/2012/07/ohda-survey/. (accessed 04 May 2019).

Ludovico, A. (2012). Post-Digital Print: The Mutation of Publishing since 1894.
Onomatopee 77.

Murray, A. and Wiercinski, J. (2014). A Design Methodology for Sound-based Web Archives.
Digital Humanities Quarterly 8.2,
http://digitalhumanities.org/dhq/vol/8/2/000173/000173.html (accessed 05 May 2019).

Sommer, B. (2016).
Practicing Oral History in Historical Organizations. London: Routledge.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.