Sound and (moving) images in focus – How to integrate audiovisual material in Digital Humanities research

workshop / tutorial
  1. Roeland Ordelman

    Netherlands Institute for Sound and Vision - University of Twente

  2. Max Kemman

    Erasmus University Rotterdam

  3. Martijn Kleppe

    Erasmus University Rotterdam

  4. Franciska de Jong

    Erasmus University Rotterdam

  5. Stef Scagliola

    Erasmus University Rotterdam

Work text

The proposed workshop intends to address the poor representation of audiovisual data in the evolving field of Digital Humanities. Sources such as television, film, photos and oral history recordings have not yet received the same level of attention from scholars as written sources. This is problematic in light of the expected growth in the volume of audiovisual sources and the abundance of information that could be (re)used by various disciplines. In four sessions the workshop will discuss (a) issues related to the integration of audiovisual data in DH, (b) the necessary conditions and possible solutions, (c) examples of best practices and (d) an agenda for the future.

Audiovisual material is perhaps the biggest wave of data to come in the near future (Smith, 2013). This claim is supported by a prospective study conducted by IBM on how the flow of digital data will evolve in the coming two decades. As can be seen in Figure 1, the growth of audiovisual sources such as video, images and audio will result in huge amounts of data in the coming decades, due both to the increased production of born-digital data and to the massive digitisation of analogue sources. Consequently, audiovisual archives hold the promise of truly big data becoming available to academic researchers.

Fig. 1: Expected wave of data, showing the growth of the audiovisual data (video, images, audio) that this workshop will deal with. Source: IBM Market Insights 2013.

Audiovisual sources are potentially of great value for the Digital Humanities because they are multi-layered. A single document can provide information regarding language, emotions, speech acts, narrative plots and references to people, places and events. This richness provides interesting data for various disciplines and holds the promise of multidisciplinary collaboration between, for example, computer science, the social sciences and the humanities. As such, audiovisual material provides a rich playground for the Digital Humanities.

Notwithstanding this expected growth, the use of audiovisual data by scholars in the social sciences and the humanities (SSH) and the application of digital methods for its analysis are still in their infancy. Audiovisual materials such as television, photos and oral history recordings have not yet received the same level of attention from scholars as written sources. Several reasons might account for this deficit. Firstly, these source types are relatively young compared to text, which is reflected in scepticism about their value for academic research outside a relatively small community of specialists. Secondly, the contemporary and commercial value of many audiovisual sources results in considerable constraints on their use due to issues of copyright. Thirdly, the linear structure of audiovisual sources makes hermeneutic analysis more time-consuming than it is for textual sources. Finally, no widely accepted digital research methods for the discovery and analysis of audiovisual content exist as yet. Unlike fellow scholars who study text and have a multitude of refined tools at their disposal, scholars specialised in documentaries, photos, film and audiovisual oral history collections face considerable limitations in the various stages of the research process (De Jong et al., 2011).

In the context of the proposed workshop, two themes will play a crucial role.

Theme 1: Indexing and searching audiovisual data
The first step in an SSH research process is the identification of relevant and interesting material. Obtaining good search results, however, depends heavily on the richness and granularity of the metadata assigned to the sources. Metadata is usually assigned to a document by a knowledgeable archivist, but given the sheer volume of digital audiovisual content produced daily, manual annotation is no longer feasible. Consequently, one of the first big challenges within the realm of audiovisual archives is the development of systems for accurate automatic annotation.

One could say that a revolution is needed similar to the one that full-text search (or automated text indexing) brought about. Content-based image retrieval has only recently made enough progress to be usable for scholars. Techniques such as speech recognition and computer vision will support the exploration of digital audiovisual archives on the basis of multiple modalities such as text, sound and image. However, this introduces the problem of the so-called semantic gap, which refers to the difficulty of translating low-level pixel data and sound waves into meaningful annotations (Smeulders et al., 2000). How this semantic gap affects the discovery of material in audiovisual archives is still being explored.
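
As a minimal sketch of how time-coded speech recognition output could make spoken content searchable at the level of individual fragments, the example below builds a small inverted index over transcript segments. The segment data, item identifiers and function names are all invented for illustration; they are assumptions and do not describe the AXES project or any particular archive system.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One time-coded fragment of (hypothetical) speech recognition output."""
    item_id: str   # identifier of the audiovisual item
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # recognised words for this fragment


def build_index(segments):
    """Map each lower-cased word to the segments in which it occurs."""
    index = defaultdict(list)
    for seg in segments:
        for word in set(seg.text.lower().split()):
            index[word].append(seg)
    return index


def search(index, query):
    """Return segments containing every query word, ordered by item and time."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits, key=lambda s: (s.item_id, s.start))


# Invented example data; a real archive would supply ASR output instead.
SEGMENTS = [
    Segment("interview_01", 0.0, 12.5, "we arrived in Rotterdam after the war"),
    Segment("interview_01", 12.5, 30.0, "the harbour was completely destroyed"),
    Segment("newsreel_07", 4.0, 9.0, "reconstruction of the Rotterdam harbour"),
]

if __name__ == "__main__":
    index = build_index(SEGMENTS)
    for seg in search(index, "Rotterdam harbour"):
        print(f"{seg.item_id} {seg.start:.1f}-{seg.end:.1f}s: {seg.text}")
```

Because each hit points to a time span rather than to a whole item, a researcher could jump directly to the relevant fragment. Word-level matching of this kind does nothing to bridge the semantic gap; it only illustrates the finer granularity that automatic annotation can offer.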

Theme 2: Analysing audiovisual material
Besides identifying relevant content, an even bigger challenge for the humanities and social sciences lies in providing tools for the next phase of the research process: the analysis and interpretation of content. While text mining has led to the phenomenon of distant reading of textual material (Moretti, 2013), which is strongly dependent on good visualisation tools, the advances in speech and image recognition have not yet led to a method of ‘distant viewing’ of audiovisual data. Processing large amounts of data and enabling researchers to trace patterns or discrepancies in their material are thus not yet feasible. Moreover, the lack of metadata that often characterises audiovisual archives introduces additional difficulties in heuristic practices (Fickers, 2012). Consequently, scholars working with (moving) images and sound are at a disadvantage in the evolving field of the Digital Humanities, and effort must be put into envisioning solutions.

The proposed workshop aims to bring scholars and computer scientists together to discuss the following questions in four sessions.

Why are audiovisual archives scarcely used within the (Digital) Humanities? (Session 1)
What are possible technical solutions to stimulate the use of audiovisual archives within the (Digital) Humanities? (Session 2)
Which successful applications of DH on audiovisual data can serve as best practice? (Session 3)
Can we formulate a research and development agenda for a future uptake of audiovisual data in the (Digital) Humanities? (Session 4)

The keynotes in the first two sessions will be delivered by Prof. Andreas Fickers, who will talk about the use of audiovisual sources within humanities research, and Dr. Arjan van Hessen, who will discuss the necessary technical and infrastructural provisions for the analysis of these sources. For the third session we will invite scholars to submit papers and demos. The fourth session will focus on the evaluation of the findings and the formulation of an agenda for the future. To disseminate the results of the workshop among a broader audience, the initiators intend to propose a special issue on this topic to a Digital Humanities journal.

The proposed workshop is initiated by researchers working within the EU FP7 research project AXES (Access to audiovisual archives). We thank the AXES project for the financial support to organise the workshop.

Jong, F. de, Ordelman, R., & Scagliola, S. (2011). Audio-visual Collections and the User Needs of Scholars in the Humanities: a Case for Co-Development. In Proceedings of the 2nd Conference on Supporting Digital Humanities (SDH 2011). Copenhagen, Denmark.

Fickers, A. (2012). Towards A New Digital Historicism? Doing History In The Age Of Abundance. VIEW Journal of European Television History and Culture, 1(1), 19–26.

Moretti, F. (2013). Distant Reading. Verso Books.

Smeulders, A. W., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349-138

Smith, J. R. (2013). Riding the multimedia big data wave. In Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval - SIGIR ’13. New York, New York, USA: ACM Press. doi:10.1145/2484028.2494492


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO

  • Language: English