Digital Resource Aggregation: Giving New Life to Multi-source Cultural Data

Rui Liu; Dana McKay; George Buchanan

Authorship

1. Rui Liu

The University of Melbourne, Australia

2. Dana McKay

Royal Melbourne Institute of Technology, Australia

3. George Buchanan

The University of Melbourne, Australia

Work text


Digital humanities has traditionally been concerned with applying digital technologies to cultural resources such as text, images, and specialist data, e.g. geographic information (Tiepmar, 2018). In practice, many digital humanities projects aggregate cultural data from multiple sources. There are two types of aggregation: the first collects data from original (unpublished) datasets to form a new digital humanities collection, while the second integrates parts of existing collections, either by sharing data or by forming a new system from them (Siqueira and Martins, 2021; Freire et al., 2018). This research explores both forms of digital resource aggregation in digital humanities and illustrates how aggregation breathes new life into multisource cultural data. Taking the perspective of the construction of digital humanities collections, our case studies discuss different stakeholders' views of digital resource aggregation and their work practices. This paper addresses the following three research questions:

What are the digital resource aggregation approaches of digital humanities collections?
What are the problems encountered when aggregating digital resources?
What are the key lessons learned when attempting digital resource aggregation?

Our research uses semi-structured individual interviews to understand the practical experience of digital humanities projects that brought content together from multiple existing collections. The data were collected from September to November 2021. We recruited two pilot-study participants and nineteen main-study participants. All participants are digital humanities scholars active in major research centres and conferences with digital humanities projects. During the interviews, we asked participants about their background, the details of their data aggregation project, their role in the project, the aggregation method they used, the aggregation problems they encountered, and their advice on and perspective of multisource cultural data aggregation through digital humanities collections. Each interview took approximately 60 minutes, and all interviews were recorded and transcribed. We coded our data against the three research questions (shown above) using NVivo. Eleven of the main-study participants had direct experience of data aggregation. We then formed these experiences into case studies of individual projects.
Our paper focuses on these eleven cases of integrating multisource cultural data. Each case study considers a single project's aims, team, construction steps, digital resource aggregation approaches, and any suggestions from interviewees for how to overcome challenges. The aggregated datasets in these eleven case studies include medieval manuscripts and illuminated manuscripts, religious and political documents, digital archival resources for medieval to modern history, biographies of historical people, place information, archaeological data sets, music data sets and annotation works. Our participants included two developers, one Ph.D. candidate with good programming skills, one librarian who is self-taught in programming, one student research assistant, and six project managers with a socio-technical background.
These eleven cases represent a wide variety of approaches, and some cases use more than one: five use Linked Data to integrate multisource data, three use the International Image Interoperability Framework (IIIF) to integrate images, two use a content management system (CMS) to manage data, one uses a SQL database to aggregate different data, one uses an API to link different digital humanities collections, one uses XML for digital resource aggregation, and one defines metadata protocols for aggregation.

The results of our case studies reveal different digital resource aggregation challenges and solutions within digital humanities, and suggest there may be untapped possibilities for aggregation in digital humanities research. Key problems in digital resource aggregation fall into four categories: data problems, system problems, institution problems and skillset problems.
The data problems relate to (1) unstructured data, including ambiguous metadata schemata and uncertain standards, a lack of digital readiness, and differing data formats; (2) integrating data of the same type but with different levels of detail, e.g. time; and (3) original data sets that cannot be displayed on the aggregator website, which makes aggregation of digital humanities collections harder. As for system problems, the aggregation of different data sets is affected by system updates, software changes, system interaction issues and visualization problems. As for institution problems, funding and licensing are the important factors. As for skillset problems, the distribution of skills among digital humanities researchers and the technical threshold of aggregation technology increase the time and labor required to achieve better aggregation performance.
The digital humanists we interviewed gave us valuable advice on achieving digital resource aggregation, such as unifying metadata standards, improving data sharing policy and data explanation, encouraging collaboration between the community and administrators, and enhancing digital humanities research infrastructure and project sustainability.
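To make data problem (2) concrete, the following is a minimal Python sketch of one possible way to compare dates recorded at different levels of detail. It is not taken from any of the interviewed projects. It assumes, purely for illustration, that sources record dates as 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings; the names DateRange and parse_partial_date are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Closed interval covering everything a partial date could mean (hypothetical type)."""
    start: date
    end: date

    def overlaps(self, other: "DateRange") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


def parse_partial_date(text: str) -> DateRange:
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into the interval it covers.

    The input format is an assumption of this sketch; real collections
    may use other conventions (e.g. 'circa' dates or centuries).
    """
    parts = [int(p) for p in text.strip().split("-")]
    if len(parts) == 1:
        (year,) = parts
        return DateRange(date(year, 1, 1), date(year, 12, 31))
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return DateRange(date(year, month, 1), date(year, month, last_day))
    if len(parts) == 3:
        year, month, day = parts
        exact = date(year, month, day)
        return DateRange(exact, exact)
    raise ValueError(f"unsupported date format: {text!r}")


# One source knows only the year, another knows the exact day.
coarse = parse_partial_date("1215")
fine = parse_partial_date("1215-06-15")
print(coarse.overlaps(fine))
```

Representing each value as an interval lets records with coarse and fine dates be compared without inventing precision that a source does not have. An overlap only signals that two records are compatible in time, not that they describe the same event.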

Bibliography
FREIRE, N., MEIJERS, E., VOORBURG, R. & ISAAC, A. 2018. Aggregation of cultural heritage datasets through the Web of Data. Procedia Computer Science, 137, 120-126.

SIQUEIRA, J. & MARTINS, D. L. 2021. Workflow models for aggregating cultural heritage data on the web: A systematic literature review. Journal of the Association for Information Science and Technology.

TIEPMAR, J. 2018. Big Data and Digital Humanities. Archives of Data Science, Series A, 5. https://dx.doi.org/10.5445/ksp/1000087327

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ADHO - 2022

"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO
