Content Patterns in Digital Humanities: a Framework for Sustainability and Reuse of Digital Resources
Anderson, Sheila , King’s College London, firstname.lastname@example.org
Hedges, Mark , King’s College London, email@example.com
Research in the arts and humanities has created much digital material that represents a significant investment, both of funding and of intellectual effort. In the UK at least, given the current lack of national infrastructure for sustaining this material, these resources are typically hosted in their home institutions using a variety of approaches and technologies.
This incurs a number of risks. At the most basic level, without ongoing maintenance a resource ceases to be usable at all as the technologies in which it was implemented become obsolete and unsupported. Even if hosting institutions apply preservation techniques to ensure continued accessibility of resources, this does not enable collections to make full use of technological advances that might greatly enhance their utility for and impact on research. Access to legacy resources may be limited to a simple download or to browsing via a website. Neither mode of access facilitates advanced research services, such as mashups or data/text mining, that will become increasingly common in future digital research.
The impact of humanities research may still be felt many years after the original research was undertaken – the information produced has a long lifespan in intellectual terms. Sustainability does not just mean keeping the data alive, but enabling the exploitation of advances both in technology – making the data accessible in new ways – and in humanities research – forging connections between resources that lead to new discoveries and broader impact.
Digital resources often exist in “silos”, lacking interoperability. Individual projects typically address focused topics, and may implement digital resources in idiosyncratic ways to address their immediate needs. This results in a multitude of resources that are scattered and disparate in nature, yet related intellectually. Linked up, these resources would form a whole much more useful for research than the sum of its parts, much as fragments of a map, when combined, allow navigation from one place to another. Ultimately, the vision here is of a virtual and distributed “web of knowledge”.
The digital resources in the humanities may be characterised by their diversity and complexity. Collections involve multiple media and standards. The material may be highly complex, with many structural and semantic relationships both internal and contextual; the interpretation of an object (e.g. an inscription) may depend on its relationships to other resources (e.g. other inscriptions/texts, surveys, concordances).
One approach to these issues would be to enhance individual resources one by one; however, to be truly sustainable we should avoid such ad hoc solutions. The primary question asked by our project is thus – how can we develop a generic framework for digital resources in the arts and humanities that addresses the above issues for a broad range of collections, and that is not a closed system but can be extended to support other digital material and (possibly unanticipated) future tools, technologies and research methods?
We are attempting to answer this question in the CMES (Content Models for Enhancement and Sustainability) project, which is funded by the UK Arts and Humanities Research Council as part of its DEDEFI (Digital Equipment and Database Enhancement for Impact) programme. We are developing a framework using the Fedora digital repository software for sustaining and enhancing particular groups of digital resources produced by earlier digital humanities research. We are addressing two groups of collections, each typical of a wide range of humanities research activities:
Digital texts, which may comprise complex networks of diverse information: images, marked-up text, geospatial data, translations, standoff annotations/markup, and potentially extensive links to external resources. We are addressing two groups of resources managed at KCL: the Stormont Papers, and the Inscriptions of Aphrodisias. These contrasting examples – one modern and dealing with large, complete volumes, one ancient and dealing with small, fragmentary texts – facilitate development and testing of generic models. They also provide scope for demonstrating the utility of our framework for (i) developing new material (e.g. Stormont parliamentary papers and Inscriptions of Roman Tripolitania/Cyrenaica), and (ii) forging links with external digital material (e.g. Westminster Hansards and Pleiades).
Multimedia performing arts collections, specifically the following resources managed at KCL: Scottish Traditions of Dance, which contains text, images, video, interviews, audio and databases; Adolphe Appia, which contains images, 3D virtual reality models, and audio from the King’s Sound Archive.
Fedora is particularly good for modelling complex material and links between objects. Representations of digital objects within Fedora are formalised as “content models” (henceforth, CMs), which may be regarded as “data types” for digital objects. We will review the selected collection groups and develop a set of CMs that support them by providing consistent, standardised and interoperable (yet flexible) patterns for representing these collection types. We will need to go beyond Fedora’s relatively simple CM formalisation to produce these “Content Patterns” (henceforth, CPs) for complex collections, e.g. by using the Enhanced CM framework developed by SULD, which allows the specification of relationships and ontologies, and the definition of collection templates.
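The idea of a content pattern as a "data type" for digital objects can be illustrated with a minimal sketch. This is not Fedora's API or the Enhanced CM framework: the classes, the datastream identifiers (TEXT, IMAGE) and the relationship name (isPartOf) are all hypothetical, chosen only to show how a pattern might declare required parts and links, and how an object could be checked against it.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ContentPattern:
    """Hypothetical pattern: the parts and links an object of this type must have."""
    name: str
    required_datastreams: FrozenSet[str]
    required_relationships: FrozenSet[str] = frozenset()


@dataclass
class DigitalObject:
    """Hypothetical digital object: named datastreams plus typed links to other objects."""
    pid: str
    datastreams: Dict[str, str] = field(default_factory=dict)          # id -> MIME type
    relationships: Dict[str, List[str]] = field(default_factory=dict)  # predicate -> targets


def conformance_gaps(obj: DigitalObject, pattern: ContentPattern) -> List[str]:
    """Return a sorted list of what the object lacks in order to fit the pattern."""
    gaps = [f"datastream:{d}" for d in pattern.required_datastreams - set(obj.datastreams)]
    gaps += [
        f"relationship:{r}"
        for r in pattern.required_relationships
        if not obj.relationships.get(r)  # missing or empty counts as a gap
    ]
    return sorted(gaps)


# Illustrative pattern for an inscription-like object (names are assumptions).
inscription_pattern = ContentPattern(
    name="inscription",
    required_datastreams=frozenset({"TEXT", "IMAGE"}),
    required_relationships=frozenset({"isPartOf"}),
)

legacy_object = DigitalObject(pid="demo:1", datastreams={"TEXT": "application/xml"})
print(conformance_gaps(legacy_object, inscription_pattern))
```

A check of this kind reports where a legacy object would need reworking before it fits an idealised pattern, without rejecting it outright.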
We analysed the resources together with subject specialists in digital text and performing arts resources. Note that, given the variation in how legacy collections have been implemented, the CPs may be idealisations that do not directly match the collections, which may require a degree of reworking to make them fit. We will not be over-prescriptive here – diversity arises naturally from the research material – but a degree of common practice would be beneficial for the creation and reuse of the material. Moreover, our CPs will provide foundations that can be extended easily to support diverse community practices.
Each of the target collections had its own custom web interface, driven by quite different underlying data models. We are developing consistent delivery/publishing mechanisms for the different collection groups that are driven by the underlying CPs. This has the benefit that these mechanisms are available for any collection that conforms to the CP, leading to more consistent and interoperable interfaces for resources of similar type.
However, this will not necessarily lead to homogeneity. Our approach enables the structure of collections to be represented with fine granularity, and interfaces are correspondingly modular. This facilitates the creation of more integrated web views across different collections, but it also allows content to be exposed as machine-readable feeds that can be used to provide added-value services, e.g. aggregating content, automated processing (e.g. text mining), mashups, etc. The creators (or curators) of a resource will no longer be the arbiters of how information should be delivered and used. The resources produced by research are not just ends in themselves – they provide source material for subsequent research – and to maximise impact they should be made available in ways that allow scholars unrelated to the original editors to make transformative use of them, rather than just via a website. We are thus providing a framework whereby users (perhaps domain experts) can develop and integrate their own tools to process resources.
The project is thus not only enhancing particular collections, but producing a framework that is extensible in several ways:
The generic CPs and associated tools provide templates for simplifying creation of new collections of similar form (e.g. digital texts), and guarantee certain functionality that conformant collections would inherit from the template.
The set of content patterns is itself extensible, following the same methodology, to other collection types.
The framework can be extended with new tools as technologies change (tools/services can be linked to CPs and inherited by collections that follow the pattern).
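The third form of extensibility listed above, tools linked to CPs and inherited by conformant collections, can be sketched as a simple registry. This is an illustration under stated assumptions rather than the project's implementation: the pattern names (digital-text, performing-arts), the tool names and the functions are hypothetical, and the mapping of collections to patterns simply follows the two collection groups described earlier.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]

# Which pattern each collection follows (pattern names are hypothetical).
COLLECTION_PATTERN: Dict[str, str] = {
    "Stormont Papers": "digital-text",
    "Inscriptions of Aphrodisias": "digital-text",
    "Scottish Traditions of Dance": "performing-arts",
}

# Tools are linked to patterns, never to individual collections.
PATTERN_TOOLS: Dict[str, Dict[str, Tool]] = {}


def link_tool(pattern: str, name: str, tool: Tool) -> None:
    """Attach a tool to a pattern so every conformant collection can use it."""
    PATTERN_TOOLS.setdefault(pattern, {})[name] = tool


def tools_for_collection(collection: str) -> Dict[str, Tool]:
    """Look up the tools a collection inherits through its pattern."""
    pattern = COLLECTION_PATTERN[collection]
    return dict(PATTERN_TOOLS.get(pattern, {}))


link_tool("digital-text", "feed", lambda item_id: f"feed entry for {item_id}")
before = sorted(tools_for_collection("Stormont Papers"))

# A tool added later becomes available to both digital-text collections at once.
link_tool("digital-text", "text-mining", lambda item_id: f"tokens for {item_id}")
after = sorted(tools_for_collection("Inscriptions of Aphrodisias"))

print(before, after)
```

Because tools attach to the pattern rather than to a collection, a new collection only needs to declare its pattern to gain whatever tooling already exists for it.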
The project thus builds on existing efforts and provides a foundation for a broader and longer-term programme for sustaining and enhancing digital humanities research. Developing this framework to support resources based around digital texts and performing arts will cover a significant amount of ground, and provide a springboard for future extensions. It will also ensure sustainability by integrating these initiatives into repository and curation infrastructures at KCL, and will allow a growing corpus of digital material to be integrated into this infrastructure.
Hosted at Stanford University
Stanford, California, United States
June 19, 2011 - June 22, 2011
Conference website: https://dh2011.stanford.edu/
Series: ADHO (6)