All databases are created equal: building profiles for database standards and interoperability in the Humanities

poster / demo / art installation
Authorship
  1. Ian R. Johnson

    University Of Sydney

Work text

In this paper I will discuss the development of standard
database profiles which shortcut the process of building
complex interlinked Humanities databases. Although there is
a risk of creating restrictive uniformity which stifles creativity,
I argue that the benefits of well-structured starting points
far outweigh the drawbacks, both in terms of immediate
productivity and the avoidance of less than optimal structures.
I will use examples from two national infrastructure projects to
illustrate the use of database profiles to provide a database-on-demand
service which integrates readily into an aggregated
search of cultural datasets and specific user needs.
Over the past few years I have worked with a number of
projects - notably historical and archaeological projects - to
model their database needs in Heurist (HeuristScholar.org).
From these models I have been able to generalise a set of
requirements for commonly encountered entity types, and
their interrelationships. While there is significant variation in
the specific fields required to describe entities (notably the
degree of detail required), a number of entity types recur
across projects. For historic data these include person,
organisation, building, ship, voyage, epidemic, performance,
venue, work and other bibliographic types, along with
repeatedly used descriptors (fields). Furthermore, a range of
relationships between entities, including familial relationships,
roles, event relationships and bibliographic relationships, is
widely shared.
The role of a database profile is first to shortcut the low
level and time-consuming task of defining a set of commonly
used descriptors. For example, the description of a person -
one of the most widely used and easily standardised entities
- will commonly require some or all of family name, given
names, sex, date of birth, various forms of address and so
forth, with a bifurcation between contemporary individuals (e.g.
participants, with an email address and phone numbers) and
historical individuals (e.g. slaves with national or racial origins
and date of death). Most of this is not demanding, but it is still
common to see poorly structured descriptors (such as text
for categorisation fields or loosely structured coordinate data
in place of geometries), with little or no metadata, beyond
a field or column name, identifying the nature of the data
recorded. A pre-populated entity description ensures good,
clear, documented descriptors, which will rarely be perturbed
by, or stand in the way of, individual customisation.
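As a minimal sketch of what such a pre-populated entity description might look like, the following Python fragment defines a person profile whose fields carry an explicit type, a help text and, for categorisation fields, a term list. All names here (FieldDef, PERSON_PROFILE, customise, the field names and terms) are illustrative assumptions and do not reproduce Heurist's actual structures.

```python
from dataclasses import dataclass
from enum import Enum


class FieldType(Enum):
    TEXT = "text"
    DATE = "date"
    TERM = "term"          # value drawn from a controlled term list
    GEOMETRY = "geometry"  # structured spatial value rather than free text


@dataclass(frozen=True)
class FieldDef:
    name: str
    field_type: FieldType
    help_text: str   # documentation beyond the bare field name
    terms: tuple = ()  # permitted values when field_type is TERM

    def accepts(self, value: str) -> bool:
        """Check a value against the term list; other types are not checked here."""
        if self.field_type is FieldType.TERM:
            return value in self.terms
        return True


# Hypothetical pre-populated profile for a historical person.
PERSON_PROFILE = (
    FieldDef("family_name", FieldType.TEXT, "Family name as recorded in the source"),
    FieldDef("given_names", FieldType.TEXT, "All given names, in source order"),
    FieldDef("sex", FieldType.TERM, "Sex as recorded", ("female", "male", "unknown")),
    FieldDef("date_of_birth", FieldType.DATE, "Date of birth, if known"),
    FieldDef("date_of_death", FieldType.DATE, "Date of death, if known"),
)


def customise(profile, extra_fields):
    """Return a new profile with project-specific fields appended."""
    return tuple(profile) + tuple(extra_fields)


# A project adds its own field without disturbing the shared ones.
project_profile = customise(
    PERSON_PROFILE,
    [FieldDef("place_of_origin", FieldType.GEOMETRY, "Point or region of origin")],
)
```

Because the shared fields are left untouched and project fields are only appended, individual customisation of this kind does not disturb the documented starting point.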
More critically, there are several alternative ways of building
relationships between records, ranging from simple pointers
or foreign keys to typed relationships with annotation and
temporal range. Relationships may be constrained with specific
cardinality (e.g. a person can have only two biological parents),
but uncertainty or cultural perspectives can complicate such
simple rules. The particular solutions adopted in handling
relationships can have important ramifications down the line
when it comes to searching, analysis and presentation, but
these ramifications lie outside the experience of a researcher
lacking a data modelling background. Even with such a
background, it can be hard to identify the optimum solution
without some trial and error.
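The difference between a bare pointer and a typed relationship can be sketched as follows. This is an illustration under stated assumptions, not Heurist's data model: the Relationship class, the relationship type names, the MAX_INCOMING table and the decision to exempt uncertain links from the cardinality rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relationship:
    source_id: int
    target_id: int
    rel_type: str                     # e.g. "biological_parent_of"
    start_year: Optional[int] = None  # open-ended when None
    end_year: Optional[int] = None
    note: str = ""                    # free-text annotation
    certain: bool = True              # False for a conjectured link

    def active_in(self, year: int) -> bool:
        """True if the year falls inside the (possibly open) temporal range."""
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


# Hypothetical cardinality rule: at most two certain biological parents.
MAX_INCOMING = {"biological_parent_of": 2}


def violates_cardinality(existing, new):
    """Check a new link against the limit; uncertain links are never rejected."""
    limit = MAX_INCOMING.get(new.rel_type)
    if limit is None or not new.certain:
        return False
    count = sum(
        1
        for r in existing
        if r.rel_type == new.rel_type
        and r.target_id == new.target_id
        and r.certain
    )
    return count >= limit


parents = [
    Relationship(1, 3, "biological_parent_of"),
    Relationship(2, 3, "biological_parent_of"),
]
third = Relationship(4, 3, "biological_parent_of")
conjectured = Relationship(4, 3, "biological_parent_of", certain=False)
```

Here a third certain parent would be rejected while a conjectured one is admitted, which is one possible way of letting uncertainty coexist with a simple rule. A plain foreign key could express neither the annotation nor the date range.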
Heurist has been incorporated into two national infrastructure
projects - HuNI (Humanities Networked Infrastructure, http://huni.net.au)
and FAIMS (Federated Archaeological Information Management System,
http://fedarch.org). In both cases, Heurist
provides a database-on-demand service, allowing researchers
to build their own databases on the NeCTAR Research Cloud,
or on their own servers (University or commercial IP), without
recourse to technical assistance. For each of these projects
we have therefore developed an initial database profile which
reflects the needs of the community, drawing on both the
ontologies developed by these infrastructure projects and
on our experience in developing practical data models for
numerous individual projects.
The HuNI project has put significant effort into establishing
an ontology (http://wiki.huni.net.au/display/DS/Input+Data+Sources+Model)
to provide interoperable search across the
23 cultural datasets aggregated in the system. The Heurist
HuNI profile reflects this ontology and incorporates a mapping
between human-friendly descriptive terms (which researchers
can modify to suit their needs), and the standardised terms
required by HuNI. This allows any database based on this
template to immediately generate XML which can be harvested
and searched within the HuNI framework.
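The mapping idea can be illustrated with a short sketch. The field labels, the standardised term names and the XML element names below are assumptions chosen for the example; they are not taken from the HuNI ontology or its harvest format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: researcher-facing label -> standardised term.
# Researchers may rename the keys; the values stay fixed for harvesting.
TERM_MAP = {
    "Surname": "familyName",
    "First names": "givenName",
    "Born": "birthDate",
}


def record_to_xml(entity_type, record, term_map):
    """Serialise one record, using standardised terms as element names."""
    root = ET.Element("record", {"type": entity_type})
    for label, value in record.items():
        standard = term_map.get(label)
        if standard is None:
            continue  # local-only field with no standard equivalent: skipped
        ET.SubElement(root, standard).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = record_to_xml(
    "person",
    {"Surname": "Smith", "First names": "Mary Ann", "Nickname": "Polly"},
    TERM_MAP,
)
```

In this sketch the local "Nickname" field is simply left out of the export, since it has no standardised counterpart; a real profile might instead carry such fields in a separate extension element.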
The FAIMS project has also identified shared units of
description, although it has not generated a formal ontology.
The FAIMS profile reflects this understanding through a
common set of archaeological entities such as project, transect,
survey unit, site, trench, phase, layer, context, feature, find,
sample etc., as well as relationships such as roles, stratigraphic
and contextual relationships. However, in archaeology,
recording systems used are often influenced by national and
state legislation, as well as differing traditions of survey and
excavation methodology, so FAIMS is also building a Heurist
database containing multiple alternative recording systems
deployed as a Community Server. Structural elements - entity
types with all their associated fields, term lists and relationships
- can be imported selectively into a database created with the
FAIMS profile while retaining field and term mappings across
different systems; interoperability is thus not legislated by fixed
structures, but encouraged by reuse.
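Selective import with retained mappings might be sketched along the following lines. Everything here is hypothetical: the COMMUNITY_SERVER dictionary, the two recording systems, their field labels and the shared concept codes merely stand in for whatever identifiers the real system uses.

```python
from copy import deepcopy

# Hypothetical community server: each recording system labels its fields
# differently, but every field points at a shared concept code.
COMMUNITY_SERVER = {
    "system_a": {
        "context": {
            "fields": {"Context no.": "context_id", "Soil colour": "soil_colour"},
        },
    },
    "system_b": {
        "stratigraphic unit": {
            "fields": {"SU number": "context_id", "Munsell": "soil_colour"},
        },
    },
}


def import_types(target_db, system, type_names):
    """Copy the selected entity types, with their concept codes, into a database."""
    for name in type_names:
        target_db[name] = deepcopy(COMMUNITY_SERVER[system][name])
    return target_db


def shared_concepts(db, type_a, type_b):
    """Concept codes that two entity types have in common."""
    a = set(db[type_a]["fields"].values())
    b = set(db[type_b]["fields"].values())
    return a & b


db = import_types({}, "system_a", ["context"])
import_types(db, "system_b", ["stratigraphic unit"])
common = shared_concepts(db, "context", "stratigraphic unit")
```

Because the imported fields keep their concept codes, records described as a "context" in one system and a "stratigraphic unit" in another remain comparable even though nothing forces the two structures to be identical.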


Conference Info

Complete

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

XML available from https://github.com/elliewix/DHAnalysis (needs to replace plaintext)

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO