Collections in the World Historical Gazetteer

paper, specified "long paper"
Authorship
  1. Karl Grossner

    University of Pittsburgh

  2. Ruth Mostern

    University of Pittsburgh

  3. Susan Grunewald

    University of Pittsburgh

Work text


The World Historical Gazetteer (WHG) project (Grossner and Mostern 2021), launched in mid-2021, aggregates information about historical places contributed by researchers working in numerous fields within the humanities and social sciences. We aim for the WHG web software platform to become an essential and permanent component of global Digital Humanities infrastructure. At this writing the system holds nearly 2 million records for about 1.6 million distinct places. Of these, roughly 60,000 are temporally scoped. The remainder are drawn from modern sources, and form a core that will increasingly develop temporal depth as new contributions are received. Many additional contributions are now in the queue at various stages of accessioning. The WHG provides reconciliation services useful in the creation and enhancement of specialist gazetteers, and serves as a publication venue for smaller projects wishing to provide dynamic access to their place data. This paper summarizes the project’s approach to linking gazetteer datasets as collections, briefly describes the underlying data model and workflow, and outlines the attributes that make it a distinctive resource for humanities scholarship.

Gazetteers from Historical Sources
One of the primary use cases driving WHG development has been supporting the increasingly common scholarly activities of mapping and spatially analyzing phenomena recorded in historical source material. Texts of all kinds, and tabular data resources like birth records and ship logs, typically reference named places, but resolving names to locations is often difficult and time-consuming: toponyms normally have numerous variants that change over time; smaller settlements are frequently missing from modern gazetteer resources; places may have been abandoned, or even relocated. The WHG platform provides reconciliation services for Wikidata and the Getty Thesaurus of Geographic Names (TGN) to facilitate this geolocation task. It also serves as a place where the results of that activity can be stored, managed, published as specialist gazetteers, linked to datasets with overlapping spatial coverage, and freely accessed by others so that the work need never be repeated. As WHG content grows, researchers will be more readily able to resolve the place references in their sources.
Teaching is another important use case for aggregated historical place information. The WHG team is collaborating with schoolteachers who are developing classroom activities using rich, linked, and temporally scoped records from the WHG that concern places with complex and multilingual histories.

Collections of gazetteers in WHG
The WHG platform provides a publication venue for individual project gazetteers of any size, as well as a Collections feature that can be used to link multiple gazetteer datasets relevant to a region, period, or theme. Over the past fifteen years the Pleiades project, “a community-built gazetteer and graph of ancient places,” has demonstrated the value of aggregating place data for a particular region and period from the work of multiple research projects in many disciplines. Other communities of interest are forming around various themes and region/period combinations, but most do not have the resources to stand up an equivalent of the Pleiades platform. The Collections feature in WHG enables them to replicate that successful open data publication model. Over time, collections will grow to become focused domains within the WHG platform that can be searched independently and accessed or downloaded for use in annotation tools like Recogito.

Figure 3 - The Dutch History collection in WHG, showing an emerging footprint of Dutch colonial history
At this writing, two collections are in development and visible in the WHG and a third is in the planning stages. The Dutch History collection has so far linked six disparate datasets that together have begun to trace the spatial footprint of Dutch colonial history. The HGIS de las Indias collection includes over 15,000 settlements (lugares) and 900+ administrative territorios for the eighteenth-century Spanish Americas, and will seed a broader LatAm collection in the future. The Historical Middle East Data Alliance (Hist-ME) has begun planning the formation of a large WHG collection to include data from many of its 35 members. Similar initiatives for Central Eurasia and Central and Eastern Europe are at earlier stages of development.

Contributing to WHG
The WHG requires contributed data to arrive in Linked Places format (LPF), developed in collaboration with the Pelagios project for use in our respective systems. The place attestation model underlying LPF is expressive: a record as a whole can be temporally scoped and sourced, as can any number of a place’s name variants, locations, types, and relations. Most elements, however, are optional. A simpler delimited text format, called LP-TSV, is an alternative for relatively simple records.

Figure 1 - Conceptual model of a place attestation in Linked Places format
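
To make the attestation model more concrete, the following is a minimal sketch of what a record with record-level and name-level temporal scoping might look like, written as a Python dictionary. The key names loosely follow the general shape of a Linked Places feature, but they are given here as assumptions for illustration; the LPF specification is the authority on exact keys and required fields. The place, its names, the citation, and the identifiers are all invented, and the helper functions are hypothetical.

```python
# Illustrative record only. Key names approximate the shape of a Linked
# Places feature (an assumption; consult the LPF specification). The place,
# names, citation, and identifiers below are invented placeholders.
feature = {
    "@id": "http://example.org/places/0001",
    "type": "Feature",
    "properties": {"title": "Exampleton"},
    # Temporal scope for the record as a whole
    "when": {"timespans": [{"start": {"in": "1600"}, "end": {"in": "1750"}}]},
    "names": [
        {
            "toponym": "Exampleton",
            "lang": "en",
            "citations": [{"label": "Hypothetical Atlas (1702)"}],
            # Temporal scope for this name variant only
            "when": {"timespans": [{"start": {"in": "1700"}}]},
        },
        # Optional elements can simply be omitted
        {"toponym": "Exempelstad", "lang": "nl"},
    ],
    "types": [{"label": "settlement"}],
    "geometry": {"type": "Point", "coordinates": [4.9, 52.37]},
    "links": [{"type": "closeMatch", "identifier": "wd:Q0000000"}],
}


def has_timespans(obj):
    """True if a dict carries a non-empty 'when.timespans' list."""
    when = obj.get("when")
    return bool(when and when.get("timespans"))


def temporal_scopes(feature):
    """List which parts of a record carry their own temporal scope."""
    scoped = []
    if has_timespans(feature):
        scoped.append("record")
    for key in ("names", "types", "relations"):
        for i, item in enumerate(feature.get(key, [])):
            if has_timespans(item):
                scoped.append(f"{key}[{i}]")
    if has_timespans(feature.get("geometry", {})):
        scoped.append("geometry")
    return scoped


print(temporal_scopes(feature))  # ['record', 'names[0]']
```

Because most elements are optional, a record with only a title and a single name would pass through the same helper and simply return an empty list.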

Pipeline and Workflow
The WHG performs place data aggregation at a few levels. The entire contribution pipeline is as follows: 1) registered users create new WHG datasets by uploading LPF or LP-TSV files into their private workspace, and can assign ‘co-owner’ or ‘member’ roles to other registered users; 2) dataset owners use the WHG reconciliation services for Wikidata and Getty TGN to augment a dataset’s records with additional geometry and concordance identifiers; 3) once a dataset has been enriched in this way it can be flagged as public, making it visible, searchable, and, in one sense, aggregated with other public datasets; 4) the further step of accessioning allows contributors to reconcile a dataset’s records against the WHG’s union index. A record closely matching an existing index record is linked to it, effectively growing a named graph for each distinct place. An incoming record with no prospective matches in the WHG index automatically becomes a new “seed” for that place. In all cases of reconciliation, dataset owners decide which prospective matches are valid, using a Review screen that presents a significant level of context to aid the decision.

Figure 2 - Contribution pipeline in the WHG
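
The link-or-seed logic of the accessioning step can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the WHG implementation: the union index is reduced to an in-memory dictionary, the names IndexEntry and accession are hypothetical, and candidate matching and the owner's review are collapsed into a single confirmed_match argument. The paper does not say what happens when candidates exist but the owner rejects all of them; the sketch assumes such a record also becomes a new seed.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One distinct place in a toy union index (hypothetical structure)."""
    seed_id: str
    linked_ids: list = field(default_factory=list)


def accession(record_id, confirmed_match, index):
    """Link a record to a reviewed match, or add it as a new seed.

    confirmed_match is the seed id the dataset owner accepted during
    review, or None if there was no accepted match.
    """
    if confirmed_match is not None and confirmed_match in index:
        # Accepted match: grow the set of attestations for that place
        index[confirmed_match].linked_ids.append(record_id)
        return confirmed_match
    # No accepted match: the record seeds a new place (assumption for the
    # rejected-candidates case, which the text does not specify)
    index[record_id] = IndexEntry(seed_id=record_id)
    return record_id


index = {}
first = accession("ds1:1", None, index)       # becomes a seed
second = accession("ds2:7", "ds1:1", index)   # linked to the existing seed
third = accession("ds2:8", None, index)       # becomes another seed
```

After these three calls the toy index holds two places, the first of which has one additional linked attestation from a second dataset.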

The WHG as Digital Humanities infrastructure
There are a number of existing place name resources, for example GeoNames, Getty TGN, VIAF, and the Bibliothèque Nationale de France (BnF). Some are explicitly historical, for example Pleiades, Vision of Britain, Trismegistos, and Syriaca.org. World Historical Gazetteer is distinctive, however, in having the following attributes in combination:

it has global geographic coverage
it records structured, granular temporal information
its records are attestations of place references sourced by researchers
it serves as a Linked Data publishing platform for datasets without a web presence
its union index links multiple attestations for places, effectively linking projects and disciplines
it provides faceted record-level search across datasets
it provides machine access via APIs

The usefulness of the WHG in helping humanities scholars understand the “where” of the historical phenomena they study will only grow as the WHG index grows over time, and the project team has begun organizing a consortium of supporting organizations to ensure that it does.

Bibliography

Grossner, K., Mostern, R. (2021). Linked Places in World Historical Gazetteer. In 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities (GeoHumanities’21), Beijing, China. ACM, New York, USA. Retrieved from: http://kgeographer.org/pubs/Grossner-Mostern_LinkedPlacesInWHG_final.pdf


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO