World-Historical Gazetteer

paper, specified "short paper"
  1. 1. Karl Grossner

    University of Pittsburgh

  2. 2. Ruth M. Mostern

    University of Pittsburgh

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

This paper reports on World-Historical Gazetteer (WHGazetteer), a three-year project funded by the US National Endowment for the Humanities, now two-thirds complete.

Goals and Purpose
WHGazetteer is a scholarly infrastructure project intended to support historical research across many disciplines. It is principally a web-based software system for aggregating open access data about places and linking it with data about historical entities associated with those places. We have seeded the system with a global ‘spine’ dataset of some 30,000 places, developed expressly for this purpose.
There are existing repositories of historical place data, and many more are in development. A few are explicitly termed gazetteers

Vision of Britain

, some are historical GIS web resources

China Historical GIS
HGIS de las Indias

, and others comprise place data tables developed and published in the course of other research projects. Typically, each concerns a particular geographic area and a particular temporal scope

Map of Early Modern London


Bu contrast, WHGazetteer is soliciting and aggregating attestations of historical places for all regions and periods, along with annotations of records about historical entities with identifiers for those places. The project is conceived from a world-historical perspective. By this we mean several things. First, that it is intended to facilitate representations of connection and movement; second, that it scales up to global processes and the longue durée; and finally, the places for us includes ethnonyms for cultural regions, physical features, and ecoregions, helping ensure coverage for all parts of the globe and providing geographic context.
In many respects WHGazetteer follows and extends the pilot implementation of
Peripleo, the linked data gazetteer system developed by the
Pelagios Commons project. We are working closely with the Pelagios group to leverage their accomplishments, maintaining and extending the trajectory they established.

Places and Traces
WHGazetteer will solicit contributions of two kinds of place-related data. One is attestations of place names found in historical texts and maps:
Places. The other is annotations of records about any sort of historical entity (artifacts, creative works, persons, events, etc.) with place identifiers from published gazetteers:

WHGazetteer catalogues place references associated with a time period, which may be a date of publication for an historical source or temporal information about a name as in a modern historical atlas. In either case, a historical gazetteer records attestations from sources that assert that a name existed at a certain time. This approach to temporal scoping differentiates us from modern gazetteer data sources like GeoNames and Wikidata. A single place may (and ultimately should) have many linked attestations for different periods, where names, spellings, place type, relations, and geometry vary.

Traces are assertions of spatial and temporal scope for historical items of almost any kind: the footprint of an individual’s lifepath; the findspot of a coin hoard; the itinerary of a journey; the extent of a war; the places referred to in any sort of text, from a treaty to a novel; and so on.

Standard Data Formats
We are developing two standard contribution formats conforming to linked data requirements. The first,
Linked Places format, is essentially complete and is being tested with early contributions.
Linked Traces format, based on the W3C Web Annotation Data Model, will be completed in late spring, 2019.

Contribution Pipeline
Projects contributing to WHGazetteer come in all sizes and have varying levels of technical capability. Some have elaborate web interfaces and maintain permanent URIs for thousands of place records. These projects will have little difficulty exporting abbreviated records transformed to Linked Places format. Others have dozens or at most hundreds of place references drawn from sources specific to their domain of interest and are unable to stand up per-place pages with permanent URIs. The WHGazetteer contribution pipeline (Fig. 1) allows both groups to a) upload compatible Linked Places or CSV files, b) reconcile their records against Getty TGN, DBpedia, GeoNames and Wikidata as well as the WHGazetteer index itself, c) review and validate results of that matching process, thereby augmenting their records with closeMatch or exactMatch links, and d) contribute reviewed records to the index, published under permanent URIs provided by WHGazetteer.

Backend Stores and Middleware
The WHGazetteer system backend is comprised of a PostgreSQL relational database and multiple Elasticsearch index stores. These and the APIs for internal use and public access are managed with Django, a Python-based web development framework.

WHGazetteer will provide a public API supporting machine queries to all indexed data. A graphical web interface will support contribution activities for authenticated users, and provide search and mapping capabilities for both Places and Traces.
Linking Place and Trace data in a single backend allows us to present linked data ‘portals’ for all indexed places, which will grow richer over time as our indexes expand. Using either the public API or graphical interface one might discover, for a given place: its names, shapes, and relations with other places over time; people whose lifepaths have intersected it; journeys for which it has been a waypoint; and texts and artworks for which it is a subject.
Our contribution pipeline interface will enable several kinds of pedagogical applications. For example, students and instructors will be able to create custom Traces associated with course material. Advanced students will be able to upload CSVs of gazetteers they have created, and use the WHGazetteer reconciliation tool to augment their records with information in the WHGazetteer index.

Version 1 of WHGazetteer is scheduled for launch in January 2020. A beta version will be available by July 2019.
We have established data partnerships with roughly a dozen contributors of large datasets covering a range of regions and periods, and also have a queue of many smaller projects. Data pipeline functionality to receive these is nearly complete (Fig. 1).

WHGazetteer has been designed to require minimal hand-holding for contributions, and tools for efficient curation and maintenance. University of Pittsburgh’s World History Center is committed to maintaining the system for the foreseeable future.

Figure 1 – Data pipeline (partial): (a) upload dataset(s); (b) perform reconciliation against modern authorities; (c) review/validate reconciliation ‘hits’

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.