Machines Reading Maps: from text on maps to linked spatial data

Yao-Yi Chiang; Deborah Holmes-Wong; Jina Kim; Zekun Li; Katherine McDonough; Rainer Simon; Valeria Vitale

Authorship

1. Yao-Yi Chiang, University of Minnesota
2. Deborah Holmes-Wong, University of Southern California
3. Jina Kim, University of Minnesota
4. Zekun Li, University of Minnesota
5. Katherine McDonough, The Alan Turing Institute
6. Rainer Simon, AIT Austrian Institute of Technology GmbH
7. Valeria Vitale, The Alan Turing Institute

Work text


Maps constitute a significant body of global cultural heritage, and the number of digitized maps is only growing. However, the lack of metadata makes the right maps hard to find: the content of many collections therefore remains opaque to researchers and the general public alike. In this paper, we discuss a digital workflow to create machine-readable data from text on maps, both as a means to make cartographic collections more accessible and interconnected, and as a source of unique historical, geographical and anthropological information.
Usually, critical investigation of maps continues on a small scale, through the close ‘reading’ of a few items. Digitisation has brought attention to place names featured on historical maps as well as other textual labels that represent a key source for analyzing ‘platial’ knowledge. But how can we create large-scale datasets and systematically explore that information?
Projects like GB1900 or the National Library of Scotland’s Map Transcription Projects use volunteers to transcribe words printed on maps (Aucott & Southall 2019). Such efforts are resource-intensive and scale poorly. At the same time, the graphic style of historical maps presents a number of challenges that have hindered automatic recognition of text on maps.
Machines Reading Maps (MRM) has been working to address these issues by improving and extending existing technologies, and applying standards and best practice to make our outputs FAIR (findable, accessible, interoperable, reusable).

Setting aside the artificial opposition between manual and automatic annotation as mutually exclusive modes of working with maps, MRM’s workflow explores what can be gained from their interaction. We integrate a custom version of the annotation platform Recogito (Vitale et al., 2021) with mapKurator, a machine learning (ML) pipeline for automatic text detection and entity linking (Li et al., 2020). MapKurator suggests an initial set of text-bounding polygons, and users can accept, edit or delete these suggestions in Recogito. Further bespoke Recogito features enable one to capture a) how labels interact with each other and with visual elements (like colors and icons), and b) what semiotic functions labels perform (e.g. locative or complementary) (Schlichtmann, 2018). Structured data produced through this “deep annotation” are used to analyze geo-historical issues like industrialisation (Hosseini et al. 2021). Manually-annotated text data from maps may provide training data to improve and evaluate ML methods, but they also function as valuable datasets in their own right, particularly for smaller map corpora that can be annotated without recourse to ML.
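The accept, edit or delete review step described above can be illustrated with a minimal sketch. It is not the Recogito or mapKurator data model: the Suggestion class, its status values and the review and training_set functions are hypothetical names chosen for illustration, and polygons are assumed to be lists of pixel coordinates.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Suggestion:
    """A machine-suggested map label (hypothetical structure)."""
    polygon: Tuple[Point, ...]   # text-bounding polygon in image pixel coordinates
    text: str                    # suggested transcription
    status: str = "suggested"    # assumed states: suggested | accepted | edited


def review(s: Suggestion, action: str,
           new_text: Optional[str] = None,
           new_polygon: Optional[Tuple[Point, ...]] = None) -> Optional[Suggestion]:
    """Apply one reviewer decision; return None when the suggestion is deleted."""
    if action == "accept":
        return replace(s, status="accepted")
    if action == "edit":
        return replace(
            s,
            text=new_text if new_text is not None else s.text,
            polygon=new_polygon if new_polygon is not None else s.polygon,
            status="edited",
        )
    if action == "delete":
        return None
    raise ValueError(f"unknown action: {action}")


def training_set(reviewed: List[Optional[Suggestion]]) -> List[Suggestion]:
    """Keep only human-verified labels, e.g. for training or evaluating ML methods."""
    return [s for s in reviewed if s is not None and s.status in {"accepted", "edited"}]
```

In this sketch a reviewer who corrects a misread label would call review(s, "edit", new_text="Roman Camp"); only accepted or edited labels are passed on, which mirrors the idea that verified annotations can serve both as training data and as a dataset in their own right.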

Using our ML approach we predict 1) the type of content that map text describes (e.g. roads, buildings, mountains) and, for unique features, 2) links to entries in knowledge bases such as gazetteers or Wikidata. Through this process, we unlock the potential for users to find and interpret maps by the thousands based on search by semantic types. Using the links to specific instances of places, cultural institutions can feed this data back into their catalogs to document and study the geographical coverage of their collections. One could also explore differences between existing metadata and reported locations of map labels.
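As a rough illustration of these two uses, the sketch below searches a set of linked labels by semantic type and flags labels whose linked location falls outside the bounding box recorded in a catalog. It assumes a simplified record structure; LinkedLabel, maps_by_type, outside_catalog_bbox and the "kb:" identifiers are hypothetical and do not reflect mapKurator's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LinkedLabel:
    """One map label after type prediction and entity linking (hypothetical)."""
    map_id: str
    text: str
    feature_type: str                              # predicted type, e.g. "road"
    entity_id: Optional[str] = None                # placeholder knowledge-base ID
    lonlat: Optional[Tuple[float, float]] = None   # location of the linked entity


def maps_by_type(labels: List[LinkedLabel], feature_type: str) -> Dict[str, int]:
    """Count labels of one semantic type per map, to support search by type."""
    counts: Dict[str, int] = {}
    for label in labels:
        if label.feature_type == feature_type:
            counts[label.map_id] = counts.get(label.map_id, 0) + 1
    return counts


def outside_catalog_bbox(labels: List[LinkedLabel],
                         bbox: Tuple[float, float, float, float]) -> List[LinkedLabel]:
    """Return linked labels located outside a catalog bounding box.

    bbox is (west, south, east, north) in degrees; unlinked labels are skipped.
    """
    west, south, east, north = bbox
    return [
        label for label in labels
        if label.lonlat is not None
        and not (west <= label.lonlat[0] <= east and south <= label.lonlat[1] <= north)
    ]
```

A label flagged by outside_catalog_bbox may point to a linking error or to inaccurate catalog metadata; deciding which would still require human inspection.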

In a historical research case study, we use this method to analyze labels on large-scale British Ordnance Survey (OS) maps, investigating attitudes towards historical sites during the nineteenth century (Fleet 2011), how maps communicate national historical narratives, and the fabrication of a common idea of “The Past” (Eggert 2009). Combining manual and automatic annotation provides rich information about the distribution of historical sites as types of places. Text data (including its location, spelling, fonts, and classifications) about historical sites enrich our understanding of the ways that early OS maps represented certain periods (Anglo-Saxon, Roman or Medieval). A diachronic analysis of the text labels offers initial answers about patterns in national-scale coverage of these cartographic features and prompts further questions about the historical, social, and cultural dynamics influencing the inclusion of antiquities on OS maps and their reception among the public.

Bibliography

Aucott, P., & Southall, H. (2019). Locating past places in Britain: creating and evaluating the GB1900 Gazetteer. International Journal of Humanities and Arts Computing, 13(1-2), 69-94.

Eggert, P. (2009). Securing the past: conservation in art, architecture and literature.

Fleet, C. (2011). Guest Editorial: Mapping and Antiquities in Scotland. Scottish Geographical Journal, 127(2), 85-86.

Hosseini, K., McDonough, K., van Strien, D., Vane, O., & Wilson, D. C. (2021). Maps of a nation? The digitized Ordnance Survey for new historical research. Journal of Victorian Culture, 26(2), 284-299.

Li, Z., Chiang, Y. Y., Tavakkol, S., Shbita, B., Uhl, J. H., Leyk, S., & Knoblock, C. A. (2020, August). An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3290-3298).

Schlichtmann, H. (2018). Background to the semiotic study of maps. meta-carto-semiotics, 11(1), 1-12.

Vitale, V., Soto, P. D., Simon, R., Barker, E., Isaksen, L., & Kahn, R. (2021). Pelagios–Connecting Histories of Place. Part I: Methods and Tools. International Journal of Humanities and Arts Computing, 15(1-2), 5-32.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ADHO - 2022

"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO