Integrating Historical Maps and Documents through Geocoding - Historical Big Data for the Japanese City of Edo

poster / demo / art installation
  1. Asanobu Kitamoto

    National Institute of Informatics, ROIS-DS Center for Open Data in the Humanities

  2. Shoko Terao

  3. Misato Horii

  4. Hiroshi Horii

  5. Chikahiko Suzuki

    National Institute of Informatics, ROIS-DS Center for Open Data in the Humanities

Work text

Historical Big Data for the Urban Space

The city of Edo has been the capital of Japan and has been known as one of the largest cities in the world since the 17th century. To answer research questions about historical urban space, such as human activities and environmental effects, historical documents should be integrated by place, time, person and other entities, turning small facts into a collection of structured data for historical big data analysis. Related work includes Pelagios, which studies historical gazetteers and the georeferencing of old maps to reconstruct geographic space, and the European Time Machine, which aims to integrate historical entities to reconstruct the urban space of European cities. Our approach could also be called the Edo Time Machine.

Integration of Historical Sources through Geocoding

Toponyms appear in many variations, especially in historical documents that predate the standardization of the address system. A location-based historical database therefore requires a shared address system, or standard gazetteer, for toponym-based integration. The major challenges in toponym-based integration are the variation and disambiguation of toponyms, and the question in this paper is how machine-based geocoding can deal with these challenges.

Dataset

Edo Map Dataset: The dataset covers place names extracted from “Edo Kiriezu” (Owariya version), a pre-modern map of Edo published from 1849 through 1863 in the form of 32 sheets. It contains not only addresses but also POIs (Points of Interest) such as bridges and temples.

Edo Shopping Dataset: The dataset covers shops and restaurants extracted from “Edo Kaimono Hitori Annai”, a pre-modern shopping guide published in 1824 that lists about 2600 shops and restaurants in Edo with the shop name, category, address and logo.

To create the datasets, we took advantage of IIIF (International Image Interoperability Framework), which allows interoperable image delivery in the humanities, and the IIIF Curation Platform (ICP), an open source software suite developed by our group for creating collections of parts of images across organizations. As a result, we created a dataset of 6418 place names from 22 of the 32 sheets, and a dataset of 2454 shops from the whole book.

Figure 1: Edo Kiriezu, the sheet of the Yotsuya area. Red markers show extracted place names (total 335).

Figure 2: Edo Kaimono Hitori Annai. A search result for restaurants (total 62).

Experimental Results

Table 1 shows the result of matching gazetteer entries against shop addresses (1034 unique addresses). In addition to exact match, we tested three other approaches: matching from the first character (forward match), matching from the last character (backward match), and matching a part of the address string (partial match). Exact match was successful for about 21% of the addresses (212/1034). Among the 212 successful cases, 49 addresses need disambiguation within a sheet and 15 need disambiguation across sheets. Disambiguation within a sheet, however, is usually not a critical issue: under the Japanese address system, which is block-based rather than street-based, multiple matches within a sheet usually correspond to neighboring blocks.
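The four matching strategies and the two kinds of ambiguity can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a gazetteer place name is compared against the full shop address, so that forward, backward and partial match mean the name is a prefix, suffix or substring of the address; the abstract does not state the direction of the comparison. The `Place` type, the function names and the romanized toy gazetteer (including its sheet numbers) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Place(NamedTuple):
    name: str   # place name as extracted from the map
    sheet: int  # map sheet the name was extracted from (toy values here)


def match_kinds(address: str, name: str) -> set[str]:
    """Return every match category that holds between an address and a name.

    Assumption: the gazetteer name is tested as prefix, suffix or substring
    of the address. Categories overlap, e.g. an exact match is also a
    forward, backward and partial match.
    """
    kinds: set[str] = set()
    if not address or not name:
        return kinds
    if address == name:
        kinds.add("exact")
    if address.startswith(name):
        kinds.add("forward")
    if address.endswith(name):
        kinds.add("backward")
    if name in address:
        kinds.add("partial")
    return kinds


def geocode(address: str, gazetteer: list[Place]) -> dict[str, list[Place]]:
    """Group all gazetteer candidates for an address by match category."""
    hits: dict[str, list[Place]] = defaultdict(list)
    for place in gazetteer:
        for kind in match_kinds(address, place.name):
            hits[kind].append(place)
    return dict(hits)


def ambiguity(candidates: list[Place]) -> str:
    """Classify candidates as unambiguous, or ambiguous within/across sheets."""
    if len(candidates) <= 1:
        return "none"
    sheets = {place.sheet for place in candidates}
    return "within sheet" if len(sheets) == 1 else "across sheets"


# Hypothetical toy gazetteer; names are romanized for readability.
GAZETTEER = [
    Place("Kojimachi", 1),
    Place("Kojimachi", 1),
    Place("Yotsuya", 2),
    Place("Honcho", 3),
    Place("Honcho", 4),
]

if __name__ == "__main__":
    for address in ["Kojimachi", "Honcho", "Yotsuya Denmacho"]:
        hits = geocode(address, GAZETTEER)
        exact = hits.get("exact", [])
        print(address, sorted(hits), ambiguity(exact))
```

Returning a set of categories rather than a single label mirrors the note on Table 1 that some categories are not mutually exclusive. Grouping candidates by sheet separates the within-sheet case, which the text treats as usually harmless, from the across-sheet case.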
Future work includes georeferencing coordinates between the old maps and the present-day map, and analyzing the relationship between the geographic distribution of businesses and human activities in the urban space.

Table 1: Matching 1034 unique addresses in the shopping guide against place names in the gazetteer. Note that some categories are not mutually exclusive.


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at Data for this conference were initially prepared and cleaned by May Ning.


Series: ADHO (15)

Organizers: ADHO