Automatic matching method of historical event text with its corresponding thematic maps developed for the application of the ShiJi Spatio-Temporal Information Platform

poster / demo / art installation
Authorship
  1. Jung-Yi Tsai

    Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan

  2. Pi-Ling Pai

    Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan

  3. Hsiung-Ming Liao

    Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan

  4. You-Jun Chen

    Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan; Department of Mathematics, University of California, Los Angeles, CA, USA

  5. Richard Tzong-Han Tsai

    Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan; Department of Computer Science and Information Engineering, National Central University, Taiwan

  6. I-Chun Fan

    Institute of History and Philology, Academia Sinica, Taiwan

Work text


With the establishment of a large number of digital archives of historical texts, the data integration capability of spatio-temporal information technology has been increasingly recognized as important for historical research and for digital humanities applications. Spatio-temporal information is an important attribute of historical events, and historical maps are important carriers for presenting them. After thematic events are extracted from historical texts, thematic event maps can therefore be generated from the place name database and the basic digital historical maps of Chinese Civilization in Time and Space (CCTS) (Academia Sinica, 2002). A thematic spatio-temporal platform can then integrate and present each event text together with its corresponding maps.
The ShiJi Spatio-Temporal Information Platform is an integrated system of thematic historical maps and texts that presents the historical events recorded in the Records of the Grand Historian (Chinese name: ShiJi) (Jung-Yi Tsai et al., 2021). The data on the platform are based mainly on the dataset compiled by the historian Professor Panqing Xu, which contains 1,260 historical events and 360 thematic maps (Xu, P.-Q., 2010). There is a one-to-many relationship between the text and the maps, and after an initial manual comparison about 760 historical events still could not be clearly related to the thematic maps. We therefore designed a set of preprocessing procedures and ranking algorithms that automatically match historical events with applicable thematic maps.
Preprocessing has two parts. First, we define the spatial scope of a historical event from the coordinates of the place names in the event text, and compute the proportion of those coordinates that each thematic map covers, so that irrelevant maps can be filtered out quickly. The spatial overlap threshold between the event and the map is set to 0.8; that is, a thematic map is considered valid only if more than 80% of the place names of the event are located within it. Second, we use an OCR tool (Rakpong Kittinaradorn, 2020) to extract the text annotations from the thematic maps for use by the subsequent matching algorithms.
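The spatial filter in the first preprocessing step can be sketched as follows. This is only a minimal illustration, not the platform's implementation: it assumes that each thematic map's extent is available as a longitude/latitude bounding box, and the names `coverage_ratio`, `filter_maps`, the map identifiers, and all coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]               # (longitude, latitude) of a place name
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def coverage_ratio(points: List[Point], bbox: BBox) -> float:
    """Fraction of an event's place name coordinates that fall inside a map extent."""
    if not points:
        return 0.0
    min_lon, min_lat, max_lon, max_lat = bbox
    inside = sum(
        1
        for lon, lat in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    )
    return inside / len(points)


def filter_maps(
    points: List[Point], map_extents: Dict[str, BBox], threshold: float = 0.8
) -> List[str]:
    """Keep maps covering more than `threshold` of the event's place names.

    The strict comparison follows the "more than 80%" wording of the text;
    whether the boundary case is included is an assumption.
    """
    return [
        map_id
        for map_id, bbox in map_extents.items()
        if coverage_ratio(points, bbox) > threshold
    ]


# Made-up coordinates for illustration only.
event_points = [(114.5, 37.0), (113.6, 34.7), (108.9, 34.3), (117.2, 34.2), (120.0, 30.0)]
map_extents = {
    "map_a": (105.0, 28.0, 122.0, 40.0),  # covers 5 of 5 points
    "map_b": (112.0, 33.0, 118.0, 38.0),  # covers 3 of 5 points
}
candidate_maps = filter_maps(event_points, map_extents)
```

Only the maps that pass this filter would then need to be scored by the matching algorithms, which keeps the more expensive text comparison small.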
The automatic map matching algorithm designed in this research has four steps. The first step extracts the place names in the text and on the map using the CCTS place name database. The second step converts the place names in the text and on the map into TF-IDF vectors (Kim and Gil, 2019) and calculates their cosine similarity to find the maps that overlap with the main location of the historical event. The third step converts the place names in the text and on the map into one-hot vectors and calculates the cosine similarity of the place name distribution, so as to favor maps on which the place names of the event actually appear. Finally, we integrate the cosine similarities between the place names from the map and the text to rank the applicable maps.
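The scoring steps above can be sketched in a few lines of standard-library Python. Several details here are assumptions, because the abstract does not state them: the smoothed IDF formula, the treatment of the event and the candidate maps as one small corpus, the equal-weight combination controlled by the hypothetical parameter `alpha`, and the romanized place names used as sample data.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(name, 0.0) for name, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF vectors over place name lists (smoothed IDF, an assumption)."""
    n_docs = len(docs)
    doc_freq = Counter(name for doc in docs for name in set(doc))
    idf = {name: math.log((1 + n_docs) / (1 + df)) + 1.0 for name, df in doc_freq.items()}
    return [{name: tf * idf[name] for name, tf in Counter(doc).items()} for doc in docs]


def binary_vector(doc: List[str]) -> Dict[str, float]:
    """Presence/absence (one-hot style) vector of the place names in a document."""
    return {name: 1.0 for name in set(doc)}


def rank_maps(
    event_names: List[str], map_names: Dict[str, List[str]], alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank maps by a weighted sum of TF-IDF and binary cosine similarity."""
    map_ids = list(map_names)
    vectors = tfidf_vectors([event_names] + [map_names[m] for m in map_ids])
    event_vec, map_vecs = vectors[0], vectors[1:]
    event_bin = binary_vector(event_names)
    scored = []
    for map_id, map_vec in zip(map_ids, map_vecs):
        s_tfidf = cosine(event_vec, map_vec)
        s_binary = cosine(event_bin, binary_vector(map_names[map_id]))
        scored.append((map_id, alpha * s_tfidf + (1 - alpha) * s_binary))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical place names: from one event text and from OCR output of three maps.
ranking = rank_maps(
    ["Julu", "Pengcheng"],
    {
        "m1": ["Julu", "Pengcheng"],
        "m2": ["Xianyang", "Hongmen"],
        "m3": ["Julu", "Xianyang", "Hongmen", "Handan"],
    },
)
```

In this toy input, the map that shares all of its place names with the event ranks first, the map that shares one name ranks second, and the map with no shared names receives a score of zero.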
From the dataset processed above, we selected examples with richer spatial attributes for a mean reciprocal rank (MRR) (Valcarce et al., 2020) evaluation, such as “Appointing Pei Gong to Attack the West” in the Battle of Julu (207 BC) and “Xiang Yu Marching to Xi” in the Hongmen Banquet (206 BC). The MRR calculated in this way is close to 80%. The experiments also revealed some interesting spatio-temporal characteristics of historical events. For example, for the event “Appointing Pei Gong to Attack the West” in the Battle of Julu, the automatic map matching algorithms found thematic maps with the same attack direction and route.
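For reference, MRR averages, over all evaluated events, the reciprocal of the rank at which the first correct map appears. The sketch below shows the calculation under the assumption that each event has a ranked list of map identifiers and a manually confirmed set of correct maps; the function name, identifiers, and sample data are hypothetical and are not the experimental data of this study.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]]
) -> float:
    """Average of 1/rank of the first relevant map per event (0 if none is found)."""
    if not rankings:
        return 0.0
    total = 0.0
    for event_id, ranked_maps in rankings.items():
        correct = relevant.get(event_id, set())
        for rank, map_id in enumerate(ranked_maps, start=1):
            if map_id in correct:
                total += 1.0 / rank
                break  # only the first relevant map counts
    return total / len(rankings)


# Hypothetical rankings for three events.
rankings = {
    "e1": ["a", "b", "c"],  # correct map at rank 1 -> 1
    "e2": ["b", "a", "c"],  # correct map at rank 2 -> 1/2
    "e3": ["c", "b", "a"],  # correct map not retrieved -> 0
}
relevant = {"e1": {"a"}, "e2": {"a"}, "e3": {"z"}}
mrr = mean_reciprocal_rank(rankings, relevant)  # (1 + 0.5 + 0) / 3 = 0.5
```

An MRR close to 0.8 therefore means that, on average, the first correct map appears at or very near the top of the ranked list.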
The automatic matching framework between historical event texts and their corresponding thematic maps developed in this research has been implemented in the ShiJi Spatio-Temporal Information Platform. In the future, we expect to apply it to the automatic linking of other historical texts and historical maps, to continue improving the framework, and to explore related research topics.

Bibliography

Academia Sinica (2002).
Chinese Civilization in Time and Space (CCTS).
https://ctext.org/static/shanghai2018/liaohsiungming-geohumanities.pptx.

Jung-Yi Tsai, Pi-Ling Pai, Hsiung-Ming Liao, You-Jun Chen, Richard Tzong-Han Tsai, and I-Chun Fan (2021). Construction of ShiJi Spatiotemporal Information Platform on the Framework of Research-oriented Knowledge Bases.
JADH 2021.

Kim, S.-W. and Gil, J.-M. (2019). Research paper classification systems based on TF-IDF and LDA schemes.
Human-Centric Computing and Information Sciences,
9(1). Springer: 1–21.

Rakpong Kittinaradorn (2020).
EasyOCR.
https://github.com/JaidedAI/EasyOCR.

Valcarce, D., Bellogín, A., Parapar, J. and Castells, P. (2020). Assessing ranking metrics in top-N recommendation.
Information Retrieval Journal,
23(4). Springer: 411–48.

Xu, P.-Q. (2010).
Atlas of ShiJi. Beijing: Seismological Press.


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO