Linking Time, Space, and Statements in One GIS System: A Use Case of Studying Individuals' Biographies

paper, specified "long paper"
Authorship
  1. Yi-Fan Peng

    Research Center for Humanities and Social Sciences - Academia Sinica, National Chengchi University

  2. Pi-Ling Pai

    Research Center for Humanities and Social Sciences - Academia Sinica

  3. Chao-Lin Liu

    National Chengchi University

Work text

Digital humanities research methods have been applied to many research fields in recent years. The content of a text corpus, and of biographies in particular, can be roughly summarized by the five Ws. In this research, we use spatial analysis and natural language processing (NLP) techniques to find two of these elements in biographical data: “Where (Location)” and “When (Time)”. By linking the extracted information and displaying it in a GIS, we built a spatiotemporal system that lets scholars study historical issues from different perspectives.

Research material and methods

Mr. Chiang Ching-Kuo was a former president of the Republic of China. During his tenure, Taiwan enjoyed a long period of economic growth. Our corpus is the Chronological History of Chiang Ching-Kuo (the Chronicle, henceforth), which was compiled by the Chiang Ching-Kuo Foundation. The Chronicle records Chiang’s life year by year and contains more than 600,000 characters. The number of characters for each period is shown in Table 1.

Table 1. Statistics for each volume

Figure 1 shows the three main steps of data processing, each of which is explained below.

Figure 1. Main steps in data processing

Data preprocessing

We converted the source texts, which are stored as MS Word files (Figure 2), into plain-text files, and at the same time extracted the daily activities together with their time labels.

Figure 2. A page of the original material

Data extraction

NLP techniques let us capture the relevant information for later use. We employ the CKIP tools to segment words and to assign part-of-speech (POS) labels. In the corpus, placenames carry the POS label Nc, and they are one of the important elements of the spatial information. Figure 3 shows how many placenames there are of each length.

Figure 3. Length distribution of Nc words

Integration process

After extracting the placenames, we geocoded them. Since the Chronicle records events after 1910 CE, the placenames are modern names, so we can use the Google Maps API for geocoding. Combining the geocoded placenames with the temporal information lets us show them on a temporal scale.

Spatiotemporal System

Next, we construct a WebGIS system that links the placenames in the GIS subsystem to the statements in the Chronicle that mention them. To make the connection between the corpus and the spatial information visible, the system shows the corresponding original statements (Figure 4) when scholars click on the icon of a placename (Figure 5).

Figure 4. Original statements in the Chronicle for a chosen placename are shown for reading.

Figure 5. Placenames are shown in the WebGIS system and can be clicked to query.

Figure 6 shows an example in which the relevant statements are displayed on the left side of the interface. The map shows the locations of the placenames, and the timeline at the bottom of the system indicates the time described by the text.

Figure 6. An integration of time, place, and statements.

We also provide an analysis of the frequency-time relation of placenames. It shows the years in which a placename appears in the corpus and how often it appears in each year. In general, the more frequently a placename is mentioned in a particular period of the text, the more attention that place deserves. Next, we use a practical example to illustrate how to analyze the text content with the system we have built.

The construction of the Central Cross-island Highway is an important achievement of Mr. Chiang. It is the first highway system that connects the west coast with the east coast of Taiwan.
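
The extraction step and the frequency-time analysis described above can be illustrated with a minimal Python sketch. It assumes that every Chronicle entry has already been segmented and POS-tagged and is available as a date plus a list of (word, POS) pairs. This record shape, the function name placename_year_counts, the non-Nc tags, and the toy entries are assumptions made for illustration; they are not Chronicle data and not the actual output format of the CKIP tools. Only the use of the Nc label for placenames comes from the text.

```python
from collections import Counter, defaultdict

# Toy records, invented for illustration (not Chronicle data).
# Assumed shape: an ISO date string plus (word, POS) pairs from a tagger.
ENTRIES = [
    {"date": "1957-03-01",
     "tokens": [("Chiang", "Nb"), ("visited", "VC"), ("Lishan", "Nc")]},
    {"date": "1957-08-15",
     "tokens": [("Lishan", "Nc"), ("Tianxiang", "Nc")]},
    {"date": "1960-05-09",
     "tokens": [("Tianxiang", "Nc"), ("road", "Na")]},
]


def placename_year_counts(entries, place_tag="Nc"):
    """Count, per placename, how often it is mentioned in each year."""
    counts = defaultdict(Counter)
    for entry in entries:
        year = int(entry["date"][:4])  # time label attached to the entry
        for word, pos in entry["tokens"]:
            if pos == place_tag:  # keep only tokens tagged as placenames
                counts[word][year] += 1
    # Return plain dicts so that lookups never create empty entries.
    return {name: dict(years) for name, years in counts.items()}


if __name__ == "__main__":
    for name, years in sorted(placename_year_counts(ENTRIES).items()):
        print(name, years)
```

Keeping the counts keyed by placename first and year second matches the way the analysis is read: pick a placename, then inspect its distribution over time.
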
If we want to find information about the highway in the corpus, we can explore the placenames in our WebGIS and then the frequency with which the relevant placenames occur, which reveals the years of the relevant events. The issue can be explored from three aspects: spatial, temporal, and corpus information.

Spatial information aspect

First, we use the spatial functions of our WebGIS to move the view to a location near the highway. The system then shows the placenames that fall within this spatial range, that is, the placenames near the highway.

Figure 7. Placenames near the Central Cross-island Highway

Temporal information aspect

By analyzing the frequency and time distribution of these placenames, shown in Figure 8, we find that their co-occurrence peaks between 1956 and 1962.

Figure 8. The number of occurrences of each placename and the years in which it occurs.

With the spatial and temporal information derived from the previous analysis, we can identify the period of the road construction and its effects. The text shows that the Central Cross-island Highway was built between 1956 and 1962. The Tianxiang Scenic Area was established immediately, and the farms in Lishan were gradually developed. Once the traffic routes were in place, Mr. Chiang visited these locations along the highway frequently. Researchers can then read the statements in the Chronicle for specific times to learn about the relevant events.

The example above shows that our system allows researchers to begin a study by identifying the relevant placenames via our WebGIS interface. Our system links the placenames to the statements in the Chronicle.
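
The link between a map view, the placenames inside it, and the statements that mention them can be sketched as follows. This is a hypothetical in-memory model and not the system's implementation: the Place and Statement classes, the rectangular view query, the rounded coordinates, and the placeholder statements are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Statement:
    date: str          # ISO date, so string order equals time order
    text: str
    placenames: tuple  # placenames mentioned in the statement


# Approximate, rounded coordinates for illustration only.
PLACES = [
    Place("Lishan", 24.25, 121.25),
    Place("Tianxiang", 24.18, 121.49),
    Place("Taipei", 25.03, 121.56),
]

# Placeholder statements, not quotations from the Chronicle.
STATEMENTS = [
    Statement("1960-05-09", "(placeholder statement B)", ("Tianxiang",)),
    Statement("1957-03-01", "(placeholder statement A)", ("Lishan",)),
    Statement("1957-08-15", "(placeholder statement C)", ("Lishan", "Tianxiang")),
]


def places_in_view(places, south, west, north, east):
    """Return the places whose coordinates fall inside a rectangular view."""
    return [p for p in places
            if south <= p.lat <= north and west <= p.lon <= east]


def build_statement_index(statements):
    """Map each placename to the statements mentioning it, oldest first."""
    index = {}
    for st in sorted(statements, key=lambda s: s.date):
        for name in st.placenames:
            index.setdefault(name, []).append(st)
    return index


if __name__ == "__main__":
    index = build_statement_index(STATEMENTS)
    view = places_in_view(PLACES, south=24.0, west=121.0, north=24.5, east=121.7)
    for place in view:
        for st in index.get(place.name, []):
            print(place.name, st.date, st.text)
```

A rectangular view is the simplest possible spatial filter; a deployed WebGIS would more likely delegate this query to a spatial index or database.
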
Since we have associated the statements with time labels, we can analyze the temporal distributions of the placenames. Naturally, researchers can also read the statements in our WebGIS system on a temporal scale.

Conclusion

Time and space play a very important role in the study of biographical data. Through NLP techniques and spatial information methods, we can extract important information from corpora and build a spatiotemporal system. In this proposal, we illustrate how to find the time span of a specific event by analyzing the temporal frequencies of the placenames in the corpus. Researchers can discover information hidden among the statements via our system. We believe that showing placenames on a GIS interface, together with the relevant statements on a time-related scale, can provide helpful hints for complex research work.
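
One possible way to turn yearly placename frequencies into a time span is sketched below. The heuristic (grow a window around the busiest year while the neighbouring years stay at or above a fraction of the peak), the function name peak_span, and the ratio parameter are assumptions for illustration and are not described in the text as the authors' method. The yearly counts are invented and were chosen only so that the toy output resembles the reported 1956 to 1962 peak; they are not counts from the Chronicle.

```python
def peak_span(year_counts, ratio=0.5):
    """Return (first_year, last_year) of the run of consecutive years that
    contains the busiest year and stays at or above ratio * peak count.

    Returns None when there are no counts at all.
    """
    if not year_counts:
        return None
    peak_year = max(year_counts, key=year_counts.get)
    threshold = ratio * year_counts[peak_year]
    start = end = peak_year
    # Extend the window backwards while the previous year is busy enough.
    while start - 1 in year_counts and year_counts[start - 1] >= threshold:
        start -= 1
    # Extend the window forwards while the next year is busy enough.
    while end + 1 in year_counts and year_counts[end + 1] >= threshold:
        end += 1
    return start, end


# Invented yearly counts for a group of placenames (not Chronicle data).
TOY_COUNTS = {
    1954: 1, 1955: 2, 1956: 6, 1957: 9, 1958: 8, 1959: 7,
    1960: 10, 1961: 6, 1962: 5, 1963: 2, 1964: 1,
}

if __name__ == "__main__":
    print(peak_span(TOY_COUNTS))
```

The ratio controls how strict the window is: a higher value narrows the span to the busiest years, a lower one widens it, so the result should be read as a hint for where to start reading the statements rather than as a dated fact.
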

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO