Visualization of Co-occurrence Relationships Using the Historical Persons and Locational Names from Historical Documents

poster / demo / art installation
Authorship
  1. Sho Itsubo

    School of Science and Engineering - Ritsumeikan University

  2. Fuminori Kimura

    College of Information Science and Engineering - Ritsumeikan University

  3. Akira Maeda

    College of Information Science and Engineering - Ritsumeikan University

  4. Takahiko Osaki

    School of Science and Engineering - Ritsumeikan University

  5. Taro Tezuka

    School of Science and Engineering - Ritsumeikan University

Work text
Visualization of Co-occurrence Relationships Using the Historical Persons and Locational Names from Historical Documents
Itsubo, Sho, Graduate School of Science and Engineering, Ritsumeikan University, Japan, cm001061@ed.ritsumei.ac.jp
Osaki, Takahiko, Graduate School of Science and Engineering, Ritsumeikan University, Japan, cm003060@ed.ritsumei.ac.jp
Kimura, Fuminori, College of Information Science and Engineering, Ritsumeikan University, Japan, fkimura@is.ritsumei.ac.jp
Tezuka, Taro, College of Information Science and Engineering, Ritsumeikan University, Japan, tezuka@media.ritsumei.ac.jp
Maeda, Akira, College of Information Science and Engineering, Ritsumeikan University, Japan, amaeda@media.ritsumei.ac.jp
Introduction
In recent years, the use of digital technology in the study of the humanities has been increasing. Many historical documents are now digitally archived, enabling further analysis using computers. Some archives are accessible on the World Wide Web, including the Union Catalog of the Collections of the National Art Museums, Japan (Independent Administrative Institution National Museum of Art, 2010) and the Perseus Digital Library (Crane, 2011).

Until recently, the storage of historical documents and data has been the main target of digital archive research. However, many works now go on to analyze the content of historical documents using text mining techniques. In this paper, we propose a method of visualizing relationships among historical persons using the personal names and place names that appear in documents.

Proposed Methods
We propose two methods to extract and visualize the relationships among persons from historical documents. The goal of the two methods is to extract the dynamics of relationships among historical persons. The first method tracks the temporal change in a relationship and visualizes it. The second method utilizes locational information to obtain latent relationships among persons based on their spatial activities.

Personal Relationships Using the Co-occurrence Information between Persons
We use three Japanese historical documents: “Azumakagami”, “Gyokuyou” and “Hyohanki”. “Azumakagami” is an official record written in the Kamakura period (A.D. 1180-1266). “Gyokuyou” is a personal diary written by Kanezane Kujou (A.D. 1164-1200) from the late Heian period to the early Kamakura period. “Hyohanki” is a personal diary written by Nobunori Taira (A.D. 1112-1187) in the late Heian period. We first extracted persons’ names using the “Azumakagami” and “Gyokuyou” databases (Fukuda, 2002), which contain an index of persons. We rely on this index because in historical Japanese documents persons are often referred to by names other than their real names, and the index covers such cases. Since the three historical documents are diaries, we defined a co-occurrence as the case where two persons appear under the same date. We then calculated co-occurrence frequencies among persons for each year.
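
The date-based counting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input format (a mapping from a "YYYY-MM-DD" string to the set of persons named under that date), the function name `yearly_cooccurrence`, and the placeholder person names are hypothetical, and alternative names are assumed to have been resolved to one canonical name beforehand.

```python
from collections import Counter
from itertools import combinations


def yearly_cooccurrence(entries):
    """Count, per year, how often each pair of persons appears under the same date.

    `entries` maps a date string "YYYY-MM-DD" to the set of persons named
    under that date (hypothetical format, names already alias-resolved).
    """
    counts = {}
    for date, persons in entries.items():
        year = int(date.split("-")[0])
        year_counts = counts.setdefault(year, Counter())
        # Sorting gives every unordered pair a single canonical key.
        for pair in combinations(sorted(persons), 2):
            year_counts[pair] += 1
    return counts


# Toy entries with placeholder names; not data from the documents.
entries = {
    "1185-03-01": {"PersonA", "PersonB"},
    "1185-03-02": {"PersonA", "PersonB", "PersonC"},
    "1186-01-10": {"PersonB", "PersonC"},
}
counts = yearly_cooccurrence(entries)
print(counts[1185][("PersonA", "PersonB")])
```

Because each year has its own counter, the frequency of a given pair can be read off year by year, which is the quantity the first method tracks over time.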

Personal Relationships Using the Co-occurrence Information between Person and Locational Information
In this method, we used “Hyohanki”. In the first step, we obtained the frequencies of co-occurrences between each person’s name and place names. Since “Hyohanki” is written in ancient Japanese, we cannot attach part-of-speech tags using existing morphological analyzers, which were trained on modern Japanese. We therefore used pattern matching to find place names that were included in a dictionary. As the data source for place names, we used the “Index of Kyoto’s Place Names” created by Noboru Tani based on the “Outline of Heiankyo” (Tsunoda, 1994) and “Japan’s Historical Place Names 27: Place Names of Kyoto” (Hayashiya et al., 1979).

We use co-occurrence as the indicator of the relationship between a person and a location. If a person’s name and a place name appear in the same paragraph, we consider it as a co-occurrence between the two. Since each paragraph often covers a specific situation or a topic, we consider it to be a better unit than dates.
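
A minimal sketch of paragraph-level matching is given here for illustration. It assumes plain substring matching against two name lists; the function name `person_place_cooccurrence` and the placeholder names are hypothetical, and each person-place pair is counted at most once per paragraph, which is one possible reading of the definition above.

```python
from collections import Counter


def person_place_cooccurrence(paragraphs, person_names, place_names):
    """Count paragraph-level co-occurrences between persons and places.

    Names are located by plain substring matching against the two lists,
    so no morphological analyzer is needed for the historical text.
    """
    counts = Counter()
    for paragraph in paragraphs:
        persons = [p for p in person_names if p in paragraph]
        places = [q for q in place_names if q in paragraph]
        # Every person found in the paragraph co-occurs with every place found.
        for person in persons:
            for place in places:
                counts[(person, place)] += 1
    return counts


# Toy paragraphs with placeholder names; not text from "Hyohanki".
paragraphs = [
    "PersonA visited PlaceX and PlaceY.",
    "PersonB stayed at PlaceX.",
    "PersonA returned to PlaceX.",
]
pp_counts = person_place_cooccurrence(
    paragraphs, ["PersonA", "PersonB"], ["PlaceX", "PlaceY"]
)
print(pp_counts[("PersonA", "PlaceX")])
```

Substring matching of this kind would also match a short name that happens to be contained in a longer one, so a real dictionary lookup would likely need a longest-match rule or similar safeguard.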

Each place name is treated as a dimension of a vector space. For each person, we create a vector whose components are the numbers of co-occurrences with each place name. As the similarity measure between persons, we used cosine similarity.
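
The similarity computation can be illustrated with a short sketch. It assumes each person's vector is stored sparsely as a dictionary from place name to co-occurrence count; the function name `cosine_similarity` and the placeholder vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors (dict: place -> count)."""
    dot = sum(count * v.get(place, 0) for place, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a person with no place co-occurrences is similar to no one
    return dot / (norm_u * norm_v)


# Toy vectors with placeholder names; not data from "Hyohanki".
person_a = {"PlaceX": 2, "PlaceY": 1}
person_b = {"PlaceX": 1}
person_c = {"PlaceZ": 3}

print(cosine_similarity(person_a, person_b))
print(cosine_similarity(person_a, person_c))
```

Since the counts are non-negative, the similarity lies between 0 (no shared places) and 1 (identical distribution over places), regardless of how often each person is mentioned overall.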

We used the similarity measure and the result of clustering for visualization. The graphs were drawn with JUNG, an open source Java library for drawing graph structures.

Results of Visualization and Discussion
The Result of Visualizing Temporal Change in a Relationship

Figures 1-3 show the results of visualization using the method described in Subsection 2.1. Figure 1 shows the change in the relationships of Yoritomo Minamoto and Yoshitsune Minamoto to Emperor Goshirakawa over seven years (from 1184 to 1190).

Fig. 1. Transition of relationships of Yoritomo Minamoto and Yoshitsune Minamoto to Emperor Goshirakawa

The horizontal axis is the co-occurrence frequency between Emperor Goshirakawa and Yoritomo Minamoto, who became Shogun (the leader of the samurai warriors) in 1192. The vertical axis is the co-occurrence frequency between Emperor Goshirakawa and Yoshitsune Minamoto, who was Yoritomo's younger brother and archrival. The arrows indicate the transition based on “Azumakagami”, while the dotted arrows indicate the transition based on “Gyokuyou”.
The result illustrates the change in the relationship among Yoritomo, Yoshitsune and Emperor Goshirakawa. The change in 1184-1187 indicated by the arrows closely resembles that indicated by the blue arrows, and it also matches the historical facts well. The transitions in 1187-1190 differ between the two documents. We assume this is because “Azumakagami” was written by the Shogunate side, whereas “Gyokuyou” is a diary written by the Imperial court side. The relationship between Yoritomo Minamoto and Yoshitsune Minamoto, who were both samurai warriors, is probably not described in enough detail in “Gyokuyou”.
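
To make the construction of such a transition plot concrete, the following sketch turns yearly co-occurrence counts into a sequence of points and arrows. The data layout (year mapped to pair counts), the function names, and the numbers are all hypothetical and chosen only for illustration; they are not values from the documents.

```python
def transition_points(yearly_counts, center, person_x, person_y, years):
    """Return one (year, x, y) point per year.

    x is the co-occurrence count of `center` with `person_x` and y is the
    count of `center` with `person_y`; missing pairs count as zero.
    """
    key_x = tuple(sorted((center, person_x)))
    key_y = tuple(sorted((center, person_y)))
    points = []
    for year in years:
        counts = yearly_counts.get(year, {})
        points.append((year, counts.get(key_x, 0), counts.get(key_y, 0)))
    return points


def transition_arrows(points):
    """Pair consecutive points into (start, end) arrows, dropping the year."""
    return [(a[1:], b[1:]) for a, b in zip(points, points[1:])]


# Made-up counts with placeholder names, for illustration only.
yearly_counts = {
    1184: {("Center", "PersonX"): 3, ("Center", "PersonY"): 5},
    1185: {("Center", "PersonX"): 6, ("Center", "PersonY"): 2},
    1186: {("Center", "PersonX"): 4},
}
points = transition_points(yearly_counts, "Center", "PersonX", "PersonY",
                           [1184, 1185, 1186])
arrows = transition_arrows(points)
print(arrows)
```

Running the same procedure once per document yields one arrow sequence per source, which can then be drawn in different line styles on the same axes for comparison.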

Figure 2 shows the change in the relationships of Yoritomo Minamoto and Emperor Goshirakawa to Yoshiyasu Ichijou over seven years (from 1185 to 1191).

Fig. 2. Transition of relationships of Yoritomo Minamoto and Emperor Goshirakawa to Yoshiyasu Ichijou

The horizontal axis is the co-occurrence frequency between Yoshiyasu Ichijou and Yoritomo Minamoto. The vertical axis is the co-occurrence frequency between Yoshiyasu Ichijou and Emperor Goshirakawa. The arrows indicate the transition based on “Azumakagami”, while the dotted arrows indicate the transition based on “Gyokuyou”.
The result illustrates the change in the relationship among Yoritomo, Emperor Goshirakawa and Yoshiyasu. Yoshiyasu was active on both the emperor's side and the samurai side, so his relationships with both Yoritomo and Emperor Goshirakawa are strong. However, the transition differs between “Azumakagami” and “Gyokuyou”: “Azumakagami” reflects the samurai side, whereas “Gyokuyou” reflects the emperor's side. Figure 2 thus clearly shows the characteristics of the two historical documents.

Figure 3 shows the change in the relationships of Kiyomori Taira and Emperor Goshirakawa to Motofusa Fujiwara over six years (from 1166 to 1171).

Fig. 3. Transition of relationships of Kiyomori Taira and Emperor Goshirakawa to Motofusa Fujiwara

The horizontal axis is the co-occurrence frequency between Motofusa Fujiwara and Kiyomori Taira. The vertical axis is the co-occurrence frequency between Motofusa Fujiwara and Emperor Goshirakawa. The arrows indicate the transition based on “Gyokuyou”, while the dotted arrows indicate the transition based on “Hyohanki”.
The result illustrates the change in the relationship among Kiyomori, Emperor Goshirakawa and Motofusa. “Gyokuyou” reflects the emperor's side, and the author of “Hyohanki” was also a person on the emperor's side. Therefore, the contents of the two historical documents are similar, and the transitions in Figure 3 are similar as well. The results suggest the possibility of estimating an author's standpoint from the document.

The Result of Visualizing Latent Relationships using Locational Information
Using the method proposed in Subsection 2.2, we created graphs that visualize relationships between historical persons. We focused on the period of the Hougen Rebellion, which started in early July 1156 and ended in late July of the same year.

The Hougen Rebellion was a short civil war caused by a power struggle between Emperor Goshirakawa and former Emperor Sutoku.

We chose 78 persons belonging to either the faction following former Emperor Sutoku or the faction following Emperor Goshirakawa (Hyohanki Reading Circle, 2007). Most of them were aristocrats or samurai warriors. Historical records make clear to which faction each person belonged. In “Hyohanki”, 31 of these persons co-occurred with place names. We used K = 3 for K-means clustering and L = 20 for initialization, where K is the number of clusters and L is the number of repetitions used to find the optimal initial centroids.
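
The roles of K and L can be illustrated with a plain K-means sketch that is restarted L times and keeps the run with the lowest within-cluster cost. This is not the initialization procedure used in our experiments: random initial centroids, squared Euclidean distance, the function names, and the toy vectors are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(vectors, k, rng, max_iter=100):
    """One K-means run from randomly sampled initial centroids."""
    centroids = rng.sample(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(squared_distance(v, centroids[l]) for v, l in zip(vectors, labels))
    return labels, cost


def kmeans_restarts(vectors, k, restarts, seed=0):
    """Repeat K-means `restarts` times and return the lowest-cost run."""
    rng = random.Random(seed)
    runs = (kmeans_once(vectors, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])


# Toy 2-D vectors forming three well-separated groups (illustrative only).
vectors = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
labels, cost = kmeans_restarts(vectors, k=3, restarts=20)
print(labels, cost)
```

A single run can get stuck in a poor local optimum when two initial centroids fall into the same group, which is why repeating the initialization and keeping the best result is useful.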

Figure 4 shows the result of visualization using the similarity of co-occurring location names.

Fig. 4. Relationships among persons and historical factions

The number next to each node and the node's color indicate the faction to which the person belonged. A node labeled “1” (box) indicates that the person followed Sutoku, whereas a node labeled “2” (circle) indicates that the person followed Goshirakawa. Lines are drawn when the similarity is over 0.4: dotted lines indicate a similarity between 0.4 and 0.7, and solid lines indicate a similarity over 0.7.
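
The edge rule for the graph can be sketched as a small function. The function name `classify_edges`, the placeholder names, and the similarity values are hypothetical; treating a similarity of exactly 0.7 as a dotted line is also an assumption, since the boundary case is not specified.

```python
def classify_edges(similarities, weak=0.4, strong=0.7):
    """Turn pairwise similarities into drawable edges.

    Pairs at or below `weak` get no edge, pairs above `strong` get a
    solid edge, and everything in between gets a dotted edge.
    """
    edges = []
    for (a, b), similarity in similarities.items():
        if similarity > strong:
            edges.append((a, b, "solid"))
        elif similarity > weak:
            edges.append((a, b, "dotted"))
    return edges


# Made-up similarities between placeholder persons, for illustration only.
similarities = {
    ("PersonA", "PersonB"): 0.85,
    ("PersonA", "PersonC"): 0.55,
    ("PersonB", "PersonC"): 0.10,
}
edges = classify_edges(similarities)
print(edges)
```

The resulting list of styled edges is the kind of input a graph-drawing library would need in order to render the two line styles.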
Figure 5 shows the result of clustering. The number next to each node indicates the cluster to which the person was allocated.

Fig. 5. Result of clustering persons using locational information

Colors also indicate clusters. Cluster 1 is drawn as a box, cluster 2 as a circle, and cluster 3 as a triangle. Table 1 shows the cluster and the faction to which each person belonged.
Table 1. Comparisons of factions and clusters

We proposed a method of revealing and visualizing relationships among historical persons by focusing on place names appearing in digitized historical documents. We used cosine similarity and a modified K-means algorithm to create graphs and cluster persons.

In the experiments, we used persons whose faction during the Hougen Rebellion is known. The result showed a strong correspondence between the factions and the clusters, indicating the effectiveness of using location information for clustering people.

Conclusion
The results of our experiments showed that relationships between historical persons can be extracted from historical documents. The experiment described in Subsection 3.1 showed that the temporal change of a relationship can be visualized using the change in co-occurrence frequencies. The experiment described in Subsection 3.2 indicated a strong correspondence between the factions and the clusters, which demonstrates the effectiveness of using location information for clustering people.

References:
Independent Administrative Institution National Museum of Art (2010). Union Catalog of the Collections of the National Art Museums (accessed 14 March 2011).

Crane, G. R. (2011). Perseus Digital Library (accessed 14 March 2011).

Fukuda, T. (2002). Azumakagami-Gyokuyou Database.

Tsunoda, B. (1994). The Outline of Heiankyo.

Hayashiya, T., Murai, Y., and Moriya, K. (1979). Japan’s Historical Place Names 27: Place Names of Kyoto.

Sakai, M., Yamada, S., and Onoda, T. (2010). Initialization of k-means method using independent component analysis. The 24th Annual Meeting of the Japanese Society for Artificial Intelligence.

Hyohanki Reading Circle (2007). The Index of Hyohanki’s Persons’ Names.

Conference Info

ADHO - 2011
"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO
