Department of Atmospheric Sciences - National Central University, Department of Computer Science and Information Engineering - National Central University
Department of Computer Science and Information Engineering - National Central University
Research Center for Humanities and Social Sciences - Academia Sinica
Dharma Drum Institute of Liberal Arts
Research Center for Humanities and Social Sciences - Academia Sinica, Department of Computer Science and Information Engineering - National Central University
Institute of History and Philology - Academia Sinica, Research Center for Humanities and Social Sciences - Academia Sinica
Introduction
The impact of climate change has become more and more obvious. How to understand its cause and effect to find a way to deal with it has become an important research topic. To trace the occurrence and impact of climate disasters, many clues can be found in the rich records left by historical materials. "China's Three Thousand Years of Meteorological Records" (2004) extracts meteorological descriptions from 8,228 historical sources and organizes these descriptions by regions and dates. This important work contributes to the analysis of the temporal and spatial characteristics of meteorological phenomena. The "East Asian Historical Climate Database"(P.-K. Wang et al., 2018) is compiled based on chorographies and official histories. This study develops an event classification method based on the meteorological records in the early Qing Dynasty (1644-1795) in this database. By representing classical Chinese texts into word embedding vectors and the k-means algorithm, we overcome the difficulty of analyzing classical Chinese and not having enough training data. We then integrate the classification results with the map and timeline to develop a Spatio-Temporal search interface, which facilitates climatologist to access and analyze data according to the three dimensions of time, area and event categories.
Methodology
The main task of this study is the meteorological text classification, which is composed of three steps: preprocessing, text representation generation, and k-means clustering. We use 36,123 historical meteorological descriptions in the "East Asian Historical Climate Database" as the input data. As shown in Table 1, each record contains six fields.
To make meteorological data more suitable for machine learning semantics and classification of meteorological events, we first pre-process meteorological data, including (1) replacing GanZhi, year, month and number with
G,
Y,
M, and
N, respectively (2) removing place names and punctuation symbols because they have nothing to do with classification.
Then, we use the word2vec algorithm to convert each meteorological record written in classical Chinese into 200-dimensional embedding vectors and then use the k-means algorithm to divide all embedded vectors into k groups. We use the validation set to find the most suitable k value of 45, and evaluate the clustering results with the labeled 9,530 records. The overall accuracy is 82%. Table 2 details the 45 clusters.
From Table 1, we can see that many clusters contain the same climatic events, but after careful examination, these groups are still slightly different. Taking clusters 2 (Table 3), 29 (Table 4) and 37 (Table 5) as examples, although all three clusters can be roughly classified as flood hazards, it is found that most of the texts of cluster 37 refer to seasons. The records of cluster 29 mostly refer to building damage, while the records of the other two groups do not.
Table 1. The table of the East Asian Historical Climate Database
ID
year
province
county/
city
meteorological description
source
1795-11
1663
Anhui Province
Guichi County
秋,池州大水,城井行舟。【鄉父老云:與明萬曆三十六年水相似。】
Flood in Chi-zhou in autumn. Villagers said, "The flood is similar to the one happened in Wanli period in Ming Dynasty (1608)."
Volume twenty-nine of "Chi-zhou Local Gazetteers", published in Kang-xi period (1711) in Qing Dynasty
1795-12
1663
Anhui Province
Huangshan City
七月望,邑令陳恭備建學官,是日雷雨大作,千山如注,田間水數尺,忽浮飛木於縣東門。
In mid of July when Chen Gong, the county magistrate, was going to build a school, a thunderstorm happened. Flood was a few feets high, and floodwoods flew at the east gate of the county.
Volume eight of “Tai-ping County record”, published in Chia-ching period in Qing Dynasty
1795-13
1663
Anhui Province
Shitai County
秋大水,父老云:與萬曆三十六年水勢相似。
Old man commented about the autumn flood: “It was said to be similar to the one happened in the Wanli period in Ming Dynasty (1608).”
Volume two of "Shi-di Local Gazetteers", published in Kang-xi period in Qing Dynasty
1795-14
1663
Anhui Province
Dongzhi County
秋大水,十一月始退。
The autumn flood began to recede in November.
Volume seven of "Dong-Liu Local Gazetteers", published in Qian-long period in Qing Dynasty
1795-15
1663
Jiangxi Province
Jiujiang
大水,潰堤數處,禾黍盡沒。
Flood broke levees and crops were all be drowned.
Volume fifty-three of "De-Hwa Local Gazetteers", published in Tong-Zhi period in Qing Dynasty
1795-16
1663
Jiangxi Province
Ruichang County
秋八月,大水入城。
Flood crashed the city in August t
Volume one of "Rui-Chang Local Gazetteers", published in Kang-xi period in Qing Dynasty
Table 2. The 45 clusters and their semantics
Cluster_0:
Crop Failure
Cluster_15:
Abnormal Animal Behavior
Cluster_30:
Social Problem
Cluster_1:
Wind
Cluster_16:
Drought
Cluster_31:
Rain
Cluster_2:
Flood
Cluster_17:
Social Problem
Cluster_32:
Famine
Cluster_3:
Crop Failure
Cluster_18:
Crop Failure
Cluster_33:
Flood
Cluster_4:
Insect
Cluster_19:
Crop Failure
Cluster_34:
Crop Failure
Cluster_5:
Rain
Cluster_20:
Drought
Cluster_35:
Crop Failure
Cluster_6:
Flood
Cluster_21:
Undetermined
Cluster_36:
Rain
Cluster_7:
Famine
Cluster_22:
Social Problem
Cluster_37:
Flood
Cluster_8:
Social Problem
Cluster_23:
Famine
Cluster_38:
Undetermined
Cluster_9:
Social Problem
Cluster_24:
Rain
Cluster_39:
Flood
Cluster_10:
flood Cluster
Cluster_25:
Disease
Cluster_40:
Drought
Cluster_11:
Flood
Cluster_26:
Crop Failure
Cluster_41:
Rain
Cluster_12:
Flood
Cluster_27:
Insect
Cluster_42:
Crop Failure
Cluster_13:
Crop Failure
Cluster_28:
Wind
Cluster_43:
Rain
Cluster_14:
Social Problem
Cluster_29:
Flood
Cluster_44:
Crop Failure
Table 3. Examples of records in Cluster 2.
廣東省,大水
Guangdong Province, Great Flood.
安徽省,大水
Anhui Province, Great Flood.
河北省,大水
Hebei Province, Great Flood.
福建省,六月,大水
Fujian Province, June, Great Flood.
江西省,七月,大水
Jiangxi Province, July, Great Flood.
江西省,七月十五,大水
Jiangxi Province, 15th July, Great Flood.
江西省,七月,大水
Jiangxi Province, July, Great Flood.
湖南省,五月,大水
Hunan Province, May, Great Flood.
湖南省,五月,大水
Hunan Province, May, Great Flood.
Table 4. Examples of records in Cluster 29.
貴州省,四月十二日,大水,壞心心橋欄杆。
Guizhou Province, 12th April, Great Flood, The railing of the bridge was broken.
福建省,大水,東橋折水至西門橋頭十字街。
Fujian Province, Great flood, The folding water was from the east bridge to Ximen Bridge Crossroad.
福建省,五月初十日,南鄉大水入城。
Fujian Province, 10th May, Nanxiang water flooded into the city.
廣東省,夏五月,大水,城垣崩陷八處,共六十餘丈。
Guangdong Province, in summer and May, Great flood, The eight places of the city, total 60 feet, collapsed.
江西省,秋八月,大水,蘆溪市宗濂橋圮。
Jiangxi Province, in autumn and August, Great flood, the Zonglian Bridge collapsed in Luxi City.
甘肅省,霪雨,塌傾城垣。
Gansu Province, the city walls collapsed after a long period of rain.
江蘇省,泗州城陷沒,漕堤決。
Jiangsu Province, the city of Suzhou collapsed, the dike was broken.
廣東省,夏五月,大水,秋溪鯉魚溝堤潰。
Guangdong Province, in summer and May, Great flood, Qiuxi carp ditch dike burst.
江西省,秋七月,大水連漲七次,西溪橋鷹咀石俱沖圮。
Jiangxi Province, in autumn and July, floods rose seven times, and Yingzui Stone of Xixi Bridge collapsed.
廣東省,大水,城北平地可通舟。
Guangdong Province, Great floods, the city flat land was navigable.
河北省,堤下口決。
Hebei Province, Breach of the embankment.
Table 5. Examples of records in Cluster 37.
浙江省,秋大水。
Zhejiang Province, Autumn, Great flood.
安徽省,夏秋水。
Anhui Province, Summer and Autumn, Great flood.
湖南省,秋大水。
Hunan Province, Autumn, Great flood.
安徽省,秋大水。
Anhui Province, Autumn, Great flood.
江蘇省,秋大水。
Jiangsu Province, Autumn, Great flood.
湖南省,夏水。
Hunan Province, Summer, Great flood
浙江省,夏秋大水。
Zhejiang Province, Summer and Autumn, Great flood.
浙江省,秋大水。
Zhejiang Province, Autumn, Great flood.
江西省,春大水。
Jiangxi Province, Spring, Great flood.
Front-end interface
Then, we use the "Time and Space Infrastructure of Chinese Civilization" (CCTS) to present meteorological events on the map interface based on the year and location of the climate events in the historical meteorological records, providing an interface for researchers (see Figure 1). Our system is located at
http://iisrserv.csie.ncu.edu.tw:5000/English. The main features of the interface include a scrolling timeline, a pop-up condition selection window, and an instant response map. When the user selects the conditions, the map will immediately display the records satisfying the conditions. If the cursor is moved over the location on the map, the page below the map will show the meteorological records of the location in the timeline interval.
Figure 1. System interface.
Case Study
To show the usage of our system, we look into the meteorological records during 1650 to 1700, which is the late stage of the Little Ice Age, to investigate the phenomenon of climate change in Qing Dynasty of China.
First, we choose the “temperature” event category. Among these “temperature” records, there are 68 extreme cold climate records. We can see that this kind of phenomenon was located from tropical zone to tepid zone in China (see Figure 2), therefore, we can conclude that extreme cold records appear not only in middle- to high-latitude areas, but also in lower latitude area.
Figure 2. Areas with Low-Temperature Record from 1650 to 1700.
Extreme cold weather usually comes with disasters. We choose “rain” as our variables as step two. Among the records, there are 379 records related to “snow.” To deeply look into the records, besides directly meteorological phenomena of “snow,” we further collect the indirectly meteorological phenomena data, such as “three days in a row,” “frost-damaged trees,” ...etc (see Table 6). From Table 6, we can see that during the Little Ice Age (1650-1700), the extremely cold climate led to disasters is more frequent than that during Non-Little Ice Age (1745-1795).
Table 6. Numbers of Data related to “snow” during Little Ice Age and Non-Little Ice Age in Cluster “rain.”
Meteorological Phenomena Data
Little Ice Age
(1650-1700)
Non-Little Ice Age
(
1745-1795
)
Rain
3,014
1,618
Snow
379
119
Snow, more than 10 days /month /ten days in a row three days in a row or more
46
5
Snow, frost damaged trees
8
2
Snow, birds, animals and human beings freeze to death
17
6
When investigating the meteorological phenomena during Little Ice Age, we find out rare snow records in Taiwan. As shown in Table 7, these records are in terrain area, such as Chia-Yi county and Tainan City, which could be seen as a strong evidence of the extreme climate during Little Ice Age.
Table 7. Rare snow records in Taiwan.
year
province
county/
city
cluster
meteorological description
data sources
1683
Taiwan Province
--
Temperature (including frost and dew), and rain
冬十一月,雨雪。是夜冰堅厚寸餘。從來臺灣無雪無冰,此異事也。
In November, winter, it snows and rains. The ice is more than an inch thick that night. It's strange because there has not snowed since ancient times.
Volume ten of "Taiwan Local Gazetteers", published in Kang-xi period (1685) in Qing Dynasty
1683
Taiwan Province
--
Rain, crop failure, temperature (including frost and dew)
五月,大雨。霪雨連月,鄭氏之土田阡陌多被沖陷,有“高岸為谷”之歎。冬始雨雪,冰堅厚寸餘。臺土氣熱,從無霜雪。
In May, it rained heavily. After a long period of rain, the fields of Zheng's family were crashed, and the high bank became valley. It began to snow in winter, and the ice was more than an inch thick. Normally the field was hot, and there was no frost and snow in Taiwan.
Volume nine of "Rebuilt Taiwan Local Gazetteers", published in Kang-xi period in Qing Dynasty
1683
Taiwan Province
Chia-yi County
Flood, rain, crop failure, temperature (including frost and dew)
夏五月,大雨水,時霪雨連月,鄭氏土田多沖陷,有“高岸為谷”之歎。冬十一月,始雨雪,冰堅厚寸餘。諸羅有霜無雪,是歲甫入版圖,地氣自北而南,信有矣。
In May, it rained heavily. After a long period of rain, the fields of Zheng's family were crashed, and the high bank became valley. In November, Winter, it snowed and iced over. Normally the field was hot, and there was no frost and snow in Tsu-lo (ancient name of Taiwan). It was believed that the climate was from Northern area, starting from the year Tso-lo became Qing’s territory.
Volume twelve of "Tsu-lo Local Gazetteers", published in Kang-xi period in Qing Dynasty
1683
Taiwan Province
Tainan City
Flood, rain, flood, rain, crop failure, temperature (frost and dew), temperature (frost and dew), temperature (frost and dew), drought, rain
春,鯽魚潭涸。夏五月,大雨水,田園多沖陷。六月,澎湖潮水漲四尺。秋八月壬子,鹿耳門潮水漲。冬十有一月,雨雪冰。是臺地氣暖,從無霜雪,是歲八月甫入版圖,冬遂雨雪,冰堅寸許,地氣自北而南,運屬一統故也。
In Spring, crucian pond dried up. In May, Summer, it rained a lot, and fields are mostly crashed. In June, the tide at Penghu rose four feet high. On August 18th, the tide at Luermen rose. In November, Winter, it snowed and iced over. Normally the field was hot, and there was no frost and snow in Taiwan. However, started from August when Taiwan was under Qing’s authority, it rained and snowed in Winter, and the ice was more than an inch thick. The climate was from the Northern Area. It is because of territorial unity.
"Rebuilt Taiwan Local Gazetteers", published in Qian-long period in Qing Dynasty
Conclusion
Although technology has improved nowadays, it is still hard to predict the weather. In order to understand the impact of climate disasters and find a way to deal with it, tracing the historical climate event could be a solution. This study establishes a principle to classify meteorological phenomena based on "China's Three Thousand Years of Meteorological Records", develops a Spatio-Temporal research platform, and build an instant response front-end interface. By the climate case study, we collect the users’ feedback and improve the front-end interface, as well as enhance the precision of a mass of meteorological data analytics. Although the research platform only contains the meteorological data from 1647 to 1795, we hope to expand the capacity of the database and establish a mature Spatio-Temporal research platform in the future.
Bibliography
Chinea-Rios, M., Sanchis-Trilles, G. and Casacuberta, F. (2015). Sentence clustering using continuous vector space representation. Springer, pp. 432–40.
Dingding Wang, Tao Li, Shenghuo Zhu and Chris Ding (2008). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization Paper presented at the Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, Singapore.
Gang Qian, Shamik Sural, Yuelong Gu and Sakti Pramanik (2004). Similarity between Euclidean and cosine angle distance for nearest neighbor queries Paper presented at the Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus.
K. M. Hammouda and M. S. Kamel (2004). Efficient phrase-based document indexing for Web document clustering.
IEEE Transactions on Knowledge and Data Engineering,
16(10): 1279–96 doi:
10.1109/TKDE.2004.58.
Ko-Chen, C. (1973). A preliminary study on the climatic fluctuations during the last 5,000 years in China.
Scientia Sinica,
16(2): 226–56.
Lili Kotlerman, Ido Dagan, Maya Gorodetsky and Ezra Daya (2012). Sentence clustering via projection over term clusters Paper presented at the Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, Montréal, Canada.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. (Fifth Berkeley Symposium on Mathematical Statistics and Probability). University of California Press, pp. 281–97.
Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space.
ArXiv Preprint ArXiv:1301.3781.
Solomon, S., Qin, D., Manning, M., Averyt, K. and Marquis, M. (2007).
Climate Change 2007-the Physical Science Basis: Working Group I Contribution to the Fourth Assessment Report of the IPCC. . Vol. 4. Cambridge university press.
Wang, P.-K., Lin, K., Liao, Y. C., Liao, H. M., Lin, Y. S., Hsu, C. T., Hsu, S. M., et al. (2018). Construction of the REACHES Climate Database Based on Historical Documents of China. Scientific Data, in press.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
In review
Hosted at Utrecht University
Utrecht, Netherlands
July 9, 2019 - July 12, 2019
436 works by 1162 authors indexed
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html
Series: ADHO (14)
Organizers: ADHO