Department of Atmospheric Sciences - National Central University, Department of Computer Science and Information Engineering - National Central University
Department of Computer Science and Information Engineering - National Central University
Research Center for Humanities and Social Sciences - Academia Sinica
Dharma Drum Institute of Liberal Arts
Research Center for Humanities and Social Sciences - Academia Sinica, Department of Computer Science and Information Engineering - National Central University
Institute of History and Philology - Academia Sinica, Research Center for Humanities and Social Sciences - Academia Sinica
The impact of climate change has become increasingly evident, and understanding its causes and effects in order to find ways to cope with it has become an important research topic. Historical materials contain rich records that can help trace the occurrence and impact of climate disasters. "China's Three Thousand Years of Meteorological Records" (2004) extracts meteorological descriptions from 8,228 historical sources and organizes them by region and date, an important contribution to analyzing the temporal and spatial characteristics of meteorological phenomena. The "East Asian Historical Climate Database" (P.-K. Wang et al., 2018) is compiled from chorographies and official histories. This study develops an event classification method for the early Qing Dynasty (1644-1795) meteorological records in this database. By representing classical Chinese texts as word embedding vectors and clustering them with the k-means algorithm, we overcome both the difficulty of analyzing classical Chinese and the lack of sufficient training data. We then integrate the classification results with a map and a timeline to build a Spatio-Temporal search interface, which helps climatologists access and analyze data along three dimensions: time, area, and event category.
The main task of this study is meteorological text classification, which consists of three steps: preprocessing, text representation generation, and k-means clustering. We use 36,123 historical meteorological descriptions from the "East Asian Historical Climate Database" as input data. As shown in Table 1, each record contains six fields.
To make the meteorological data more suitable for learning semantics and classifying meteorological events, we first preprocess it by (1) replacing GanZhi (sexagenary cycle) terms, years, months and numbers with placeholder symbols (numbers, for example, become N), and (2) removing place names and punctuation, because they are irrelevant to classification.
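The two preprocessing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the regular expressions are simplified, the placeholder symbols G, Y and M are hypothetical (only N is named above), and place names are assumed to come from a supplied list, since the method for detecting them is not described.

```python
import re

# Simplified patterns (assumption): a GanZhi term is one Heavenly Stem + one Earthly Branch.
STEMS = "甲乙丙丁戊己庚辛壬癸"
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"
NUMERALS = "〇零一二三四五六七八九十百千"

GANZHI = re.compile(f"[{STEMS}][{BRANCHES}]")
YEAR = re.compile(f"(?:元|[{NUMERALS}]+)年")
MONTH = re.compile(f"(?:閏)?(?:正|[{NUMERALS}]+)月")
NUMBER = re.compile(f"[{NUMERALS}]+")
PUNCT = re.compile(r"[，。、；：！？「」『』（）《》\s,.;:!?()]")


def preprocess(text, place_names=()):
    """Replace time/number expressions with placeholders; drop place names and punctuation.

    The symbols G, Y and M are hypothetical; only N is named in the text.
    """
    # Remove place names first (longest first) so numerals inside names are not replaced.
    for name in sorted(place_names, key=len, reverse=True):
        text = text.replace(name, "")
    text = GANZHI.sub("G", text)   # e.g. 甲子 -> G
    text = YEAR.sub("Y", text)     # e.g. 四十七年 -> Y
    text = MONTH.sub("M", text)    # e.g. 五月 -> M
    text = NUMBER.sub("N", text)   # remaining numbers -> N
    return PUNCT.sub("", text)
```

For instance, `preprocess("池州秋大水，水高三尺。", ["池州"])` drops the place name and punctuation and yields `秋大水水高N尺`.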
Then, we use the word2vec algorithm to convert each meteorological record written in classical Chinese into a 200-dimensional embedding vector, and use the k-means algorithm to divide all embedded vectors into k groups. Using a validation set, we found k = 45 to be the most suitable value, and we evaluated the clustering results against 9,530 labeled records, obtaining an overall accuracy of 82%. Table 2 details the 45 clusters.
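A minimal sketch of the representation and clustering steps follows. It assumes that a record vector is the average of its tokens' word2vec vectors (the pooling method is not stated here) and that the word vectors have already been trained; the k-means routine is a plain Lloyd-style implementation written for illustration rather than the study's own code.

```python
import random


def record_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens that have one (assumed pooling)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With the settings described above, this would be called on 200-dimensional record vectors with `k=45`; the names `record_vector` and `kmeans` are illustrative.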
From Table 2, we can see that many clusters contain the same climatic events, but on careful examination these clusters still differ slightly. Taking clusters 2 (Table 3), 29 (Table 4) and 37 (Table 5) as examples, although all three can be roughly classified as flood hazards, most of the texts in cluster 37 mention seasons, and most records in cluster 29 mention damage to buildings, whereas the records in the other two clusters do not.
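The text does not say how the 82% accuracy was computed. One common convention, assumed here, is to label each cluster with the majority category among its labeled records and then count agreements; under this mapping several clusters can share one category, just as clusters 2, 29 and 37 all correspond to floods. The same score could also be used to compare candidate values of k on a validation set. The function names are hypothetical.

```python
from collections import Counter


def map_clusters_to_categories(cluster_ids, gold_labels):
    """Assign each cluster the most frequent gold category among its labeled records."""
    votes = {}
    for cid, gold in zip(cluster_ids, gold_labels):
        votes.setdefault(cid, Counter())[gold] += 1
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}


def clustering_accuracy(cluster_ids, gold_labels):
    """Fraction of labeled records whose cluster's majority category matches their label."""
    mapping = map_clusters_to_categories(cluster_ids, gold_labels)
    correct = sum(mapping[c] == g for c, g in zip(cluster_ids, gold_labels))
    return correct / len(gold_labels)
```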
Table 1. The table of the East Asian Historical Climate Database
Flood in Chi-zhou in autumn. Villagers said, "The flood is similar to the one that happened in the Wanli period of the Ming Dynasty (1608)."
Volume twenty-nine of "Chi-zhou Local Gazetteers", published in the Kang-xi period (1711) of the Qing Dynasty
In mid-July, when Chen Gong, the county magistrate, was about to build a school, a thunderstorm occurred. The flood was several feet high, and driftwood floated at the east gate of the county.
Volume eight of "Tai-ping County Record", published in the Chia-ching period of the Qing Dynasty
An old man commented on the autumn flood: “It is said to be similar to the one that happened in the Wanli period of the Ming Dynasty (1608).”
Volume two of "Shi-di Local Gazetteers", published in the Kang-xi period of the Qing Dynasty
The autumn flood began to recede in November.
Volume seven of "Dong-Liu Local Gazetteers", published in the Qian-long period of the Qing Dynasty
The flood broke the levees, and all the crops were drowned.
Volume fifty-three of "De-Hwa Local Gazetteers", published in the Tong-Zhi period of the Qing Dynasty
The flood struck the city in August.
Volume one of "Rui-Chang Local Gazetteers", published in the Kang-xi period of the Qing Dynasty
Table 2. The 45 clusters and their semantics
Abnormal Animal Behavior
Table 3. Examples of records in Cluster 2.
Guangdong Province, Great Flood.
Anhui Province, Great Flood.
Hebei Province, Great Flood.
Fujian Province, June, Great Flood.
Jiangxi Province, July, Great Flood.
Jiangxi Province, 15th July, Great Flood.
Jiangxi Province, July, Great Flood.
Hunan Province, May, Great Flood.
Hunan Province, May, Great Flood.
Table 4. Examples of records in Cluster 29.
Guizhou Province, 12th April, Great flood, the railing of the bridge was broken.
Fujian Province, Great flood, the floodwater reached from the east bridge to the Ximen Bridge crossroads.
Fujian Province, 10th May, Nanxiang water flooded into the city.
Guangdong Province, in summer and May, Great flood, eight sections of the city wall, 60 feet in total, collapsed.
Jiangxi Province, in autumn and August, Great flood, the Zonglian Bridge collapsed in Luxi City.
Gansu Province, the city walls collapsed after a long period of rain.
Jiangsu Province, the city wall of Suzhou collapsed, and the dike was broken.
Guangdong Province, in summer and May, Great flood, the Qiuxi carp ditch dike burst.
Jiangxi Province, in autumn and July, the floods rose seven times, and the Yingzui Stone of Xixi Bridge collapsed.
Guangdong Province, Great floods, the flat land in the city became navigable.
Hebei Province, Breach of the embankment.
Table 5. Examples of records in Cluster 37.
Zhejiang Province, Autumn, Great flood.
Anhui Province, Summer and Autumn, Great flood.
Hunan Province, Autumn, Great flood.
Anhui Province, Autumn, Great flood.
Jiangsu Province, Autumn, Great flood.
Hunan Province, Summer, Great flood.
Zhejiang Province, Summer and Autumn, Great flood.
Zhejiang Province, Autumn, Great flood.
Jiangxi Province, Spring, Great flood.
Then, we use the "Time and Space Infrastructure of Chinese Civilization" (CCTS) to display meteorological events on a map interface according to the year and location of each climate event in the historical records, providing an interface for researchers (see Figure 1). Our system is available at http://iisrserv.csie.ncu.edu.tw:5000/English. The main features of the interface include a scrolling timeline, a pop-up condition selection window, and an instant-response map. When the user selects conditions, the map immediately displays the records that satisfy them. When the cursor hovers over a location on the map, the panel below the map shows the meteorological records of that location within the selected timeline interval.
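The condition-selection and hover behaviour can be illustrated with a small in-memory filter. This is a hypothetical sketch: the `Record` fields, the bounding-box convention (south, west, north, east) and the function names are assumptions, and it does not use the CCTS services that the system actually relies on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields for illustration; not the database's actual schema.
    year: int
    place: str
    lat: float
    lon: float
    category: str
    text: str


def select_records(records, year_from, year_to, categories, bbox=None):
    """Records within the timeline interval, in the chosen categories, and inside bbox if given."""
    selected = []
    for r in records:
        if not (year_from <= r.year <= year_to) or r.category not in categories:
            continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= r.lat <= north and west <= r.lon <= east):
                continue
        selected.append(r)
    return selected


def records_by_place(selected):
    """Group selected records by place, sorted by year, as for the panel shown on hover."""
    groups = defaultdict(list)
    for r in selected:
        groups[r.place].append(r)
    return {place: sorted(rs, key=lambda r: r.year) for place, rs in groups.items()}
```

Filtering on the client side like this keeps the map responsive for a modest number of records; a larger corpus would more likely be queried through an indexed backend.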
Figure 1. System interface.
To demonstrate the use of our system, we examine the meteorological records from 1650 to 1700, the late stage of the Little Ice Age, to investigate climate change in Qing Dynasty China.
First, we choose the “temperature” event category. Among the “temperature” records, there are 68 records of extreme cold. These records are distributed across China from the tropical zone to the temperate zone (see Figure 2); therefore, we can conclude that extreme cold records appear not only in middle- and high-latitude areas but also in lower-latitude areas.
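To tally records by latitude band as in Figure 2, a simple sketch could bin each record's latitude using the tropics (about 23.44°) and the polar circles (about 66.56°). This astronomical zoning is an assumption for illustration; climatological zoning of China is finer (for example, subtropical and warm-temperate belts), and the study does not state which definition it uses.

```python
from collections import Counter

TROPIC_LAT = 23.44        # latitude of the tropics, degrees (approximate)
POLAR_CIRCLE_LAT = 66.56  # latitude of the polar circles, degrees (approximate)


def climate_zone(lat):
    """Astronomical zone for a latitude in degrees, north or south."""
    a = abs(lat)
    if a < TROPIC_LAT:
        return "tropical"
    if a < POLAR_CIRCLE_LAT:
        return "temperate"
    return "polar"


def zone_counts(latitudes):
    """Count records per zone from their latitudes."""
    return Counter(climate_zone(lat) for lat in latitudes)
```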
Figure 2. Areas with Low-Temperature Record from 1650 to 1700.
Extreme cold weather usually comes with disasters. In the second step, we choose the “rain” category, in which 379 records are related to “snow.” To examine these records more closely, besides direct mentions of “snow,” we also collect indirect indicators of extreme cold, such as “three days in a row” and “frost-damaged trees” (see Table 6). Table 6 shows that disasters caused by the extremely cold climate were more frequent during the Little Ice Age (1650-1700) than during the non-Little Ice Age period (1745-1795).
Table 6. Numbers of records related to “snow” in the “rain” category during the Little Ice Age and the non-Little Ice Age.
Meteorological Phenomena Data
Little Ice Age
Non-Little Ice Age
Snow, more than 10 days / a month / ten days in a row; three days in a row or more
Snow, frost-damaged trees
Snow, birds, animals and human beings frozen to death
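The comparison in Table 6 amounts to counting, for each indicator, the records that mention it within each period. A minimal sketch follows, assuming simple substring matching on the classical Chinese text; the keyword lists in the usage example are hypothetical, and indicators such as “three days in a row” would need more careful pattern matching in practice.

```python
# Period boundaries as given in the text.
PERIODS = {"Little Ice Age": (1650, 1700), "Non-Little Ice Age": (1745, 1795)}


def count_by_period(records, indicators, periods=PERIODS):
    """records: iterable of (year, text); indicators: {name: [keywords]}.

    Returns {indicator: {period: count}}, counting a record once per matching indicator.
    """
    table = {name: {p: 0 for p in periods} for name in indicators}
    for year, text in records:
        for name, keywords in indicators.items():
            if not any(k in text for k in keywords):
                continue
            for p, (start, end) in periods.items():
                if start <= year <= end:
                    table[name][p] += 1
    return table
```

For example, with the hypothetical indicators `{"snow": ["雪"], "frost-damaged trees": ["樹木凍死"]}`, the records `(1670, "大雪，樹木凍死")` and `(1760, "大雪")` give one “snow” record in each period and one “frost-damaged trees” record in the Little Ice Age.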
While investigating meteorological phenomena during the Little Ice Age, we found rare snow records in Taiwan. As shown in Table 7, these records come from lowland areas such as Chia-Yi County and Tainan City, and can be seen as strong evidence of the extreme climate during the Little Ice Age.
Table 7. Rare snow records in Taiwan.
Temperature (including frost and dew), and rain
In November, in winter, it snowed and rained. That night the ice was more than an inch thick. This was strange, because it had not snowed there since ancient times.
Volume ten of "Taiwan Local Gazetteers", published in the Kang-xi period (1685) of the Qing Dynasty
Rain, crop failure, temperature (including frost and dew)
In May, it rained heavily. After a long period of rain, the fields of the Zheng family were destroyed, and the high banks became valleys. It began to snow in winter, and the ice was more than an inch thick. Normally the land was hot, and there was no frost or snow in Taiwan.
Volume nine of "Rebuilt Taiwan Local Gazetteers", published in the Kang-xi period of the Qing Dynasty
Flood, rain, crop failure, temperature (including frost and dew)
In May, it rained heavily. After a long period of rain, the fields of the Zheng family were destroyed, and the high banks became valleys. In November, in winter, it snowed and iced over. Normally the land was hot, and there was no frost or snow in Tsu-lo (Tsu-lo County, later renamed Chia-Yi). It was believed that, starting from the year Tsu-lo became Qing territory, the climate came from the northern regions.
Volume twelve of "Tsu-lo Local Gazetteers", published in the Kang-xi period of the Qing Dynasty
Flood, rain, flood, rain, crop failure, temperature (frost and dew), temperature (frost and dew), temperature (frost and dew), drought, rain
In spring, the crucian pond dried up. In May, in summer, it rained heavily, and most fields were destroyed. In June, the tide at Penghu rose four feet. On August 18th, the tide at Luermen rose. In November, in winter, it snowed and iced over. Normally the land was hot, and there was no frost or snow in Taiwan. However, starting from August, when Taiwan came under Qing authority, it rained and snowed in winter, and the ice was more than an inch thick. This climate came from the northern regions as a result of territorial unification.
"Rebuilt Taiwan Local Gazetteers", published in the Qian-long period of the Qing Dynasty
Although technology has improved, the weather is still hard to predict. To understand the impact of climate disasters and find ways to deal with them, tracing historical climate events could be one solution. This study establishes a principle for classifying meteorological phenomena based on "China's Three Thousand Years of Meteorological Records", develops a Spatio-Temporal research platform, and builds an instant-response front-end interface. Through the climate case study, we collected user feedback to improve the front-end interface and to enhance the precision of large-scale meteorological data analysis. Although the research platform currently contains only meteorological data from 1647 to 1795, we hope to expand the database and establish a mature Spatio-Temporal research platform in the future.
Chinea-Rios, M., Sanchis-Trilles, G. and Casacuberta, F. (2015). Sentence clustering using continuous vector space representation. Springer, pp. 432–40.
Hammouda, K. M. and Kamel, M. S. (2004). Efficient phrase-based document indexing for Web document clustering. IEEE Transactions on Knowledge and Data Engineering, 16(10): 1279–96.
Ko-Chen, C. (1973). A preliminary study on the climatic fluctuations during the last 5,000 years in China.
Kotlerman, L., Dagan, I., Gorodetsky, M. and Daya, E. (2012). Sentence clustering via projection over term clusters. Paper presented at the Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, Montréal, Canada.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. (Fifth Berkeley Symposium on Mathematical Statistics and Probability). University of California Press, pp. 281–97.
Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv Preprint ArXiv:1301.3781.
Qian, G., Sural, S., Gu, Y. and Pramanik, S. (2004). Similarity between Euclidean and cosine angle distance for nearest neighbor queries. Paper presented at the Proceedings of the 2004 ACM Symposium on Applied Computing, Nicosia, Cyprus.
Solomon, S., Qin, D., Manning, M., Averyt, K. and Marquis, M. (2007). Climate Change 2007 - The Physical Science Basis: Working Group I Contribution to the Fourth Assessment Report of the IPCC. Vol. 4. Cambridge University Press.
Wang, D., Li, T., Zhu, S. and Ding, C. (2008). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. Paper presented at the Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, Singapore.
Wang, P.-K., Lin, K., Liao, Y. C., Liao, H. M., Lin, Y. S., Hsu, C. T., Hsu, S. M., et al. (2018). Construction of the REACHES Climate Database Based on Historical Documents of China. Scientific Data, in press.
Hosted at Utrecht University
July 9, 2019 - July 12, 2019
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
Series: ADHO (14)