Analysis and Visualization of Narrative in Shanhaijing Using Linked Data

paper, specified "long paper"
  1. 1. Qian Wang

    Centre for Digital Humanities Research - Australian National University

  2. 2. Terhi Nurmikko-Fuller

    Centre for Digital Humanities Research - Australian National University

  3. 3. Ben Swift

    College of Engineering and Computer Science - Australian National University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Existing Linked Data (LD) literature contains several examples (such as those based on Propp’s
Morphology of the Folktale (Peinado et al., 2004) or on ancient Sumerian mythologies (Nurmikko-Fuller, 2014)) where the fabula and syuzhet of Western folktales have been represented, and information regarding the stories themselves have been published in machine-readable formats such as RDF. However, similar (linked) datasets and analyses are largely non-existent for equivalent stories from Chinese mythology. This paper seeks to bridge that gap by creating, analyzing and publishing a case study example—the classic
Shanhaijing (
the Classic of Mountains and Seas
山海经). We recount the complexities of representing ancient Chinese literary narratives, captured in unstructured data, using tools developed from Western perspectives and for complete and largely homogeneous, highly-structured data.

Shanhaijing is an ancient encyclopedia. Its origins can be traced back to the pre-Qin period of China (4th century BC), its development continuing through to the early Han Dynasty (1st century AD). It covers broad areas such as ancient mythology, geography, witchcraft, religion, medicine, and other aspects (Hu, 2003).
Shanhaijing occupies a significant position in the literary and mythological corpora of the East, and is representative of a wider spectrum of Eastern mythologies. Over thousands of years, numerous Chinese novels, literary fictions and dramas have been derived from the book, such as
Zhuangzi (
庄子) (Zhuangzi and Palmer, 1996) and
Strange Tales of Liaozhai (
聊斋志异) (Pu et al., 1995). Mythologies from other Asian countries were influenced by it, e.g.
Kaiki Choju Zukan (Wang, 2018) and
Hyakki Yagyō (Tanaka, 2007), both examples of Japanese folklore.

In this paper, we report on the state of existing work combining LD methodologies and approaches with literary compositions (Section II), and summarize the narrative of the
Shanhaijing (Section III). We outline our chosen methodology in Section IV. The custom-built user-interface (UI) is demonstrated in Section V, and we conclude the paper with a discussion of the complexities of the process in Section VI.

Related work
In the domain of literature, several publicly available datasets have been published as LD. For example, the Book-Sampo project (Mäkelä et al., 2011), which provides information on fiction literature published in Finland going back to the 15th century, alongside rich descriptions of both content and context (Mäkelä et al., 2013), or the Perseids Project (Almas and Beaulieu, 2013), which provides a platform for creating, publishing, and sharing research data, in the form of textual transcriptions, annotations and analysis (Almas, 2017). Other essential work in this space include the Brothers Grimm project (Franzini et al., 2016), an ontology for Greek mythology (Syamili and Rekha, 2018), and the Aarne-Thompson’s Motif-Index project (Declerck et al., 2017).

Classic of mountains and seas, 山海经
The present version of
Shanhaijing contains 18 chapters, approximately 31,000 words in total. It records ancient Chinese mythologies, where numerous monsters with fanciful descriptions are portrayed as possessing magical powers or as related to ancestor (totem) worship (Li and Chan, 2012), such as the monster Lushu (
鹿蜀) , which looks like a horse with a white head, a scarlet tail and tiger’s markings, and lives on Mount Niuyang (纽阳之山). Whoever wears its fur will have a greater number of descendants.

We focused exclusively on the capture of the data for the monsters in
Shanhaijing. The reason for this is that the fascinating and detailed accounts of these creatures overwhelm the other aspects of the story; and these descriptions account for a notable proportion of instances of literary borrowings and inspirations in other cultures, increasing the likelihood of reuse and inter-linking with other ontologies.

The first stage of the project focused exclusively on the cornucopia of monsters (a total of 277). Through a close reading in both English and classical Chinese, we extracted structured data from the unstructured narrative.
In the second stage, we designed an ontological structure to model the domain (Pan et al., 2017). After considering several pre-existing ontology software libraries, we concluded that the suitability of these vocabularies and resources for the representation of
Shanhaijing was limited. This necessitated the building of a new ontology, which captures data types and represents the relationships between them as .TTL file.

The ontology represents the characteristics of the monsters and the complex relationships between them. It contains a taxonomy of body parts, the characteristics, and the habitats of all monsters. We used a combination of top-down and bottom-up ontologies and schemas to identify Class and Property hierarchies, captured through
rdfs:subClassOf and
rdfs:subPropertyOf and reused existing vocabularies such as DBPedia Ontology, BioTopOntology (Whetzel et al., 2011), Mahabarata Ontology, RDFS, and XML Schema. This approach enabled us to capture the specifics of the data (bottom-up approach), but also maximize the benefit from other well-developed and rich ontologies.

All concepts related to monsters were collected, then split into terms. A term is considered as a Class if it has attributes pointing to other classes or literals, or it is a Superclass of other classes. Otherwise, it is defined as a property. For example, the term
Monster is defined as a Class because it has attributes linking it to other Classes, such as
Mountain, through the property
livesIn. However, term
Noise is not a Class because it is not the domain of any attribute. Hence, it is considered as a property
hasSameNoiseAs with
Monster as its domain, and a literal as its range. The class
Monster is defined as a subclass of
Character in FRBRoo, allowing the use of FRBRoo to represent the relations of works. Figure 1 demonstrates a graphic version of the Shanhaijing Ontology.

Figure 1: The Shanhaijing Ontology

Instance-level data was normalized by mapping it to the ontology, and the Silk-Link Discovery Framework (Volz et al., 2009) was used to automatically link appropriately matched resources to external datasets (DBpedia, Wikidata and using
owl:equivalentClass and

An interactive data explorer software tool (
iSHJ) was built for visualizing, querying (through a SPARQL interface) and analyzing the data. The dataset, ontology, and source code are available via GitHub (

User interface

iSHJ was built as a domain specific application for
Shanhaijing. This tool has “Browse”, “Search” and “Visualization” interaction modes. Users are provided with quick search functions to explore the data by clicking buttons and inputting keywords rather than writing SPARQL queries directly, although they can be when more flexible and variable searches are needed (see Figure 2). The results are displayed both in plain text and charts for visualization (see Figure 3). We also provide a graphical version of
Shanhaijing, where mountains are placed to represent the locations described in the book, the monsters correspondingly placed on the mountains where they live. A video of the UI is available at YouTube (

Figure 2: Results of Quick Search Example "monster" in "Browse" Section in

Figure 3: SPARQL Query Sample of Monster's Tail Number with Visualization Results in

Despite the work on the narrative of separate regions of some prominent Western myths, projects focused on Chinese literature within this interdisciplinary field are rare. Existing LD methods have been developed almost exclusively in the context of Western culture, and predominantly for highly structured data. When facing ancient Chinese mythologies, there are two main unsolved challenges: non-existent structured datasets and the unavailability of reusable ontologies.
Before LD methods can be applied to the narrative of Chinese myths, a structured dataset capturing in-depth knowledge of Chinese mythologies must be constructed. However, the full potential of this pioneering project can only be tapped into once a greater number of external, disparate, but complementary datasets are published using the LD paradigm. That is to say, until other projects focusing on the analysis of Chinese literature engage in LD, there are limited opportunities for outward linkage.
The protagonists of Eastern and Western mythologies are not entirely similar. For example, in many ancient Chinese mythologies (such as
Shanhaijing), numerous gods and creatures are described as a monstrous combination of different animals, falling somewhere between, for example, the human-like (both physically and emotionally) gods and heroes of Greek myths and the anthropomorphized animals of Aesop’s tales. Although there are some complementary aspects – e.g. in the Aarne-Thompson’s Motif-Index, the Nine-tailed Fox (
九 尾狐 ) (Kiyoshi, 1961; Lee, 2011; Chen, 1995), is recorded as B15.7.7.1; other motifs are the four-eyed tiger (B15.4.1.2.) and serpent with a jewel in its mouth (B103.4.2.) – These ontologies neither contain the narrative of Chinese myths nor are they created for Chinese folktales. Ultimately, the existing overlaps are insufficient.

Based on the differences in the narratives, most ontologies created for Western folktales are not completely suitable for the representation of ancient Chinese mythic classics and could not adequately demonstrate the characteristics of these gods and monsters in the

We used LD methods for textual analyses and visualization of a book of Chinese mythology,
Shanhaijing. We created and published a structured dataset, relevant LD, and an interactive explorer to represent the monsters within the text. An extensive review of existing ontologies for literary motifs and mythological creatures revealed that there was insufficient overlap between them and the needs of the dataset, necessitating the development of a new ontology.

Future work will see us expand this analysis to all the contents of
Shanhaijing. New ontologies will be generated from the one in this paper, and structures will be

redetermined and improved to adapt to other mythologies. Other ontologies could be reused or interlinked to, increasing the number of linked elements.

We will also test the suitability of our ontology on other mythologies, ranging from Chinese mythologies appearing before and after
Shanhaijing to other Asian mythologies such as Japanese tales. We will also apply our ontology to Western mythologies to assess the similarities and differences between Eastern and Western folk tales.


Almas, B.
(2017). Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities.
Data Science Journal
: 19 doi:10.5334 /dsj-2017-019.

Almas, B. and Beaulieu, M.-C.
(2013). Developing a New Integrated Editing Platform for Source Documents in Classics.
Literary and Linguistic Computing
(4): 493–503 doi:10.1093/llc/fqt046.

Chen, H.
(1995). 狐狸精原型的文化阐释 (accessed 21 April 2019).

Declerk, T., Koštová, A. and Schäfer, L.
(2017). Towards a Linked Data Access to Folktales classified by Thompson’s Motifs and Aarne-Thompson-Uther’s Types. : 3.

Franzini, E., Franzini, G., Rotari, G. and Büchler, M.
(2016). The Digital Breadcrumbs of Brothers Grimm. doi:10.13140/rg.2.1.1290.1365. RG.2.1. 1290.1365 (accessed 21 April 2019).

Hu, Y.
(2003). 纵观海内外《山海经》研究五十年. 福建师大福清分校学报,
doi:10.3969/j.issn.1008-3421.2003.z1.012. TAL-FJFQ2003S1012.htm (accessed 21 April 2019).

Kiyoshi, N.
Kitsuné: Japan’s Fox of Mystery, Romance & Humor
. Hokuseido Press.

Lee, S.-A.
(2011). Lures and Horrors of Alterity: Adapting Korean Tales of Fox Spirits.
International Research in Children’s Literature
(2): 135–50 doi:10.3366/ircl.2011. 0022.

Li, E. and Chan, Y. K.
Chinese Literature
. Singapore: Asiapac Books.

Mäkelä, E., Hypén, K. and Hyvönen, E.
(2011). BookSampo-Lessons Learned in Creating a Semantic Portal for Fiction Literature. In Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N. and Blomqvist, E. (eds),
The Semantic Web – ISWC 2011
, vol. 7032. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 173–88 doi:10.1007/978-3-642-25093-4_12. 3-4_12 (accessed 21 April 2019).

Mäkelä, E., Hypén, K. and Hyvönen, E.
(2013). Fiction Literature as Linked Open Data-the BookSampo Dataset.
Semantic Web
(3): 299–306 doi:10.3233/SW-120093.

Nurmikko-Fuller, T.
(2014). Assessing the Suitability of Existing OWL Ontologies for the Representation of Narrative Structures in Sumerian Literature.
Current Practice in Linked Open Data for the Ancient World
: ISAW Papers, 7.

Pan, J. Z., Vetere, G., Gomez-Perez, J. M. and Wu, H.
Exploiting Linked Data and Knowledge Graphs in Large Organisations
. Cham: Springer International Publishing Imprint : Springer 10.1007/978-3-319- 45654-6 (accessed 21 April 2019).

Peinado, F., Gervás, P. and Díaz-Agudo, B.
(2004). A Description Logic Ontology for Fairy Tale Generation. : 6.

Pu, S., Lu, Y., Chen, T., Yang, L. and Yang, Z.
Strange Tales of Liaozhai

Syamili, C. and Rekha, R. V.
(2018). Developing an ontology for Greek mythology.
The Electronic Library
(1): 119–32 doi:10.1108/EL-02-2017-0030.

Tanaka, T.
Hyakki yagyō emaki o yomu: zusetsu
. Shinsōban. (Fukurō no hon). Tōkyō: Kawade Shobō Shinsha.

Volz, J., Bizer, C., Gaedke, M. and Kobilarov, G.
(2009). Silk - A Link Discovery Framework for the Web of Data.

Wang, X.
(2018). 山海百灵: 《山海经》里的神人鸟兽鱼.

Whetzel, P. L., Noy, N. F., Shah, N. H., Alexander, P. R., Nyulas, C., Tudorache, T. and Musen, M. A.
(2011). BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications.
Nucleic Acids Research
(suppl): W541–45 doi:10.1093/nar/gkr469.

Zhuangzi and Palmer, M. J.
The Book of Chuang Tzu
. London: Penguin (accessed 21 April 2019).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Series: ADHO (14)

Organizers: ADHO