The Inherited Self: Reappraising Literary Cultural Heritage through Digital Methods

Mats Ulrik Malm; Jenny Bergenmar; Dimitrios Kokkinakis; Peter Leonard

Authorship

1. Mats Ulrik Malm
Göteborg University (Gothenburg)

2. Jenny Bergenmar
Göteborg University (Gothenburg)

3. Dimitrios Kokkinakis
Göteborg University (Gothenburg)

4. Peter Leonard
Yale University

Work text


This project proposes new approaches to cultural heritage by developing new methods of working with digital texts and by defining appropriate research questions. Our goal is to find ways of turning literature, especially prose fiction, into a site of dynamic research in the humanities and social sciences, rather than merely a passive digital repository.

Our point of departure is the view of cultural heritage as largely intended, or willed, to convey a specific collective memory and identity. This perspective in turn strongly affects the construction of individual identity. From this view the project draws two main conclusions: 1) In order to fully understand our cultural heritage, it is essential to analyse it against the self-understanding of the cultures that produced it, thereby bypassing the structures of literary canon formation. 2) Focussing on the issue of identity is an efficient way of developing methodology and performing analysis.

The project is designed to:

1) Benefit from a corpus of specially prepared material in which questions of canon formation can be explored through marginalized and forgotten literary works.
2) Develop new methods of working with the specific forms of cultural heritage embodied in electronic text databases.
3) Develop new perspectives and methods through interdisciplinary exchange and cooperation on these text databases.

Although the primary material of the pilot project is Swedish, all parts of the project are planned to be generalizable, scalable and relevant to other literary traditions. The main material of the investigations consists of three corpora: the literary works of August Strindberg (based on recently finalized scholarly editions), the literary works of Selma Lagerlöf (all first editions, established and proofread in collaboration with the scholarly edition), and all original Swedish prose fiction that was first published in the years 1800, 1820, 1840, 1860, 1880 and 1900.

These three corpora offer an apposite opportunity to compare and collate results. Strindberg and Lagerlöf are both canonized and near contemporaries, but they are entirely different authors: one male and intensely occupied with the societal issues of his day, the other female and developing her own kind of “saga” style, interested in social issues but in a more indirect way. While Strindberg and Lagerlöf are among Sweden’s most renowned and internationally famous authors, the Swedish Prose Fiction database has been constructed in order to evade canonical selection. Comprising all publications that match the criteria, it offers ways into both mainstream works and those that have been entirely marginalised.

As this project arises from the view of culture as an issue of identity, and of cultural heritage as the performative expression of collective memory and identity, the research questions focus on issues of identity: both collective and individual. Since fiction’s main means of portraying problems and ideas is the individual character, the studies start out with the individual in order to reach conclusions also about collective identity. The research questions include issues of identity in connection to ethnicity, society, gender and consumption patterns.

The project thus explores and develops different forms of materials, techniques, methods and co-operations, which are to result in new combinations of quantitative and qualitative analysis. In particular, we aim at refining methods of “distant reading”, as once proposed by Franco Moretti (Moretti, 2005, 2006), into new approaches that focus on content and context (cf. Jockers, 2013). We use the new tool for sub-corpus topic modeling (STM) designed by Peter Leonard (Leonard and Tangherlini, 2013), which makes it possible to extract topics from a particular work and run them against larger bodies of material. We also plan to enhance topic modeling further by adding Named Entity Recognition (NER) and sentiment analysis (cf. Liu, 2010; Maas et al., 2011) to existing systems. NER has been refined, adapted and extended in connection with this project (Kokkinakis and Malm, 2011, 2013; Kokkinakis and Oelke, 2012; Oelke et al., 2012; cf. Yang et al., 2011).
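The idea of deriving a topic from one work and running it against larger material can be illustrated with a minimal sketch. It is not the STM tool of Leonard and Tangherlini: as a simplifying assumption, the learned topic model is swapped for a plain keyword profile (words over-represented in the sub-corpus relative to the larger corpus), and every function name and toy sentence below is invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters (å, ä, ö included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def keyword_profile(sub_corpus, background, top_n=5):
    """Stand-in for a topic: words over-represented in the sub-corpus.

    Weight = count in sub-corpus * log ratio of add-one-smoothed relative
    frequencies (sub-corpus vs. background). Only positive weights are kept.
    """
    sub = Counter(w for doc in sub_corpus for w in tokenize(doc))
    bg = Counter(w for doc in background for w in tokenize(doc))
    sub_total, bg_total = sum(sub.values()), sum(bg.values())
    vocab_size = len(set(sub) | set(bg))
    weights = {}
    for word, count in sub.items():
        p_sub = (count + 1) / (sub_total + vocab_size)
        p_bg = (bg[word] + 1) / (bg_total + vocab_size)
        weights[word] = count * math.log(p_sub / p_bg)
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return {word: weight for word, weight in top if weight > 0}


def score_passage(profile, passage):
    """Sum the profile weights of the passage's tokens, divided by its length."""
    tokens = tokenize(passage)
    if not tokens:
        return 0.0
    return sum(profile.get(t, 0.0) for t in tokens) / len(tokens)


def rank_passages(profile, passages):
    """Return (index, score) pairs for the passages, highest score first."""
    scored = [(i, score_passage(profile, p)) for i, p in enumerate(passages)]
    return sorted(scored, key=lambda pair: -pair[1])


# Toy data, invented for this sketch: two passages standing in for
# "a particular work", three standing in for the larger material.
sub_corpus = [
    "The sea and the storm took the fishing boat.",
    "A storm at sea frightened the fishermen.",
]
larger_corpus = [
    "The merchant counted his money in town.",
    "The storm drove the boat far out to sea.",
    "She sewed a dress for the wedding.",
]

profile = keyword_profile(sub_corpus, larger_corpus)
ranking = rank_passages(profile, larger_corpus)
for index, score in ranking:
    print(f"{score:.3f}  {larger_corpus[index]}")
```

In a real setting the profile would be replaced by topics inferred with a topic model trained on the sub-corpus, and the passages would be chunks of the full prose fiction database. The two-step shape, deriving topics from a small set and then scoring a large one, is what the sketch is meant to convey.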

At the poster presentation, we will demonstrate materials and techniques on laptops.

References
Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press.

Kokkinakis, D. and Malm, M. (2011). Character Profiling in 19th Century Fiction. In Workshop: Language Technologies for Digital Humanities and Cultural Heritage in conjunction with the Recent Advances in Natural Language Processing (RANLP). Hissar, pp. 70-77.

Kokkinakis, D. and Oelke, D. (2012). Men, Women and Gods: Distant Reading in Literary Collections – Combining Visual Analytics with Language Technology. In Proceedings of the Advances in Visual Methods for Linguistics (AVML). University of York.

Kokkinakis, D. and Malm, M. (2013). A Macroanalytic View of Swedish Literature using Topic Modeling. In Proceedings of the Corpus Linguistics. Andrew Hardie and Robbie Love (eds), Lancaster: UCREL, pp. 144-147.

Leonard, P. and Tangherlini, T. (2013). Trawling in the Sea of the Great Unread: Sub-Corpus Topic Modeling and Humanities Research. Poetics, 41: 725-749.

Liu, B. (2010). Sentiment Analysis and Subjectivity. In Indurkhya, N. and Damerau, F. J. (eds), Handbook of Natural Language Processing. Boca Raton, Fla: CRC Press, pp. 627-659.

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y. and Potts, C. (2011). Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL). Portland, pp. 142-150.

Moretti, F. (2005). Graphs, Maps, Trees. Abstract Models for Literary History. London: Verso.

Moretti, F. ed. (2006). The Novel. History, Geography, and Culture 1-2. Princeton: Princeton University Press.

Oelke, D., Kokkinakis, D. and Malm, M. (2012). Advanced Visual Analytics Methods for Literature Analysis. In Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), an EACL Workshop. Avignon: Association for Computational Linguistics, pp. 35-44.

Yang, T., Torget, A. T. and Mihalcea, R. (2011). Topic Modeling on Historical Newspapers. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Portland: Association for Computational Linguistics, pp. 96-104.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

Complete

ADHO - 2014

"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

XML available from https://github.com/elliewix/DHAnalysis (needs to replace plaintext)

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO
