Automatic Emotion Detection for Quantitative Literary Studies: A case study based on Franz Kafka’s “Das Schloss” and “Amerika”

poster / demo / art installation
  1. Roman Klinger

    Universität Stuttgart

  2. Surayya Samat Suliya

    Universität Stuttgart

  3. Nils Reiter

    Universität Stuttgart


Sentiment analysis and opinion mining methods are established for automatically summarizing information shared by users in product reviews or on social media platforms like Twitter, Facebook or more specific fora (Liu 2015). These approaches can be categorized into coarse-grained and fine-grained methods: The former focus on assigning a polarity (positive, negative, neutral) and optionally an intensity to a text snippet (Täckström and McDonald 2011; Pang and Lee 2004). The latter additionally aim at detecting the opinion holder (for instance a specific person mentioned in a news article) and the target (for instance a specific aspect of a product in a review) (Hu and Liu 2004; Popescu and Etzioni 2005; Jakob and Gurevych 2010).
Transferring such methods to the analysis of literature leads to at least two questions: Firstly, are polarities for this domain as helpful as for the analysis of reviews? Secondly, how can such methods from sentiment analysis be improved, and what can they contribute to literature analysis?
Regarding the first aspect, resources to measure the occurrence of words which are associated with different emotions have been developed for English but, to the best of our knowledge, not for German (Mohammad et al. 2015). Secondly, it should be noted that research in German sentiment analysis is still comparatively limited (exceptions are Ruppenhofer et al. 2014; Klinger and Cimiano 2015; Remus, Quasthoff, and Heyer 2010). In addition, sentiment analysis has mainly focused on Web content, such as social media and product reviews. However, the analysis of emotions and sentiment in literature has been shown to be of interest and value (Mellmann 2007; Winko 2003). A prerequisite for a quantitative approach is that emotions are (at least to some extent) a surface phenomenon (Hillebrandt 2011, p. 154), i.e., that words carry information such that it is possible to infer “private states” of specific emotions (Wiebe, Wilson, and Cardie 2005).
Our two main contributions are: (a) we make German dictionaries of words associated with seven fundamental emotions publicly available, and (b) we perform a case study on Kafka’s “Amerika” and “Das Schloss” regarding emotion analysis to support literature studies with a focus on complex non-factual phenomena and the analysis of personality traits. All resources and software used in this paper are made publicly available at

Materials and Methods
The goal of this work is to detect different emotions represented in literary texts. Psychological research offers different models to categorize emotions. The most common ones include Plutchik’s wheel of emotions (Plutchik 2001) and Ekman’s definition of fundamental emotions (Ekman 1999). A discussion of relevant context is offered by Russell (Russell 1991). We opt for roughly following the structure of Ekman’s definition of emotions and focus on
anger (Wut),
disgust (Ekel),
fear (Angst),
enjoyment (Glück),
sadness (Trauer),
surprise (Überraschung), and
contempt (Verachtung).

To track emotions over the whole text, we assign an emotion score $es(t_a^b)$ for an emotion $e$ to a subset of consecutive tokens $t_a^b$ from textual position $a$ to position $b$ as

$$es(t_a^b) = \frac{1}{|D_e|} \sum_{t_i \in t_a^b} \mathbb{1}_{t_i \in D_e},$$

where $D_e$ is a dictionary containing words expressing the specific emotion $e$, and $\mathbb{1}_{t_i \in D_e}$ is 1 if and only if $t_i \in D_e$ and 0 otherwise. This score corresponds to the number of tokens which are in a window and in the respective emotion dictionary, normalized by the dictionary size.
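The score described above can be illustrated with a minimal Python sketch. It is not the authors' released software: the function names, the 0-based inclusive indexing, and the toy word list are assumptions made for illustration, and the dictionary is assumed to be normalized already.

```python
from typing import Sequence, Set


def normalize(token: str) -> str:
    """Lower-case a token (a stemmer could additionally be applied here)."""
    return token.lower()


def emotion_score(tokens: Sequence[str], a: int, b: int, dictionary: Set[str]) -> float:
    """Number of tokens at positions a..b (inclusive, 0-based) that occur in
    the emotion dictionary, divided by the dictionary size."""
    if not dictionary:
        return 0.0
    window = tokens[a : b + 1]
    hits = sum(1 for t in window if normalize(t) in dictionary)
    return hits / len(dictionary)


# Toy data (hypothetical, not taken from the published dictionaries).
fear = {"angst", "gefahr", "unruhig"}
tokens = "Die Gefahr machte ihn unruhig und still".split()
score = emotion_score(tokens, 0, len(tokens) - 1, fear)
```

Because the count is divided by the dictionary size rather than the window length, scores for different emotions are scaled by how many entries each dictionary has.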

To track the development of the emotions over the whole text, we apply a sliding window approach parameterized by the window size w, such that b = a + w − 1 (w can be interpreted as a smoothing parameter). To allow for a character-oriented analysis, we assign an emotion score as in the sliding window, but for windows around each mention of the character in the text, with an additional normalization based on the number of character mentions. Each token and dictionary entry is normalized by mapping to lower case and stemming with the Porter stemmer (Porter 2001).
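The sliding window and the character-oriented variant might be sketched as follows. Several details are assumptions, since the text does not fix them: windows are centered on each mention, the summed score is divided by the number of mentions, and normalization defaults to lower-casing only. A stemmer (for example the `stem` method of NLTK's `SnowballStemmer("german")`) could be passed in as `normalize`. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def window_scores(tokens: Sequence[str], dictionary: Set[str], w: int,
                  normalize: Callable[[str], str] = str.lower) -> List[float]:
    """Emotion score for every window of w tokens (b = a + w - 1)."""
    hits = [1 if normalize(t) in dictionary else 0 for t in tokens]
    if w <= 0 or w > len(hits) or not dictionary:
        return []
    current = sum(hits[:w])
    scores = [current / len(dictionary)]
    for a in range(1, len(hits) - w + 1):
        # Slide by one token: add the entering token, drop the leaving one.
        current += hits[a + w - 1] - hits[a - 1]
        scores.append(current / len(dictionary))
    return scores


def character_score(tokens: Sequence[str], dictionary: Set[str], name: str, w: int,
                    normalize: Callable[[str], str] = str.lower) -> float:
    """Sum of window scores around each mention of `name`, divided by the
    number of mentions (assumed form of the additional normalization)."""
    norm = [normalize(t) for t in tokens]
    target = normalize(name)
    mentions = [i for i, t in enumerate(norm) if t == target]
    if not mentions or not dictionary:
        return 0.0
    half = w // 2
    total = 0
    for i in mentions:
        lo, hi = max(0, i - half), min(len(norm), i + half + 1)
        total += sum(1 for t in norm[lo:hi] if t in dictionary)
    return total / len(dictionary) / len(mentions)


# Toy data (hypothetical).
joy = {"glücklich", "gut", "froh", "heiter"}
text = ["Karl", "war", "glücklich", "und", "Karl", "war", "gut"]
curve = window_scores(text, joy, w=3)
karl = character_score(text, joy, "Karl", w=5)
```

A larger w yields a smoother curve, at the cost of blurring short peaks.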

As a resource for the emotion dictionaries, two authors of this paper manually selected words from different sentiment polarity, subjectivity, and emotion resources in German and English (translated to German) into the emotion categories (Waltinger 2010a; Waltinger 2010b; Remus, Quasthoff, and Heyer 2010; Mohammad and Turney 2013). We semi-automatically enriched this resource with synonyms (Naber 2005; Wermke, Kunkel-Razum, and Scholze-Stubenrecht 2010).
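One way to picture the semi-automatic enrichment step is a routine that proposes synonym candidates, which a human then accepts or rejects. This is only a sketch under that assumption. The text does not describe the actual procedure, and the function names, the thesaurus mapping, and the toy entries below are hypothetical.

```python
from typing import Dict, Iterable, Set


def propose_synonyms(seed: Set[str], thesaurus: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect synonyms of seed words that are not yet in the dictionary."""
    candidates: Set[str] = set()
    for word in seed:
        for synonym in thesaurus.get(word, ()):
            lowered = synonym.lower()
            if lowered not in seed:
                candidates.add(lowered)
    return candidates


def enrich(seed: Set[str], candidates: Set[str], accepted: Set[str]) -> Set[str]:
    """Add only those candidates that a human reviewer accepted."""
    return seed | (candidates & accepted)


# Toy data (hypothetical, not taken from OpenThesaurus or Duden).
fear_seed = {"angst", "furcht"}
toy_thesaurus = {"angst": ["Furcht", "Panik"], "furcht": ["Schrecken"]}
fear_candidates = propose_synonyms(fear_seed, toy_thesaurus)
fear_enriched = enrich(fear_seed, fear_candidates, accepted={"panik"})
```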

Experiments and Conclusions
As an estimate of the difficulty of emotion assignments, we performed an annotation experiment on 300 words (a stratified sample from all emotions in the dictionary mentioned above) with fluent speakers of German. For 85 % of all words, two out of three annotators agree on the same emotion; however, for only 46 % of all words do all three annotators agree on the associated emotion.
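The two kinds of agreement figures reported here could be computed along the following lines. The sketch assumes that "two out of three agree" means at least two annotators chose the same label. The function name and the toy annotations are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_rates(annotations: Sequence[Sequence[str]]) -> Tuple[float, float]:
    """Return (share of items where at least two annotators agree,
    share of items where all annotators agree)."""
    if not annotations:
        return 0.0, 0.0
    majority = unanimous = 0
    for labels in annotations:
        # Size of the largest group of identical labels for this item.
        top = Counter(labels).most_common(1)[0][1]
        if top >= 2:
            majority += 1
        if top == len(labels):
            unanimous += 1
    n = len(annotations)
    return majority / n, unanimous / n


# Toy annotations: one tuple per word, one label per annotator.
toy = [
    ("fear", "fear", "fear"),
    ("fear", "anger", "fear"),
    ("enjoyment", "surprise", "sadness"),
    ("disgust", "contempt", "contempt"),
]
two_of_three, all_three = agreement_rates(toy)
```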
As a use case, we apply our methods to Franz Kafka’s “Der Verschollene” (“Amerika”) and “Das Schloss”. The latter is especially interesting because a comprehensive emotion-focused manual analysis is available (Hillebrandt 2011). It is narrated in the third person and is interesting from an emotion analysis point of view, as the attribution of specific emotions to the protagonist is difficult (Hillebrandt 2011, p. 165).
Figures 1 and 2 visualize the development of emotions found in our analyses. In “Das Schloss”, the strong increase of surprise towards the end is striking (the most indicative words are “neu”, “schnell”, “plötzlich”, “ungeduldig”). Another eye-catching example is the peak of fear shortly after the start of chapter 3 (“ängstlich”, “Gefahr”, “unruhig”, “Gewalt”). In “Amerika”, one striking characteristic is the decrease of enjoyment after a peak in chapter 4 (“gut”, “Mutter”, “glücklich”), followed by disgust in chapter 5 (“unerträglich”, “Elend”, “schrecklich”, “beschmutzt”). Emotions for each mention of a selection of characters in “Amerika” and “Das Schloss” are shown in Figures 3 and 4.

This work has been partially funded by the project CRETA (Zentrum für reflektierte Textanalyse), funded by the German Ministry of Education and Research.


Ekman, P. (1999). “Basic Emotions”. In: Handbook of Cognition and Emotion. Ed. by T. Dalgleish and M. Power. Sussex, UK: John Wiley & Sons.

Hillebrandt, C. (2011). Das emotionale Wirkungspotenzial von Erzähltexten. Deutsche Literatur - Studien und Quellen. Berlin, Germany: Akademie Verlag.

Hu, M. and Liu, B. (2004). “Mining and summarizing customer reviews”. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. Seattle, WA, USA: ACM, pp. 168–77.

Jakob, N. and Gurevych, I. (2010). “Extracting opinion targets in a single- and cross-domain setting with conditional random fields”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, Massachusetts: Association for Computational Linguistics, pp. 1035–45.

Klinger, R. and Cimiano, P. (2015). “Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis”. In: Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Beijing, China: Association for Computational Linguistics, pp. 153–63.

Liu, B. (2015). Sentiment Analysis. Cambridge University Press.

Mellmann, K. (2007). Emotionalisierung – Von der Nebenstundenpoesie zum Buch als Freund. Vol. 4. Poetogenesis - Studien zur empirischen Anthropologie der Literatur. Münster, Germany: Mentis Verlag.

Mohammad, S. M. and Turney, P. D. (2013). “Crowdsourcing a Word-Emotion Association Lexicon”. In: Computational Intelligence, 29(3): 436–65.

Mohammad, S. M., Zhu, X., Kiritchenko, S. and Martin, J. (2015). “Sentiment, emotion, purpose, and style in electoral tweets”. In: Information Processing & Management, 51(4): 480–99.

Naber, D. (2005). OpenThesaurus: ein offenes deutsches Wortnetz. publications/gldv-openthesaurus.pdf (visited on 02/17/2015).

Pang, B. and Lee, L. (2004). “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts”. In: Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Main Volume. Barcelona, Spain, pp. 271–78.

Plutchik, R. (2001). “The Nature of Emotions”. In: American Scientist, 89: 344–50.

Popescu, A.-M. and Etzioni, O. (2005). “Extracting Product Features and Opinions from Reviews”. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. Vancouver, British Columbia, Canada: Association for Computational Linguistics, pp. 339–46.

Porter, M. F. (2001). Snowball: A language for stemming algorithms. http://snowball.

Remus, R., Quasthoff, U. and Heyer, G. (2010). “SentiWS – a Publicly Available German-language Resource for Sentiment Analysis”. In: Proceedings of the 7th International Language Resources and Evaluation (LREC’10), pp. 1168–71.

Ruppenhofer, J., Klinger, R., Struß, J. M., Sonntag, J. and Wiegand, M. (2014). “IGGSA Shared Tasks on German Sentiment Analysis”. In: Workshop Proceedings of the 12th Edition of the KONVENS Conference. Ed. by G. Faaß and J. Ruppenhofer. Hildesheim, Germany: University of Hildesheim.

Russell, J. A. (1991). “In Defense of a Prototype Approach to Emotion Concepts”. In: Journal of Personality and Social Psychology, 60(1): 37–47.

Täckström, O. and McDonald, R. (2011). “Semi-supervised latent variable models for sentence-level sentiment analysis”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA: Association for Computational Linguistics, pp. 569–74.

Waltinger, U. (2010a). “GERMANPOLARITYCLUES: A Lexical Resource for German Sentiment Analysis”. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC). Valletta, Malta.

Waltinger, U. (2010b). “Sentiment Analysis Reloaded: A Comparative Study On Sentiment Polarity Identification Combining Machine Learning And Subjectivity Features”. In: Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST ’10). Valencia, Spain.

Wermke, M., Kunkel-Razum, K. and Scholze-Stubenrecht, W. (eds.) (2010). Duden – Das Synonymwörterbuch. Mannheim, Zürich: Dudenverlag.

Wiebe, J., Wilson, T. and Cardie, C. (2005). “Annotating Expressions of Opinions and Emotions in Language”. In: Language Resources and Evaluation, 39(2): 165–210.

Winko, S. (2003). Kodierte Gefühle: Zu einer Poetik der Emotionen in lyrischen und poetologischen Texten um 1900. Erich Schmidt Verlag.


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

Series: ADHO (11)

Organizers: ADHO