Sentiment lexicons or BERT? A comparison of sentiment analysis approaches and their performance.

paper, specified "short paper"
  1. 1. Giulia Grisot

    University of Bielefeld

  2. 2. Simone Rebora

    University of Bielefeld

  3. 3. Berenike Herrmann

    University of Bielefeld

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


With the development of new powerful technologies for computational data analysis, the opportunities for – and interest in – the detection of sentiments and opinion in texts have grown considerably (Liu 2015). Because of the vast amount of material available online, these investigations have focused mostly on textual material gathered from social media, making use of traditional corpus linguistic approaches as well as deep learning tools.
Sophisticated sentiment analysis (SA) of literary texts, especially in languages other than English, is still in its infancy (Kim and Klinger 2019). This has depended partly on the limited amount of digital texts available, partly the complex structure of literary texts, and finally on methodological challenges, with skills needed that seldom form part of the training of literary scholars.
Emotions are however central to the experience of literary narrative (Oatley 2012; Hogan 2016), and recent advances in their systematic, quantitative analysis have been made within computational literary studies (Jockers 2017, Burghardt et al 2019). Yet, such investigations have mostly relied on existing lists of words associated with sentiment and emotion values, the so-called 
sentiment lexicons. While these remain conventional and useful tools, they can only provide limited insights to the representation of emotions in texts.

Using a lexicon-based method, we have previously investigated emotions and sentiments in relation to the representation of landscape in Swiss literature, looking in particular at the differences between the 
rural and 
urban spaces portrayed in a corpus of Swiss novels written in German (see Grisot, G., Herrmann, J.B. (in preparation).

The present paper takes a step forward, using manual annotation and advanced machine learning methods to train a fine-tuned model to recognise 
valence and 
arousal on a historical corpus. Our goals are higher levels of lexical coverage and validity when compared to our prior results obtained with sentiment lexicons.

We describe here the current state of our method to detect sentiment using deep learning approaches. Using a language model trained on a large corpus (3000+) of German literary texts spanning from 1800 to 1950 (Fischer and Strötgen 2017), we make use of BERT word embeddings and manually annotated sentences to recognise sentiment.
500+ sentences were taken from three Swiss-German novels - one of which children's fiction - and annotated for 
valence - understood here as the degree of 'positivity' of the detected sentiment - and 
arousal - its 'intensity' or 'degree of activation' - by two trained student assistants using the same instructions. Currently, intra-class correlation coefficient (ICC) between manual annotators calculated on these sentences is 0.721 for valence and 0.606 for arousal.

Scores for individual texts indicated preliminary evidence of a genre effect, with higher ICCs for the children's novel (valence 0.86; arousal 0.78 for 149 sentences) as compared to the other two, more complex, realist novels (valence 0.78, 0.62 arousal for 182 sentences; 0.51 for valence, 0.41 arousal for 198 sentences).
The annotated sample was used to train a deep learning classifier, using linear regression to finetune the Literary German BERT model (, which reached Pearson's r scores of 0.53 for valence and 0.63 for arousal. These scores are very promising, suggesting the possibility - provided more training data - of a full automation of the annotation task on our domain of historical literary texts.

We are currently appending the annotation and at the time of the conference shall be able to update the results on a broader data base.
While potentially taking automatic SA of German literary texts to a new level, our study also allows evaluating the performance of lexicon-based in direct comparison with deep learning SA approaches, thus allowing to gauge the validity of different SA methods on a data-driven basis. This approach also raises questions concerning the effect of genre on the ease and validity of manual sentiment annotations.


Burghardt, M., Wolff, C., & Schmidt, T. (2019, January 1). Toward multimodal sentiment analysis of historic plays: A case study with text and audio for Lessing’s Emilia Galotti. 4th Conference of the Association Digital Humanities in the Nordic Countries.

Fischer, F., & Strötgen, J. (2017). Corpus of German-Language Fiction (txt).

Grisot, G. and Herrmann, B. J. (in preparation) Examining the representation of landscape and its emotional value in German-Swiss fiction around 1900

Hogan, P. C. (2016). Affect Studies. Oxford University Press.

Jockers, M. (2017). Extracts sentiment and sentiment-derived plot arcs from text. R Package “Syuzhet.

Kim, E., & Klinger, R. (2019). A survey on sentiment and emotion analysis for computational literary studies. Zeitschrift Für Digitale Geisteswissenschaften.

Liu, B. (2015). Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. New York, NY: Cambridge University Press.

Oatley, K. (2012). The Passionate Muse: Exploring Emotion in Stories. New York: Oxford University Press.

R Core Team. (2021). R: A language and environment for statistical computing.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO