The model of choice.Using pure CRF- and BERT-based classifiers for gender annotation in German fantasy fiction

paper, specified "long paper"
  1. 1. Mareike Katharina Schumacher

    Technische Universität Darmstadt (Technical University of Darmstadt)

  2. 2. Marie Flüh

    Universität Hamburg (University of Hamburg)

  3. 3. Marc Lemke

    Universität Rostock (University of Rostock)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Training a gender classifier for German literature

In several (Digital) Humanities studies, it has been shown that character analysis with a scope on gender can give interesting insides into literary history (cf. Underwood 2019: 111-142, Piper 2018: 133-138). With BookNLP (Bamann 2021) there is a well-performing tool including referential gender inference in the domain of English literary fiction (cf. Underwood 2019: 114). Here, we present a classification tool that is optimized for German fiction and which does not focus on pronouns used for fictional characters but on the ascribed gendered roles (which is referred to as gender identity by Butler 2003). As a starting point, we trained the classifier to annotate the binary (and often stereotyped) gender categories “feminine”, “masculine”, and “neutral”. It is planned to include more categories for gender roles in the future. To reach high accuracy on different literary genres it was trained in an iterative domain adaptation process, which can be roughly split up into three phases (cf. phases 1–3 in table 1). 

Table 1: Domain adaptation phases, datasets and performance values (set 1–3: CRF; set 4–6: gBERT)

387.000 tokens, as well as an annotated list consisting of about 7.000 names taken from dramatis personae archived in the GerDraCor-repository (cf. Fischer et al. 2019), have been included in the training corpus. Training data has been manually annotated from scratch, meaning that names and gendered roles have been tagged as either feminine, masculine or neutral. Adding genre-specific training data first leads to an optimization of classification on the specific genre the training data is taken from and second to higher accuracy in the other genres (cf. fig. 1). In the end, our classifier trained with pure CRF algorithms reached 0.86 on novellas, 0.73 on novels and 0.76 on plays. On average the classifier reaches an overall F1 score of 0.78 (Schumacher 2021). The information on gendered roles mentioned in fiction can be combined with other aspects of the analysis of fictional characters such as described features (Schumacher/Flüh 2020), emotions (Schumacher/Flüh 2020, Flüh/Schumacher 2022, Flüh/Horstmann/Schumacher forthcoming) and power structures (Schumacher/Flüh forthcoming). 

Figure 1: Training of a generic gender classifier
To adapt recognition and classification to youth fantasy fiction, we tried the implementation of neural networks following a transfer learning approach (cf. phases 4–6 in table 1).

Creating neural net-based Gender Classifiers

We used the software
Neiss TEI Entity Enricher (NTEE),
an implementation of a Transfer Learning (Kamath et al., 2019) approach, to create a neural net-based classifier. Large-scale language models, which are built according to a Bidirectional Encoder Representations from Transformers (BERT) architecture (Devlin et al., 2019) can be fine-tuned using ground truth data for particular NER tasks. In this process, a CRF layer is added to the models (cf. Zöllner et al, 2021: 9–10). For our investigation, we use
(cf. Chan, Schweter, Möller 2021), which is pre-trained with data from the 20th and 21st centuries. 

Using sets 4, 5 and 6 (cf. table 1) as ground truth datasets, we compared the performances of classifiers, trained on either generic data, genre-specific data or a combined dataset. Comparing the performance values of the differently designed models shows two things: 

Using genre-specific data only for fine-tuning the pre-trained gBERT model, in this case, is more efficient than using generic data.

Using combined data for fine-tuning the pre-trained gBERT model, in this case, works best.

One can also see a slight difference between the training of the pure CRF-classifier and the fine-tuning of the BERT model (cf. fig. 2). For this implementation, the combination of genre-specific and generic data clearly works best (F1-score of 0.92 tested on fantasy novels). 

Figure 2: Average performances of  the CRF-based and the gBERT-based classifiers

Beauvoir, Simone de (1992):
Das andere Geschlecht. Reinbek.

Bourdieu, Pierre (2010):
Die männliche Herrschaft. 1. Aufl., [Nachdr.]. Frankfurt am Main.

Butler, Judith (2003):
Das Unbehagen der Geschlechter. 1. Aufl. [Nachdr.]. Frankfurt am Main.

Chan, Branden, Stefan Schweter, Timo Möller (2020):
German’s Next Language Model.
[Access 8th december 2021].

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2019). “Bert: Pre-training of deep bidirectional transformers for language understanding.” In:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Com-putational Linguistics 1, p. 4171–4186.

Fischer, Frank, Ingo Börner, Mathias Göbel, Angelika Hechtl, Christopher Kittel, Carsten Milling und Peer Trilcke (2019): Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. In: Digital Humanities 2019. Utrecht, Zenodo


Flüh, Marie, Jan Horstmann, Mareike Schumacher (forthcoming): Distant Gender Reading
Genderaspekte in Fantasy-Jugendromanen von 2008 bis 2020. 

Flüh, Marie, & Schumacher, Mareike. (2022, March 7). Jung, wild, emotional? Rollen und Emotionen Jugendlicher in zeitgenössischer Fantasy-Literatur. DHd 2022 Kulturen des digitalen Gedächtnisses. 8. Tagung des Verbands "Digital Humanities im deutschsprachigen Raum" (DHd 2022), Potsdam.

Foucault, Michel, Herculine Barbin (2012):
Über Hermaphrodismus: Der Fall Barbin. Frankfurt am Main.

Kamath, Uday, John Liu, and James Whitaker (2019):
Deep Learning for NLP and Speech Recognition. Cham: Springer.

Manning, Christopher, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky (2014): „The Stanford CoreNLP Natural Language Processing Toolkit“. In:
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, p. 55–60,
[Access 8th december 2021].

Connell, Raewyn (1996):
Gender and power society, the person and sexual politics, Reprint Aufl.

Connell, Raewyn (2015):
Der gemachte Mann Konstruktion und Krise von Männlichkeiten. Geschlecht & Gesellschaft Band 8, 4. durchgesehene und erweiterte Auflage Aufl. Wiesbaden.

Piper, Andrew (2018): Enumerations. Chicago: The University of Chicago Press.
Schumacher, Mareike, Marie Flüh (2020): “Figurengender zwischen Stereotypisierung und literarischen und theoretischen Spielräumen: Genderstereotypen und -bewertungen in der Literatur des 19. Jahrhundert”. In: Christof Schöch (Hg.):
DHd2020: Digital Humanities zwischen Modellierung und Interpretation. Konferenzabstracts. o.O. 2020, S. 162–167, [Access: 26. November 2021].

Schumacher, Mareike (2021):
StanfordNER Gender-Classifier. DOI: 10.5281/zenodo.5555952.

Schumacher, Mareike, Marie Flüh (forthcoming): “Macht vs. Emotion. Handlungstreibende Muster in Günderrodes Dramen digital, distant und scalable gelesen”. In: Roland Borgers, Friederike Middelhoff, Martina Wernli (Ed.): Neue Romantikforschung, Stuttgart: Metzler.
Schumacher, Mareike, Marie Flüh (2021): “Digitale diachrone Korpusanalyse am Beispiel des Projekts „m*w – Gender Stereotype in der Literatur“. In:
Digital humanities and gender history. Jena. DOI:

Underwood, Ted (2019): Distant Horizons. Chicago: The University of Chicago Press.
Weitin, Thomas (2016):
Volldigitalisiertes XML-Korpus. Der Deutsche Novellenschatz. Hg. von Paul Heyse, Hermann Kurz. 24 Bde. 1871–
1876. Darmstadt/Konstanz. URL:
[Access: 26. November 2021].

Zöllner, Jochen, Konrad Sperfeld, Christoph Wick, and Roger Labahn (2021):
Optimizing small berts trained for german NER. arXiv. URL:
[Access: 8th December 2021].

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO