Mining for clean energy: a machine learning approach to historicized sentiment mining of fossil fuel discourse in the Netherlands

paper, specified "long paper"
  1. 1. Pim Huijnen

    Utrecht University

  2. 2. Gertjan Plets

    Utrecht University

  3. 3. Jaap Verheul

    Utrecht University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Global sustainability is one of the most urgent issues of today. Although the climate crisis has been on and off the Dutch political agenda for at least fifty years, a longer-term historical perspective on the crisis plays a minor role in present-day discussions. Nevertheless, such a perspective can give us important insights into the forces at play in the complex environmental, social, and cultural issues that are at stake when it comes to sustainability. We cannot explain how and why public sentiments have changed over the last decades without mapping where they are rooted in and how they have evolved over time. We aim to investigate this for the Dutch case by tracing shifting sentiments towards fossil fuels in the postwar newspaper discourse. Interesting, for example, is that natural gas was framed as a sustainable, environment-friendly alternative for coal and petrol when it was introduced in the 1960s, while it presently carries the same deprecative label of 'fossil fuel' as the others. We have built a sentiment mining pipeline to be able to better understand semantic and emotional shifts like these.
Mass digitization has made a long-term historical semantic perspective on public meaning, emotions, and sentiments, as this project envisions, both innovative and feasible. The training of dedicated models based on the vectorization of language, at the same time, has enabled studying semantics on an entirely different scale than manually possible. The state-of-the-art in language models are transformer models, which, unlike previous vectorization techniques like word embeddings, consider the context in which words are used thanks to the introduction of “self-attention” layers. Therefore, they result in a more precise modeling of features of language than previous models, as Google’s BERT has demonstrated (Devlin 2019).
This project deals with the analysis of sentiment within newspaper articles from 1960 to 1995 contained in the massive digitized newspaper archive of The National Library of the Netherlands (KB)

. By creating multiple fine-tuned BERT models, adapted to topics and decades, this project has produced models that run historically dynamic sentiment analyses that are context-specific and easily repeatable on different topics. The output of the models creates a sentiment variable which is topic- and context-specific.

Preprocessing and selection
The original OCR newspaper texts received from KB have been reformatted and divided by decade. Only texts labeled as articles (not advertisements) category were considered, which resulted in a final pre-training dataset of 43.4GB of uncompressed text. On this dataset, we have used a labeling and predicting pipeline to extract documents on fossil fuels, and a second to predict sentiments.
Before labeling, the data was first cleaned and tokenized. We used the SentencePiece (Kudo and Richardson, 2018) library to create a tokenizer file. Then we followed the lead of the Swedish (Malmsten et al., 2020), Finnish (Virtanen et al., 2019) and German (Branden et al., 2019) BERT project to select dictionary size and the configuration of the BERT model. Labeling the sentiment of articles is a complex task, as an article is composed of many sentences that might have contrasting sentiment when taken individually and/or out-of-context. The data was, further, decomposed in its main paragraphs as divided by KB’s OCR to predict topicality and sentiment on a more fine-grained level.

We split paragraphs longer than 400 words and discarded very short (one-sentence) paragraphs. All preprocessing and modeling scripts are available on

Labeling for topicality was done per type of fossil fuel (coal, natural gas, petrol) and per decade by two labelers (which the project team evaluated), resulting in a final dataset of 1.5GB and 568,160 paragraphs of text with a 0.95 confidence score (see table 1). This data was used as input for the later models.

Table 1: Dataset after prediction of topicality: newspaper articles on goal, natural gas, petrol for the 1960s-1990s.

Type of fossil fuel
Size (MB)
No. paragraphs


Natural Gas




Natural Gas




Natural Gas




Natural Gas



Sentiment Labeling and fine-tuning
We selected two labelers to label the sentiment on each paragraph. This was done to improve the generalizability of the models and to avoid that the models would learn the subjective interpretation of one labeler on the articles’ sentiments. The labelers used a range of three classes: -1 (negative), 0 (neutral), and +1 (positive). After labeling, we calculated the interrater reliability score using Cohen’s Kappa score (Cohen, 1960) as a measure of the agreement among raters with the objective to compute the extent to which the data collected in the study are correct representations of the variables measured. The computed scores highlight a low agreement on 1960s and 1970s (average 0.22 and 0.24); a higher agreement on 1980s (0.36) and the highest agreement on 1990s (0.58). We weighted the labelers’ judgements of the same paragraphs in the following way: if they disagreed between 0 and -1/+1, we used the more ‘extreme’ label; paragraphs with opposite labels were discarded altogether.
Instead of the pre-training we opted to fine-tune BERT models, using an already pre-trained BERT model. The model selected to be fine-tuned with the sentiment labels and texts classified by our labelers was BERTje (de Vries et al., 2019). The fine-tuned BERT models take the labels for each type of fossil fuel (natural gas, coal, petroleum) within one decade (1960s – 1990s) to predict sentiment scores for the entire datasets.

Based on the predictions, we have been able to visualize average sentiments (-1 to +1) per year for the entire newspaper discourse on the three types of fossil fuel between 1960 and 1995 (Figure 1). Most striking is the fact coal rather that natural gas is regarded more positively after the 1973 oil crisis (although the Dutch government had in 1965 already decided to stop all coal production by 1975). These trends will form the basis for an in-depth analysis in our final paper of the fossil fuel discourse in the Netherlands that will be based on significant changes in notable words (tf-idf) over time and between the three types of energy. We will, particularly, focus on shifts in discourses related to nuclear energy and renewable energy. Idf scores indicate that the former becomes increasingly noticeable in articles on natural gas throughout the decades, while the notion of ‘clean’ (‘schone’) becomes less prominent (Figure 2).
Figure 1: Average sentiment score of articles on coal, natural gas, and petrol in Dutch newspapers between negative (-1) and positive (+1) per year, 1960-1995

Figure 2: Idf scores (per decade) for ‘nuclear energy’ (‘atoomenergie’) and ‘clean’ (‘schone’) in documents on natural gas, 1960s-1990s.


Branden C. et al. (2019). German’s Next Language Model.
arXiv preprint arXiv:2010.10906.

Cohen, J.
(1960). A coefficient of agreement for nominal scales.
Educational and psychological measurement
(1), 37-46.

Delobelle, P., Winters, T., & Berendt, B.
(2020). Robbert: a dutch roberta-based language model.
arXiv preprint arXiv:2001.06286

De Vries, W., van Cranenburgh, A., Bisazza, A., Caselli, T., van Noord, G., & Nissim, M.
(2019). Bertje: A dutch bert model.
arXiv preprint arXiv:1912.09582

Kudo T. and Richardson J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing.
arXiv preprint arXiv:1808.06226.

Malmsten M. et al. (2020). Playing with Words at the National Library of Sweden – Making a Swedish BERT.
arXiv preprint arXiv:2007.01658.

Virtanen A. et al. (2019). Multilingual is not enough: BERT for Finnish.
arXiv preprint arXiv:1912.07076.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO