Linked Data Approach for Studying Parliamentary Speeches and Networks of Politicians in Finland 1907-2021

paper, specified "long paper"
  1. Eero Hyvönen

    Aalto University, Finland; University of Helsinki (HELDIG), Finland

  2. Petri Leskinen

    Aalto University

  3. Laura Sinikallio

    University of Helsinki (HELDIG), Finland; Aalto University, Finland

  4. Senka Drobac

    Aalto University, Finland; University of Helsinki (HELDIG), Finland

  5. Jouni Tuominen

    Aalto University, Finland; University of Helsinki (HELDIG), Finland

  6. Kimmo Elo

    University of Turku

  7. Matti La Mela

    University of Helsinki (HELDIG), Finland

  8. Mikko Koho

    Aalto University

  9. Esko Ikkala

    Aalto University

  10. Minna Tamper

    Aalto University

  11. Rafael Leal

    Aalto University, Finland; University of Helsinki (HELDIG), Finland

  12. Joonas Kesäniemi

    Aalto University

Vision for a Paradigm Change

This paper presents the vision of publishing and using parliamentary data on the Semantic Web for Digital Humanities (DH) research, with a focus on studying parliamentary culture, language used, and networks of politicians (Hyvönen et al., 2021, 2022). First results of the Semantic Parliament project (ParliamentSampo, 2021) are presented:

  1. Finnish parliamentary debates, totalling over 950 000 speeches and covering the whole history of the Parliament of Finland (PofF) 1907–2021, have been transformed both into a Speech Knowledge Graph (S-KG) and into XML using the emerging Parla-CLARIN format (Parla-CLARIN, 2021) for international interoperability (Sinikallio et al., 2021).
  2. A Prosopographical Knowledge Graph (P-KG) representing biographical data about the politicians during the same time span, using the event-based CIDOC CRM (CIDOC CRM, 2021) ontology, has been created and interlinked with the S-KG (Leskinen et al., 2021).
  3. The S-KG and P-KG datasets have been published as a Linked Open Data (LOD) service with a SPARQL endpoint and are used to study Finnish political culture and language, based on the speeches and networks of politicians (Hyvönen et al., 2021, 2022).
  4. To demonstrate the usability of the new data infrastructure, a semantic portal, ParliamentSampo – Parliament of Finland on the Semantic Web, targeted at researchers and the public, is presented. The portal is based on the LOD service (3). The portal and the LOD service will be opened for public use by the end of 2022.
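
As a rough illustration of why interlinking a speech graph with an event-based prosopographical graph is useful, the following minimal Python sketch joins two sets of toy triples in memory to group speeches by the speaker's party. All identifiers (the ex: names, the property names, and the sample data) are hypothetical placeholders; they are not the actual ParliamentSampo schema or CIDOC CRM terms.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). Every identifier is a made-up
# placeholder, not a real ParliamentSampo or CIDOC CRM term.
SPEECH_TRIPLES = [
    ("ex:speech_1", "ex:speaker", "ex:person_A"),
    ("ex:speech_2", "ex:speaker", "ex:person_B"),
    ("ex:speech_3", "ex:speaker", "ex:person_A"),
]
PERSON_TRIPLES = [
    # Event-based modelling: a membership event links a person to a party.
    ("ex:event_1", "ex:participant", "ex:person_A"),
    ("ex:event_1", "ex:party", "ex:party_X"),
    ("ex:event_2", "ex:participant", "ex:person_B"),
    ("ex:event_2", "ex:party", "ex:party_Y"),
]


def party_by_person(person_triples):
    """Collect each event's properties, then map participant -> party."""
    events = defaultdict(dict)
    for subject, predicate, obj in person_triples:
        events[subject][predicate] = obj
    return {
        props["ex:participant"]: props["ex:party"]
        for props in events.values()
        if "ex:participant" in props and "ex:party" in props
    }


def speeches_by_party(speech_triples, person_triples):
    """Join the two graphs on the shared person identifier."""
    parties = party_by_person(person_triples)
    grouped = defaultdict(list)
    for speech, predicate, speaker in speech_triples:
        if predicate == "ex:speaker" and speaker in parties:
            grouped[parties[speaker]].append(speech)
    return dict(grouped)


result = speeches_by_party(SPEECH_TRIPLES, PERSON_TRIPLES)
print(result)
```

In a real LOD service the same join would be expressed in a SPARQL query over shared URIs rather than in application code; the sketch only shows the shape of the link between the two graphs.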

Related Work

Parliamentary data are widely available to make political decision making transparent, and they are used for linguistic and DH research. The paper first explains why publishing parliamentary data as Linked Data (LD) makes sense and discusses related projects in different countries (ParlaMint, 2021), including Canada, Italy, Latvia, Slovenia, and the UK, as well as the LinkedEP system (Van Aggelen et al., 2017) of the European Parliament. After this, the knowledge graphs (KG) of the ParliamentSampo system and their creation processes are presented, and the benefits and challenges of the LOD approach are discussed, suggesting a paradigm shift in publishing and studying parliamentary data using semantic web technologies.

The Model and Implementation

Based on the Sampo model (Hyvönen, 2021) and the Sampo-UI framework (Ikkala et al., 2022), ParliamentSampo aggregates and enriches data from multiple data providers in addition to the PofF and publishes the result in a LOD service that follows W3C best practices (Heath & Bizer, 2011), including a SPARQL endpoint and additional LOD services, such as content negotiation. In addition, the 7-star LOD model (Hyvönen et al., 2014), which extends Tim Berners-Lee's traditional 5-star model with schema documentation and data validation, is used. The LOD service can be used for direct DH analyses through its APIs and for creating ready-to-use applications for research. Data and application dissemination is supported using Docker containers.
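
To make the idea of a SPARQL endpoint with content negotiation concrete, here is a minimal sketch, using only the Python standard library, of how a client might prepare a query request. The endpoint URL, the property URI, and the variable names are placeholders invented for illustration and do not refer to the actual ParliamentSampo service; the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint and vocabulary, assumed for illustration only.
ENDPOINT = "https://example.org/sparql"

QUERY = """
SELECT ?speech ?speaker WHERE {
  ?speech <https://example.org/schema/speaker> ?speaker .
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL 1.1 Protocol query via POST with URL-encoded parameters."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            # The Accept header asks the server for JSON-formatted results.
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would execute the query against a live endpoint; changing the Accept header (for example to text/csv) is how a client would ask for a different result serialization, provided the server supports it.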

Results and Evaluation

The feasibility of the ParliamentSampo approach is evaluated by showing how the SPARQL endpoint, together with tools such as YASGUI (Rietveld & Hoekstra, 2017) and Google Colab with Jupyter Notebooks, can be used for novel DH analyses and visualizations of parliamentary speeches and networks of politicians. This is the first time that all speeches of the PofF since its establishment in 1907 are available as uniform data for DH research. We also introduce the new semantic portal “ParliamentSampo – Parliament of Finland on the Semantic Web”, implemented on top of the LOD service. It is demonstrated how the portal can be used for analyzing the political language used at different times, its semantic content, and differences between prosopographical groups, such as female and male Members of Parliament and different political parties. For this purpose, the speech texts have been enriched semantically with Named Entity Linking using FinBERT (2021), a Finnish language model based on Google BERT, with ontology-based keyword indexing using the automatic annotation tool Annif (Suominen, 2019), and with topic detection. Furthermore, network analyses of political networks using the P-KG are presented using the Sparql2GraphServer tool (Leskinen et al., 2021). Using the portal requires data literacy but no programming skills. Finally, new possibilities and challenges of using linked data and ParliamentSampo in parliamentary studies are discussed and directions for further research are suggested.
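
As a hedged sketch of the kind of network analysis described above, the following Python code turns SPARQL query results into a weighted, undirected network and computes each node's weighted degree. It is not the Sparql2GraphServer API; the variable names source and target, the ex: identifiers, and the sample data are assumptions made for illustration. Only the overall layout of the results (head, results, bindings) follows the standard SPARQL 1.1 Query Results JSON Format.

```python
import json
from collections import Counter

# Hand-written sample in the SPARQL 1.1 Query Results JSON Format.
# The people and links are invented placeholders, not real data.
SAMPLE_RESULTS = json.loads("""
{
  "head": {"vars": ["source", "target"]},
  "results": {"bindings": [
    {"source": {"type": "uri", "value": "ex:person_A"},
     "target": {"type": "uri", "value": "ex:person_B"}},
    {"source": {"type": "uri", "value": "ex:person_B"},
     "target": {"type": "uri", "value": "ex:person_A"}},
    {"source": {"type": "uri", "value": "ex:person_A"},
     "target": {"type": "uri", "value": "ex:person_C"}}
  ]}
}
""")


def undirected_edge_weights(results):
    """Count how often each unordered pair of nodes is linked."""
    weights = Counter()
    for row in results["results"]["bindings"]:
        a = row["source"]["value"]
        b = row["target"]["value"]
        if a != b:  # ignore self-links
            weights[tuple(sorted((a, b)))] += 1
    return weights


def weighted_degree(weights):
    """Sum the weights of all edges incident to each node."""
    degree = Counter()
    for (a, b), weight in weights.items():
        degree[a] += weight
        degree[b] += weight
    return degree


weights = undirected_edge_weights(SAMPLE_RESULTS)
degree = weighted_degree(weights)
print(degree.most_common())
```

The resulting edge weights could be handed to a graph library or visualization tool of choice; weighted degree is used here only as a simple example of a centrality measure.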

The multidisciplinary work on ParliamentSampo has involved researchers in computer science, parliamentary studies, and linguistics at the University of Helsinki (HELDIG centre), Aalto University, and the University of Turku, and is funded mostly by the Academy of Finland and the EU project In/Tangible European Heritage. CSC – IT Center for Science, Finland, provided computational resources.

Van Aggelen, A., Hollink, L., Kemman, M., Kleppe, M., & Beunders, H. (2017). The Debates of the European Parliament as Linked Open Data. Semantic Web, 8(2), 271–281.
Parla-CLARIN. (Nov 24, 2021). Parla-CLARIN format:

CIDOC CRM. (Nov 24, 2021). CIDOC-CRM standard:

FinBERT (Nov 24, 2021). Finnish BERT model:

Heath, T., & Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, Palo Alto, California.

Hyvönen, E., Tuominen, J., Alonen, M., & Mäkelä, E. (2014). Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets. In V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, & A. Tordai (Eds.), The Semantic Web: ESWC 2014 Satellite Events. ESWC 2014, pp. 226–230, Springer.

Hyvönen, E. (2021). Digital Humanities on the Semantic Web: Sampo Model and Portal Series. Submitted.

Leskinen, P., Hyvönen, E., & Tuominen, J. (2021). Members of Parliament in Finland Knowledge Graph and its Linked Open Data Service. April. Proceedings of SEMANTiCS – In the Era of Knowledge Graphs, Amsterdam, Sept 6–9, 2021.

Hyvönen, E., Sinikallio, L., Leskinen, P., Drobac, S., Tuominen, J., Elo, K., La Mela, M., Koho, M., Ikkala, E., Tamper, M., Leal, R., & Kesäniemi, J. (2021). Semanttinen parlamentti: eduskunnan aineistojen linkitetyn avoimen datan palvelu ja sen käyttömahdollisuudet. Informaatiotutkimus, vol. 40, no. 2.

Hyvönen, E., Sinikallio, L., Leskinen, P., Drobac, S., Tuominen, J., Elo, K., La Mela, M., Koho, M., Ikkala, E., Tamper, M., Leal, R., & Kesäniemi, J. (2022). Digital Parliamentary data in Action (DiPaDa 2022), Workshop at the 6th Digital Humanities in Nordic and Baltic Countries Conference, long paper, CEUR Workshop Proceedings, 2022. Forthcoming.

Ikkala, E., Hyvönen, E., Rantala, H., & Koho, M. (2022). Sampo-UI: A Full Stack JavaScript Framework for Developing Semantic Portal User Interfaces. Semantic Web – Interoperability, Usability, Applicability, 13(1), 69–84.

Leskinen, P., Hyvönen, E., & Tuominen, J. (2021). Sparql2GraphServer: a Server-side Tool for Extracting Networks from Linked Data for Data Analysis. ISWC-Posters-Demos-Industry 2021 International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks, CEUR Workshop Proceedings, Vol 2980.

ParlaMint. (Nov 24, 2021). ParlaMint initiative homepage:

Rietveld, L., & Hoekstra, R. (2017). The YASGUI family of SPARQL clients. Semantic Web – Interoperability, Usability, Applicability, 8(3), 373–383.

Sinikallio, L., Drobac, S., Tamper, M., Leal, R., Koho, M., Tuominen, J., La Mela, M., & Hyvönen, E. (2021). Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup. In: 3rd Conference on Language, Data and Knowledge (LDK 2021), 1–17. OASICS, Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik, Germany.

ParliamentSampo. (Nov 24, 2021). Semantic Parliament project homepage:

Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. Liber Quarterly, July.

Conference Info

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO