Aalto University, Finland; University of Helsinki (HELDIG), Finland
Aalto University
University of Helsinki (HELDIG), Finland; Aalto University, Finland
Aalto University, Finland; University of Helsinki (HELDIG), Finland
Aalto University, Finland; University of Helsinki (HELDIG), Finland
University of Turku
University of Helsinki (HELDIG), Finland
Aalto University
Aalto University
Aalto University
Aalto University, Finland; University of Helsinki (HELDIG), Finland
Aalto University
Vision for a Paradigm Change
This paper presents the vision of publishing and using parliamentary data on the Semantic Web for Digital Humanities (DH) research, with a focus on studying parliamentary culture, language used, and networks of politicians (Hyvönen et al., 2021, 2022). First results of the Semantic Parliament project (ParliamentSampo, 2021) are presented:
Finnish parliamentary debates, totalling over 950 000 speeches and covering the whole history of the Parliament of Finland (PofF) 1907–2021, have been transformed 1) into a Speech Knowledge Graph (S-KG) and 2) into XML form using the emerging Parla-CLARIN format (Parla-CLARIN, 2021) for international interoperability (Sinikallio et al., 2021).
A Prosopographical Knowledge Graph (P-KG) representing biographical data about the politicians during the same time span, using the event-based CIDOC CRM (CIDOC CRM, 2021) ontology, has been created and interlinked with the S-KG (Leskinen et al., 2021).
The datasets S-KG and P-KG were published as a Linked Open Data (LOD) service with a SPARQL endpoint and are used to study Finnish political culture and language, based on the speeches and networks of politicians (Hyvönen et al., 2021, 2022).
To demonstrate the usability of the new data infrastructure, a semantic portal, ParliamentSampo – Parliament of Finland on the Semantic Web, targeted at researchers and the public, is presented. The portal is built on the LOD service through which the S-KG and P-KG are published. The portal and LOD service will be opened for public use by the end of 2022.
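As an illustration of how the Parla-CLARIN XML form of the speeches could be consumed, the following minimal sketch extracts utterances from a TEI-style document. It assumes that speeches are encoded as `<u>` elements with a `who` attribute and `<seg>` children, as in TEI-based parliamentary corpora. The sample document, speaker identifiers, and the helper name `extract_speeches` are hypothetical and heavily simplified; actual Parla-CLARIN files carry a full TEI header and richer metadata.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI-based formats such as Parla-CLARIN.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified sample (not taken from the actual corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="debateSection">
        <u who="#speaker_a" xml:id="u1"><seg>First example speech.</seg></u>
        <u who="#speaker_b" xml:id="u2"><seg>A reply.</seg><seg>Second segment.</seg></u>
      </div>
    </body>
  </text>
</TEI>"""


def extract_speeches(xml_text):
    """Return one dict (id, speaker, text) per <u> utterance element."""
    root = ET.fromstring(xml_text)
    speeches = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        # Join the text of all <seg> children into one string per speech.
        segments = ["".join(seg.itertext()).strip()
                    for seg in u.iterfind("tei:seg", TEI_NS)]
        speeches.append({
            "id": u.get(XML_ID),
            "speaker": u.get("who", "").lstrip("#"),
            "text": " ".join(segments),
        })
    return speeches


for speech in extract_speeches(SAMPLE):
    print(speech["id"], speech["speaker"], speech["text"])
```

Flattening each utterance into a plain dictionary keeps the sketch independent of any particular analysis library; the resulting records could be loaded into a data frame or a text-analysis pipeline as needed.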
Related Work
Parliamentary data are widely available to make political decision making transparent, and they are used for linguistic and DH research. The paper first explains why publishing parliamentary data as Linked Data (LD) makes sense, and discusses related projects in different countries (ParlaMint, 2021), including Canada, Italy, Latvia, Slovenia, and the UK, as well as the LinkedEP system (Van Aggelen et al., 2017) of the European Parliament. After this, the knowledge graphs (KG) of the ParliamentSampo system and their creation processes are presented, and the benefits and challenges of the LOD approach are discussed, suggesting a paradigm shift in publishing and studying parliamentary data using semantic web technologies.
The Model and Implementation
Based on the Sampo model (Hyvönen, 2021) and the Sampo-UI framework (Ikkala et al., 2022), ParliamentSampo aggregates and enriches data from multiple data providers in addition to the PofF, and publishes the result as a LOD service following W3C best practices (Heath & Bizer, 2011), including a SPARQL endpoint and additional LOD services, such as content negotiation. In addition, the 7-star LOD model (Hyvönen et al., 2014), which extends Tim Berners-Lee's traditional 5-star model with schema documentation and data validation, is used. The LOD service can be used for direct DH analyses through its APIs and for creating ready-to-use applications for research. Data and application dissemination is supported using Docker containers.
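Content negotiation, mentioned above as one of the LOD services, means that a client requests the same resource URI in different serializations by varying the HTTP `Accept` header. The sketch below only builds such a request and does not send it. The resource URI and the helper name `build_request` are hypothetical placeholders, and which media types the actual service supports is an assumption.

```python
import urllib.request

# Hypothetical resource URI, used only for illustration.
EXAMPLE_URI = "http://example.org/resource/person_1"

# Standard media types for common RDF serializations and HTML.
FORMATS = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
    "html": "text/html",
}


def build_request(uri, fmt="turtle"):
    """Build (but do not send) a GET request asking for one serialization."""
    return urllib.request.Request(uri, headers={"Accept": FORMATS[fmt]})


req = build_request(EXAMPLE_URI, "turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Against a live service, the request would be sent with `urllib.request.urlopen(req)`; a server that supports content negotiation would then answer with the requested serialization or redirect to a format-specific document.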
Results and Evaluation
The feasibility of the ParliamentSampo approach is evaluated by showing how the SPARQL endpoint, together with tools such as YASGUI (Rietveld & Hoekstra, 2017) and Google Colab with Jupyter Notebooks, can be used for novel DH analyses and visualizations of parliamentary speeches and networks of politicians. This is the first time that all speeches of the PofF since it was established in 1907 are available as uniform data for DH research. We also introduce the new semantic portal “ParliamentSampo – Parliament of Finland on the Semantic Web”, implemented on top of the LOD service. It is demonstrated how the portal can be used for analyzing the political language used at different times, its semantic content, and differences between prosopographical groups, such as female and male Members of Parliament and different political parties. For this purpose, the speech texts have been enriched semantically by Named Entity Linking using FinBERT (2021), a Finnish language model based on Google BERT, by ontology-based keyword indexing using the automatic annotation tool Annif (Suominen, 2019), and by topic detection. Furthermore, analyses of political networks based on the P-KG are presented using the Sparql2GraphServer tool (Leskinen et al., 2021). Using the portal requires no programming skills, only data literacy. Finally, new possibilities and challenges of using linked data and ParliamentSampo in parliamentary studies are discussed, and directions for further research are suggested.
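The kind of notebook-based analysis described above can be sketched as follows: a query is sent to a SPARQL endpoint, the standard SPARQL JSON result format is flattened into rows, and speeches are counted per party and decade. The endpoint URL, the `ex:` vocabulary, the property names, and the sample response are all hypothetical placeholders, not the actual ParliamentSampo schema or data; only the request construction and the JSON result structure follow the SPARQL 1.1 standards.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Hypothetical endpoint and vocabulary (placeholders for illustration).
ENDPOINT = "http://example.org/sparql"
QUERY = """
PREFIX ex: <http://example.org/schema/>
SELECT ?speech ?party ?date WHERE {
  ?speech a ex:Speech ;
          ex:party ?party ;
          ex:date ?date .
} LIMIT 1000
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in data["results"]["bindings"]]


def speeches_per_decade(rows):
    """Count speeches per (party, decade), assuming ISO dates (YYYY-MM-DD)."""
    counts = Counter()
    for row in rows:
        decade = int(row["date"][:4]) // 10 * 10
        counts[(row["party"], decade)] += 1
    return counts


# Hand-written sample response in the SPARQL JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["speech", "party", "date"]},
    "results": {"bindings": [
        {"speech": {"type": "uri", "value": "http://example.org/s1"},
         "party": {"type": "literal", "value": "Party A"},
         "date": {"type": "literal", "value": "1951-03-02"}},
        {"speech": {"type": "uri", "value": "http://example.org/s2"},
         "party": {"type": "literal", "value": "Party A"},
         "date": {"type": "literal", "value": "1958-11-20"}},
        {"speech": {"type": "uri", "value": "http://example.org/s3"},
         "party": {"type": "literal", "value": "Party B"},
         "date": {"type": "literal", "value": "1994-05-05"}},
    ]},
})

rows = bindings_to_rows(SAMPLE_RESPONSE)
for (party, decade), n in sorted(speeches_per_decade(rows).items()):
    print(party, decade, n)
```

In a live notebook, the sample response would be replaced by the body returned from `urllib.request.urlopen(build_sparql_request(ENDPOINT, QUERY))`, and the counts could be passed on to a plotting library for visualization.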
The multidisciplinary work on ParliamentSampo has involved researchers in computer science, parliamentary studies, and linguistics at the University of Helsinki (HELDIG centre), Aalto University, and the University of Turku, and has been funded mainly by the Academy of Finland and the EU project In/Tangible European Heritage. CSC – IT Center for Science, Finland, provided computational resources.
Bibliography
Van Aggelen, A., Hollink, L., Kemman, M., Kleppe, M., & Beunders, H. (2017). The Debates of the European Parliament as Linked Open Data. Semantic Web, 8(2), 271–281.
Parla-CLARIN. (Nov 24, 2021). Parla-CLARIN format:
https://clarin-eric.github.io/parla-clarin/
CIDOC CRM. (Nov 24, 2021). CIDOC-CRM standard:
https://cidoc-crm.org
FinBERT. (Nov 24, 2021). Finnish BERT model:
https://github.com/TurkuNLP/FinBERT
Heath, T., & Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, Palo Alto, California.
Hyvönen, E., Tuominen, J., Alonen, M., & Mäkelä, E. (2014). Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets. In V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, & A. Tordai (Eds.), The Semantic Web: ESWC 2014 Satellite Events. ESWC 2014, pp. 226–230, Springer.
https://link.springer.com/chapter/10.1007%2F978-3-319-11955-7_24
Hyvönen, E. (2021). Digital Humanities on the Semantic Web: Sampo Model and Portal Series. Submitted.
https://seco.cs.aalto.fi/publications/2021/hyvonen-sampo-model-2021.pdf
Leskinen, P., Hyvönen, E., & Tuominen, J. (2021). Members of Parliament in Finland Knowledge Graph and its Linked Open Data Service. Proceedings of SEMANTiCS – In the Era of Knowledge Graphs, Amsterdam, Sept 6–9, 2021.
https://seco.cs.aalto.fi/publications/2021/leskinen-et-al-mps-2021.pdf
Hyvönen, E., Sinikallio, L., Leskinen, P., Drobac, S., Tuominen, J., Elo, K., La Mela, M., Koho, M., Ikkala, E., Tamper, M., Leal, R., & Kesäniemi, J. (2021). Semanttinen parlamentti: eduskunnan aineistojen linkitetyn avoimen datan palvelu ja sen käyttömahdollisuudet. Informaatiotutkimus, vol. 40, no. 2.
Hyvönen, E., Sinikallio, L., Leskinen, P., Drobac, S., Tuominen, J., Elo, K., La Mela, M., Koho, M., Ikkala, E., Tamper, M., Leal, R., & Kesäniemi, J. (2022). Digital Parliamentary data in Action (DiPaDa 2022), Workshop at the 6th Digital Humanities in Nordic and Baltic Countries Conference, long paper, CEUR Workshop Proceedings, 2022. Forthcoming.
https://seco.cs.aalto.fi/publications/2022/hyvonen-et-al-semparl-dhnb-2022.pdf
Ikkala, E., Hyvönen, E., Rantala, H., & Koho, M. (2022). Sampo-UI: A Full Stack JavaScript Framework for Developing Semantic Portal User Interfaces. Semantic Web – Interoperability, Usability, Applicability, 13(1), 69–84.
https://doi.org/10.3233/SW-210428
Leskinen, P., Hyvönen, E., & Tuominen, J. (2021). Sparql2GraphServer: a Server-side Tool for Extracting Networks from Linked Data for Data Analysis. International Semantic Web Conference (ISWC) 2021: Posters, Demos, and Industry Tracks, CEUR Workshop Proceedings, Vol 2980.
http://ceur-ws.org/Vol-2980/paper330.pdf
ParlaMint. (Nov 24, 2021). ParlaMint initiative homepage:
https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora
Rietveld, L., & Hoekstra, R. (2017). The YASGUI family of SPARQL clients. Semantic Web – Interoperability, Usability, Applicability, 8(3), 373–383.
https://doi.org/10.3233/SW-150197
Sinikallio, L., Drobac, S., Tamper, M., Leal, R., Koho, M., Tuominen, J., La Mela, M., & Hyvönen, E. (2021). Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup. In: 3rd Conference on Language, Data and Knowledge (LDK 2021), 1–17. OASIcs, Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik, Germany.
https://drops.dagstuhl.de/opus/volltexte/2021/14544/pdf/OASIcs-LDK-2021-8.pdf
ParliamentSampo. (Nov 24, 2021). Semantic Parliament project homepage:
https://seco.cs.aalto.fi/projects/semparl/en/
Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. Liber Quarterly, July.
https://liberquarterly.eu/article/view/10732/11612
In review
Tokyo, Japan
July 25, 2022 - July 29, 2022
Held in Tokyo and remote (hybrid) on account of COVID-19
Conference website: https://dh2022.adho.org/
Contributors: Scott B. Weingart, James Cummings
Series: ADHO (16)
Organizers: ADHO