Polish Literary Bibliography - New Research Data Portal for Complex Cultural Dataset

poster / demo / art installation
  1. 1. Beata Koper

    Institute of Literary Research - Polish Academy of Sciences

  2. 2. Tomasz Umerle

    Institute of Literary Research - Polish Academy of Sciences

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Polish Literary Bibliography [PBL] is a large literary database containing app. 0,7 mln bibliographic records in a database form, and app. 1,5 mln records in a form of scanned pages of bibliography. It covers the period from 1944 to 2003 (database covers years 1989-2003).
It contains data on more than 40000 authors, 100000 works, and records close to 15000 cultural events, more than 10000 theatrical plays and radio plays and literary programmes, and 1000 films.
PBL has experienced two comprehensive remediations (Maryl and Wciślik, 2016; Bolter and Grusin, 2000) in recent decades. First, around 1998, it was transformed from a printed book into an online database (

) in open access. It was one of the earliest of such cultural databases in Poland, and it provided a stable environment for 20 years of continuous creation of rich bibliographic metadata. Yet, its Oracle-based production environment and its user interface were geared to faithfully represent the structure and layout of the bibliographic data of its printed predecessor, rather than to open for the possibilities of digitally-enabled data exploration, and as a result “it did not comply with any of the common standards in terms of record structure or data formats, what eventually lead to serious problems with both preservation and interoperability of collected data” (Maryl and Wciślik, 2006).

From 2015 PBL has been transformed into a new kind of service (

), in cooperation between the Centre of Digital Humanities of the Institute of Literary Research of the Polish Academy of Sciences, the Institute’s Department of Current Bibliography (creators of PBL), and IT specialists - Poznan Supercomputing and Networking Center. The redesign of the service was influenced by the research on users’ practices via Google Analytics (Werla, 2016) and their practices (more than 700 responses to the online questionnaire), combined with the strong conviction that the efficient search engine of the service should be complemented by the rich data-exploration experience. As Whitelaw argues, it is the “rich, browsable interfaces that reveal the scale and complexity of digital heritage collections” (Whitelaw, 2015; Cf. Bomba, 2017). The goal of the project team was to produce a service that can display not only the bibliographic records and  the contents of authoritative files in the most seamless way possible, but also to open up the database for advanced research uses.

The poster presentation will focus on the results of the second remediation of PBL -
the redesign of the bibliographic service, and the plans for its development. It will present  1.
the main characteristics of the new PBL service and 2.
the main challenges that the project team had to face and resolve that resulted from the complexities of the PBL dataset.

Design for rich data
In order to account for the complexities of PBL data, the new service includes:

Many access points that enable exploration of the complexities of the PBL dataset from different perspectives:

search engine and advanced search engine,
browsable finding aids that include a complex set of filters and facets; user can browse data starting with:

subject headings,
authoritative files: people, institutions, events, series, works,
other categories: books, films, plays, TV programmes etc.

Documentation describing PBL dataset in following manner:

the scope of dataset,
the organization of bibliographical work,
the sources used for creating metadata.

Provision for the use and reuse of the data in data-driven studies:

API module (with rich documentation), and possibility of CSV downloads (in progress),
Statistical module for data visualization (access via contact with PBL team):
Kibana (


The main challenges for the project team
The main challenges that the poster presentation will highlight are:

finding the best workflow for collaboration between digital humanities researchers, IT specialists and bibliographers,
striking the balance between the need of consistent structuring of data and the unstructured nature of annotations provided in specialised bibliography (insertion of tags with authoritative control into unstructured text was conceptualized),
data processing for the data migration process:

introducing modern authoritative control and linked open data mark-up (e.g. combining 4(!) personal authority files into one, and restructuring institutional and event files), including the implementation of VIAF IDs (in progress),
parsing the annotated data fields in order to structure them (in progress),

retrodigitization of 40000 pages of bibliographic records (the goal was to scan and OCR them, and work conceptually on the possibilities of converting the data into the database form).


Bolter, J. D. and Grusin, R. (2000).
Remediation: Understanding New Media. Cambridge: MIT Press.

Bomba, R. (2017). Bazodanowe interfejsy. Projektowanie interakcji z dużymi zasobami danych kulturowych.
Kultura Popularna,
1(50): 64-73.

Maryl, M. and Wciślik, P. (2016).
Remediations of Polish Literary Bibliography: Towards a Lossless and Sustainable Retro-Conversion Model for Bibliographical Data. In
Digital humanities 2016: Conference Abstracts. Kraków: Jagiellonian University & Pedagogical University, pp. 621-623.

Werla, M. (2016).
Serwis internetowy “Polska Bibliografia Literacka”. Analiza danych z Google Analytics za okres 7 miesięcy: od 1 kwietnia 2016 r. do 31 października 2016 r. Poznań,


(accessed 25 April 2019).

Whitelaw M. (2015). Generous Interfaces for Digital Cultural Collections.
Digital Humanities Quarterly 9(1),
http://www.digitalhumanities.org/dhq/vol/9/1/000205/000205.html (accessed 25 April 2019).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.