Digital Humanities and Replication. Ingredients for a Love Story – Experiences from the ‘(Re)counting the Uncounted’ Project

paper, specified "long paper"
Authorship
  1. Rombert Stapel

    International Institute of Social History - Royal Netherlands Academy of Arts and Sciences (KNAW)



Introduction
Over the past years much has been written about the replication crisis in science, with humanities research slowly, and sometimes hesitantly, catching up (Peels and Bouter, 2018a; Rijcke, de and Penders, 2018; Peels and Bouter, 2018b). Closely related to this development is the drive for open science (UNESCO, 2021). When researchers are enabled, and through grant stipulations compelled, to publish not only their results but also their data and methodologies, a major prerequisite for replication is met. The FAIR principles for data management (Findable, Accessible, Interoperable, Reusable) are central to this approach (Wilkinson et al., 2016).

A second essential prerequisite for replication is making it worthwhile for researchers to spend their time on replication, for instance by opening up grant opportunities and by valuing replication as an integral part of scholarly careers.

Current literature on the relationship between digital humanities and the replication of research largely focuses on the challenges of replicating digital or, more broadly, quantitatively based humanities research (Tucker, 2017; Flis, 2018). In computational literary studies, discussions have taken off, not least in response to a recent thought-provoking article by Nan Z. Da (Da, 2019; Algee-Hewitt et al., 2019; Arnold and Buell, 2019; Antoniak et al., 2020).
Rather than presenting another gloomy narrative on the replicability of digital humanities research, in this paper we address the opportunities that digital humanities methodologies offer for mitigating the ‘replication crisis’. Although digital humanities is sometimes narrowly defined, for instance barely interacting with quantitative historical research, we use the term as a broad umbrella for humanities research that systematically uses and analyses digital resources. Such research almost always includes some form of empirical analysis and therefore lends itself to replication as well (Peels, 2019).

Background of the project
Such an approach is central to the (Re)counting the Uncounted project, the first humanities project funded through the Replication Studies programme of the Dutch Research Council (NWO), a programme that is unique in the world (https://www.nwo.nl/en/researchprogrammes/replication-studies).
In this project, four seminal studies that estimated the medieval and early modern population of the Netherlands and Belgium are formally replicated (Faber et al., 1965; Blockmans et al., 1980; Klep, 1991; Paping, 2014). The replication uses the same underlying data, many hundreds of premodern censuses, and applies the same methodologies. These usually multiply the units counted in the censuses (typically hearths, houses, chimneys, families, communicants, able-bodied men, and sometimes individuals) by predetermined coefficients, while accounting for those excluded from the census, for instance for reasons of fiscal exemption (for an introduction to the challenges of these types of sources, see Arnould, 1976).
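The basic estimation step can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any of the four replicated studies: it assumes that the excluded units (for instance fiscally exempt hearths) can be expressed as a share of all units, and scales the counted units up before applying the coefficient. The function name `estimate_population` and all numbers are hypothetical; the original studies may instead add a fixed number of exempt units or use other corrections.

```python
def estimate_population(counted_units: float, coefficient: float,
                        excluded_share: float = 0.0) -> float:
    """Estimate a total population from a single census count.

    counted_units  -- units recorded in the census (e.g. hearths)
    coefficient    -- assumed number of persons per unit
    excluded_share -- assumed fraction of all units missing from the
                      census, e.g. through fiscal exemption (0 <= share < 1)
    """
    if not 0.0 <= excluded_share < 1.0:
        raise ValueError("excluded_share must lie in [0, 1)")
    if counted_units < 0 or coefficient <= 0:
        raise ValueError("counts must be non-negative, coefficient positive")
    # Scale the counted units up to all units (counted + excluded) ...
    total_units = counted_units / (1.0 - excluded_share)
    # ... then convert units into persons with the coefficient.
    return total_units * coefficient


# Hypothetical village: 120 taxed hearths, 4.5 persons per hearth,
# and an assumed 25% of hearths exempt and therefore unrecorded.
print(estimate_population(120, 4.5, 0.25))  # 720.0
```

Keeping the coefficient and the exclusion share as separate, explicit parameters, rather than folding them into a single multiplier, makes each assumption visible and therefore open to replication.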

The (Re)counting the Uncounted project aims to test the consistency of the methodologies applied in the four studies by using digital humanities methods as a feedback loop. To this end, we take a twofold approach. First, we digitise and contextualise the unaggregated (i.e. village-level) premodern censuses. These statistics are then linked to specially prepared Historical GIS maps of locality boundaries in the Netherlands, Belgium, Luxembourg, and surrounding areas (Stapel, 2020). Second, we address the diverging nature of the censuses. They were created by a multitude of actors, in a multitude of territories, for a multitude of purposes, counting a full range of units, across nearly five centuries. Compounding the problem of comparing their results, modern scholarship has never reached consensus on which coefficients should be used in which circumstances (e.g. Blockmans et al., 1980: 42–43; Stabel, 1997: 19 ff.; Hélin, 1963: 41 ff.; Brouwer, de, 1963; Cloet, 1966; Woude, van der, 1972: 77–91).
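Because locality boundaries could change over the period covered, a census figure can only be mapped reliably if it is linked to the boundary as it stood in the census year. The Python sketch below illustrates such a time-aware lookup. The `BoundaryVersion` record, its year-level validity interval and all identifiers are hypothetical and do not describe the actual structure of the Historical Atlas of the Low Countries (Stapel, 2020).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundaryVersion:
    """One (hypothetical) version of a locality's boundary in an HGIS layer."""
    locality_id: str   # hypothetical stable identifier of the locality
    valid_from: int    # first year this boundary applies (inclusive)
    valid_to: int      # last year this boundary applies (inclusive)
    geometry_ref: str  # hypothetical pointer to the stored polygon


def match_boundary(locality_id: str, year: int,
                   versions: list[BoundaryVersion]) -> Optional[BoundaryVersion]:
    """Return the boundary version of a locality valid in the given year.

    Returns None when no version covers that year. If validity intervals
    overlap (which a clean atlas should avoid), the first match wins.
    """
    for version in versions:
        if (version.locality_id == locality_id
                and version.valid_from <= year <= version.valid_to):
            return version
    return None


# Hypothetical example: one locality whose boundary changed in 1550.
atlas = [
    BoundaryVersion("loc-001", 1350, 1549, "geom/loc-001/a"),
    BoundaryVersion("loc-001", 1550, 1800, "geom/loc-001/b"),
]
match = match_boundary("loc-001", 1514, atlas)
print(match.geometry_ref if match else "no boundary for this year")
```

Returning `None` rather than guessing the nearest version forces a census that falls outside every recorded interval to be flagged for inspection instead of being silently mapped.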

By carefully building not only a database of digitised censuses but also a full record of contextual information on each census (what is counted, who is counted, who made the census, for what purpose, etc.), we make it possible to analyse this contextual information, for instance to derive more consistent coefficients. Moreover, advances in GIS techniques make it possible to reconstruct the (socio-)geographical context of each locality mentioned in these medieval and early modern censuses (e.g. Stapel, 2017: 182).
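One way such contextual metadata could yield more consistent coefficients is sketched below, under a strong assumption: that for some localities a unit-based census (e.g. of hearths) and a count of individuals describe roughly the same population at roughly the same date, so that their ratio gives an observed number of persons per unit. Grouping these ratios by census context (here a hypothetical pair of unit type and purpose) and taking the median gives a context-specific coefficient. The function name and the data are invented for illustration and are not necessarily the project's own method.

```python
from collections import defaultdict
from statistics import median


def context_coefficients(pairs):
    """Derive a persons-per-unit coefficient for each census context.

    pairs -- iterable of (context, unit_count, headcount) tuples, one per
             locality where a unit-based census and a headcount are
             assumed to describe the same population at about the same
             date. `context` is any hashable description of the census.
    Returns {context: median persons per unit}.
    """
    ratios = defaultdict(list)
    for context, units, heads in pairs:
        if units > 0:  # skip localities without a usable unit count
            ratios[context].append(heads / units)
    # The median is less sensitive than the mean to a single odd locality.
    return {context: median(values) for context, values in ratios.items()}


# Invented example data: (unit type, purpose of the census).
observations = [
    (("hearths", "fiscal"), 100, 450),
    (("hearths", "fiscal"), 80, 400),
    (("hearths", "fiscal"), 50, 210),
    (("communicants", "ecclesiastical"), 300, 420),
]
print(context_coefficients(observations))
```

Which contextual fields belong in the grouping key is exactly the kind of choice the recorded census metadata would allow researchers to test and revisit.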

Facilitating replicability through database design
Thus far, we have focused on the setup of this humanities replication study. In our view, however, this setup has much wider implications for how historical statistical databases, to take this case, should be constructed. Here too, digital humanities techniques play an essential role, even though they were not developed exclusively for or within a humanities context.
A traditional database of historical statistics (again, we use this type of database as an example, but the argument applies much more broadly in humanities research) typically consists of rows and columns, mimicking the printed table that has been familiar in scholarly literature for centuries. Rather than putting rows and columns at the forefront and defining them, we put the individual data observation at the centre. Every data observation is linked to contextual information, which may include details usually relegated to table footnotes, and is geographically defined in GIS. We use Linked Open Data (LOD) to implement this database structure.
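To make the contrast with a rows-and-columns table concrete, the Python sketch below stores a single census observation as a set of RDF statements and serialises them as N-Triples using only the standard library. The vocabulary (the namespace `https://example.org/rtu/` and terms such as `unit`, `source` or `note`) and the identifiers are invented for illustration and are not the LOD model developed in the project; a real implementation would more likely rely on an RDF library and established vocabularies.

```python
from dataclasses import dataclass, field

# Hypothetical namespace, for illustration only.
EX = "https://example.org/rtu/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value) -> str:
    """Serialise an int as a typed literal, anything else as a plain string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^<{XSD_INTEGER}>'
    text = str(value)
    # Escape backslashes first, then quotes and line breaks.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return f'"{text}"'


@dataclass
class Observation:
    """A single value read from a census, with its context attached.

    Identifiers are assumed to be IRI-safe (no spaces or special characters).
    """
    obs_id: str
    value: int
    unit: str          # what was counted, e.g. "hearth"
    locality_id: str   # links to a boundary in the HGIS layer
    year: int
    source_id: str     # the table or archival document the value comes from
    notes: list[str] = field(default_factory=list)  # e.g. former footnotes

    def to_ntriples(self) -> list[str]:
        subject = iri(f"{EX}observation/{self.obs_id}")
        statements = [
            (iri(RDF_TYPE), iri(f"{EX}Observation")),
            (iri(f"{EX}value"), literal(self.value)),
            (iri(f"{EX}unit"), iri(f"{EX}unit/{self.unit}")),
            (iri(f"{EX}locality"), iri(f"{EX}locality/{self.locality_id}")),
            (iri(f"{EX}year"), literal(self.year)),
            (iri(f"{EX}source"), iri(f"{EX}source/{self.source_id}")),
        ]
        statements += [(iri(f"{EX}note"), literal(n)) for n in self.notes]
        return [f"{subject} {p} {o} ." for p, o in statements]


obs = Observation("obs-1", 120, "hearth", "loc-001", 1514, "table-3",
                  notes=["Exempt households not included."])
print("\n".join(obs.to_ntriples()))
```

Because every statement carries the observation's own identifier, a footnote or a source reference stays attached to the value it qualifies, wherever that value is later reused.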
One may argue, with reason, that this approach is not very new. Yet it is still very uncommon in quantitative humanities research and rarely applied in full. Building on our experiences in the (Re)counting the Uncounted project, we will also show that neither the time investment required for such a data model nor a lack of user-friendly tools for setting it up need be an obstacle. At its core, the (Re)counting the Uncounted project, while promoting all four FAIR principles, aims above all to improve in this way the interoperability of thoroughly messy historical data.

Moreover, and this is vital for replication, the contextual and geographical information stored with every data observation can be passed on to users of the data, creating a new level of transparency. After all, replication does not end with the publication of new results but involves an ongoing conversation (Peels, 2019); facilitating the replicability of replicated research is therefore essential. We will exploit the possibilities of LOD to create a breadcrumb trail from an observation in a source (a table in the literature, an image of an archival document, etc.), via a range of carefully defined interpretations of that observation (by existing scholars or by future users), to a scientific product: in our case, population estimates based on very different types of sources.
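The breadcrumb trail can be thought of as a directed graph in which each product points to what it was derived from; in RDF, such links could be expressed with the W3C PROV-O property `prov:wasDerivedFrom`. The Python sketch below, with invented identifiers, walks such a graph backwards from a population estimate and returns every path to an original source, so that a user can see which observations and interpretations an estimate rests on. It assumes the provenance graph is acyclic and raises an error otherwise.

```python
def trails(node: str, derived_from: dict[str, list[str]],
           _visiting: frozenset = frozenset()) -> list[list[str]]:
    """Return every path from `node` back to a node with no recorded origin."""
    if node in _visiting:
        raise ValueError(f"cycle in provenance graph at {node!r}")
    parents = derived_from.get(node, [])
    if not parents:
        return [[node]]  # an original source, or an unsourced assumption
    visiting = _visiting | {node}
    return [[node] + path
            for parent in parents
            for path in trails(parent, derived_from, visiting)]


# Hypothetical provenance: an estimate rests on one interpretation, which
# combines a census observation with a chosen coefficient.
derived_from = {
    "estimate/loc-001/1514": ["interpretation/hearths-times-4.5"],
    "interpretation/hearths-times-4.5": ["observation/obs-1",
                                         "coefficient/hearth-4.5"],
    "observation/obs-1": ["source/table-3"],
}
for path in trails("estimate/loc-001/1514", derived_from):
    print(" <- ".join(path))
```

A path that ends in a coefficient rather than in a source makes explicit which parts of an estimate rest on scholarly assumptions instead of on recorded observations.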
Finally, to encourage a source-critical attitude among users, we aim to provide access to our (open) data through dynamic questionnaires. These are meant to bring any user up to speed with the specific challenges of our source material, forcing them to think about how these challenges affect their research question. In this way, downloading the aggregated end results as a CSV file without ever considering how the data came into being, a common practice at least in quantitative history, is actively discouraged.
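How a dynamic questionnaire could tailor its questions to the data requested is sketched below in Python. The question texts, the unit names and the rule that every selected question needs a non-empty answer before a download is released are all hypothetical. They merely illustrate how source-critical prompts, for instance about fiscal exemption or about the age at which children counted as communicants (cf. Cloet, 1966), could be tied to the kinds of census units a user selects.

```python
# Hypothetical question bank, keyed by the census unit a question concerns.
QUESTIONS = {
    "hearth": "Fiscal hearth counts may omit exempt households. "
              "How does your analysis account for them?",
    "communicant": "The age at which children counted as communicants may "
                   "have differed. Which threshold does your analysis assume?",
    "able-bodied man": "Counts of able-bodied men cover only part of the "
                       "population. How will you scale them?",
}
GENERAL = "Which coefficient(s) will you apply, and why?"


def questionnaire(requested_units: list[str]) -> list[str]:
    """Select the questions relevant to the census units in a data request."""
    selected = [GENERAL]
    for unit in sorted(set(requested_units)):
        if unit in QUESTIONS:
            selected.append(QUESTIONS[unit])
    return selected


def may_download(requested_units: list[str], answers: dict[str, str]) -> bool:
    """Release the data only if every selected question has a non-empty answer."""
    return all(answers.get(question, "").strip()
               for question in questionnaire(requested_units))


request = ["hearth", "communicant", "hearth"]
for question in questionnaire(request):
    print("-", question)
```

Selecting questions from the request itself keeps the questionnaire short for narrow queries while still confronting every user with the pitfalls of the specific sources they are about to use.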

Acknowledgments
Co-author of the paper and of the LOD model is Ivo Zandhuis (Fellow at the International Institute of Social History and independent researcher and consultant at ivozandhuis.nl). This publication is part of the project ‘(Re)counting the Uncounted’ (with project number 401.19.038) of the research programme Replication Studies, which is (partly) financed by the Dutch Research Council (NWO).

Bibliography

Algee-Hewitt, M. A., Bode, K., Brouillette, S., Finn, E., Klein, L., Long, H., Piper, A., Underwood, T., Da, N. Z. and Fish, S. (2019). Computational Literary Studies: A Critical Inquiry Online Forum. Critical Inquiry. https://critinq.wordpress.com/2019/03/31/computational-literary-studies-a-critical-inquiry-online-forum/.

Antoniak, M., Jannidis, F., Mimno, D., Schöch, C. and Dalen-Oskam, K. van (2020). Replication and Computational Literary Studies. DH2020. Ottawa: ADHO. doi:10.17613/ekd2-ew51. https://hcommons.org/deposits/item/hc:30439/ (accessed 10 December 2021).

Arnold, T. and Buell, R. (2019). More Responses to ‘The Computational Case against Computational Literary Studies’. Critical Inquiry. https://critinq.wordpress.com/2019/04/12/more-responses-to-the-computational-case-against-computational-literary-studies/.

Arnould, M.-A. (1976). Les Relevés de Feux. (Typologie Des Sources Du Moyen Âge Occidental 18). Turnhout: Brepols.

Blockmans, W. P., Pieters, G., Prevenier, W. and Van Schaïk, R. W. M. (1980). Tussen crisis en welvaart: sociale veranderingen 1300-1500. In Blok, D. P. (ed), Algemene Geschiedenis Der Nederlanden, vol. 4. Haarlem: Fibula-Van Dishoeck, pp. 42–86.

Brouwer, J. A. K. de (1963). Het belang van de kommunikantencijfers en de verhouding ervan tot de bevolking. Handelingen van de Koninklijke Zuidnederlandse Maatschappij Voor Taal- En Letterkunde En Geschiedenis, 17: 67–80.

Cloet, M. (1966). De leeftijdsgrens tussen communicanten en niet-communicanten in de XVIIde en de XVIIIde eeuw. De Leiegouw, 8(2): 451–71.

Da, N. Z. (2019). The Computational Case against Computational Literary Studies. Critical Inquiry, 45(3): 601–39. doi:10.1086/702594.

Faber, J. A., Roessingh, H. K., Slicher van Bath, B. H., Van der Woude, A. M. and Xanten, H. J. van (1965). Population changes and economic developments in the Netherlands: a historical survey. A.A.G. Bijdragen, vol. 12. Wageningen: Afdeling Agrarische Geschiedenis, Landbouwhogeschool, pp. 47–113.

Flis, I. (2018). Digital humanities as the historian’s Trojan horse: Response to commentary in the special section on digital history. History of Psychology, 21(4): 380–83. doi:10.1037/hop0000113.

Hélin, É. (1963). La Démographie de Liège Aux XVIIe et XVIIIe Siècles. (Académie Royale de Belgique. Classe Des Lettres et Des Sciences Morales et Politiques. Mémoires, Coll. in-8° 56/4). Brussels: Palais des Académies.

Klep, P. M. M. (1991). Population Estimates of Belgium, by Province (1375-1831). In Société Belge de Démographie (ed), Historiens et Populations. Liber Amicorum Étienne Hélin. Louvain-la-Neuve: Academia, pp. 485–507.

Paping, R. F. J. (2014). General Dutch Population development 1400-1850: cities and countryside. Alghero, Italy. http://hdl.handle.net/11370/d057464a-dbb1-4d50-a217-762403c1a3e2.

Peels, R. (2019). Replicability and replication in the humanities. Research Integrity and Peer Review, 4(1): 2. doi:10.1186/s41073-018-0060-4.

Peels, R. and Bouter, L. (2018a). Humanities need a replication drive too. Nature, 558(7710): 372. doi:10.1038/d41586-018-05454-w.

Peels, R. and Bouter, L. (2018b). The possibility and desirability of replication in the humanities. Palgrave Communications, 4(1): 95. doi:10.1057/s41599-018-0149-x.

Rijcke, S. de and Penders, B. (2018). Resist calls for replicability in the humanities. Nature, 560(7716): 29. doi:10.1038/d41586-018-05845-z.

Stabel, P. (1997). Dwarfs among Giants: The Flemish Urban Network in the Late Middle Ages. (Studies in Urban Social, Economic and Political History of the Medieval and Modern Low Countries 8). Leuven: Garant.

Stapel, R. J. (2017). Holland rond 1500: een geografische verkenning van de Enqueste (1494) en Informacie (1514). Holland: Historisch Tijdschrift, 49(4): 177–84.

Stapel, R. J. (2020). Historical Atlas of the Low Countries (1350-1800). IISH Data Collection. http://hdl.handle.net/10622/PGFYTM (accessed 10 June 2020).

Tucker, A. (2017). Replication, Visualization & Tactility: Towards a Deeper Involvement of 3D Printing in Humanities Scholarship and Research. Montreal: ADHO. https://dh2017.adho.org/abstracts/230/230.pdf.

UNESCO (2021). UNESCO Recommendation on Open Science. UNESCO. https://en.unesco.org/science-sustainable-future/open-science/recommendation.

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1): 160018. doi:10.1038/sdata.2016.18.

Woude, A. M. van der (1972). Het Noorderkwartier. Een Regionaal Historisch Onderzoek in de Demografische En Economische Geschiedenis van Westelijk Nederland van de Late Middeleeuwen Tot Het Begin van de Negentiende Eeuw. Vol. 1. 3 vols. (A.A.G. Bijdragen 16). Wageningen: Afdeling Agrarische Geschiedenis, Landbouwhogeschool.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Series: ADHO (16)

Organizers: ADHO