All papers are data papers: from open principles to digital methods

Paper, specified as "long paper"
  1. Arthur Perret

     MICA E3D laboratory, Université Bordeaux Montaigne (Bordeaux Montaigne University)

  2. Olivier Le Deuff

     MICA E3D laboratory, Université Bordeaux Montaigne (Bordeaux Montaigne University)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

How do we bridge the gap between ambitious global schemes, such as Paul Otlet’s “Aims of documentation” (Otlet, 1934) or the FAIR data principles (Wilkinson et al., 2016), and existing information practices? We describe the theoretical basis and practical steps for a subject-oriented approach to this problem, examining data-related expectations through the lens of documentarity.

In 1934, Belgian bibliographer Paul Otlet published his Treatise on Documentation (Traité de documentation), in which he outlined the “Aims of documentation”:

“Universal as to their purpose; reliable and true; complete; fast; up to date; easy to obtain; collected in advance and ready to be communicated; made available to the greatest number of people.” (Otlet, 1934, p. 6)

In 2016, the FAIR principles were published along similar lines:

“To be Findable; to be Accessible; to be Interoperable; to be Reusable.” (Wilkinson et al., 2016, p. 4)

They differ in some ways: Otlet viewed the Aims as a whole, with openness as a critical element, while FAIR is modular and not necessarily synonymous with open data. More importantly, however, both describe a plan meant to precede and guide implementation. Otlet’s Aims are broken down into goals related to the actual “biblio-technie” or “bibliothéconomie” (Otlet, 1934, pp. 372–375); similarly, each of the four components of FAIR is itself divided into sub-components which delve into technical matters (e.g. data vs. metadata). These are actionable steps to be applied in the field, which is where trouble begins.

During and after his time, close collaborators and distant peers alike noted the gap between Otlet’s ambitions and what he was able to achieve: Valère Darchambeau commented on “Mr. Otlet’s mental audacities, his utopias some would say” (Mundaneum archives, PP P0 462); Suzanne Briet ironically called him “the magus” of documentation.
Indeed, he had a major impact on the institutionalization of documentation: the development of Library and information science (LIS) in Europe owes much to section 4 of his Treatise. However, his work on the relationship between subject and knowledge was largely neglected. The techno-semiotic mediations of information have been far less studied in LIS than human ones; we can arguably trace this back to Otlet’s incomplete legacy. Conversely, the implementation of the FAIR principles quickly raised issues of user experience, expectations and metrics:

“FAIRness is aspirational, yet the means of reaching it may be defined by increased adherence to measurable indicators . . . metrics that reflect the expectations of particular communities.” (Wilkinson et al., 2017, pp. 1–2)

The interface between person and information seems much thinner for computer-held data than for library books. While this is not actually true (mediations have simply shifted towards human-computer interaction), it means that the feasibility of principles is challenged almost immediately by subjective experience. Data may be FAIR but people may differ: they do not all work on the same data or with the same mindset, and therefore have different expectations. This shapes the way we assess data within the framework of documentation, and therefore its value to us: its documentarity.

Documentarity is the product of interdisciplinary theoretical work at the intersection of ontology, documentation and linguistics. The first two of these influences have already been studied: documentarity can be seen as a philosophy of evidence based on documentation (Day, 2019), and also as the quantifiable documentary quality of things, with applications to digital documents and data (Perret & Le Deuff, 2019). Here, we examine the third influence: how linguistics contributes to documentarity as an epistemological proposal focused, at its core, on the reception of information.
We show that documentarity is linked to several earlier concepts: Roman Jakobson’s “literaturnost”, which in French (“littérarité”) (Jakobson, 1977, p. 16) is very close to documentarity (“documentarité”); Hans Robert Jauss’ adaptation of horizons of expectation (“Erwartungshorizont”) to literature (Jauss, 1970); and the shape of enunciation, with Mary Ann Caws’ “architexture” (Caws, 1981, p. 10) and Roger Laufer’s “scripturation” (Laufer, 1986, p. 75).

This array of concepts is dense, but its purpose is coherent: we draw from the phenomenology of the reading process to make better sense of the way we assess computer-held data. Our methodology is to track the embodiment of thought in technological mediations, especially in writing. The usefulness of such an approach has been described for the study of information as experience (Gorichanaz, 2017). We argue that our perception of the documentarity of data is shaped by our horizons of expectation, especially prior experience of genre-based rules, which we must establish if we wish to prevent global principles from falling into abstraction as soon as they enter the field.

In this perspective, digital notebooks form a stimulating case study, highly relevant to the conference’s theme of open data. They relate both to a tradition and to new practices (data science, data papers). We analyze the way data is presented and interacted with in R, Python and JavaScript-based notebooks, and we observe a reflexive impact on our perception of documentarity: it allows us to relate more practically to the intellectual framework behind Otlet’s “Aims of documentation” and the FAIR principles, which could improve their adoption. Through reproducibility and replicability, the practice of the notebook informs us about the relationship between data and truth.
It also underlines the status of text as the most basic and universal type of data in science: the way text is handled in notebooks (lightweight markup languages, integration of standards, automation) shifts our perception of ‘text’ to ‘textual data’. This holds regardless of the field of study: we suggest that any research built from plain text can be considered a data paper, and that extending “FAIRness” to scientific writing in general would be an epistemological breakthrough in scientific communication.
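The shift from ‘text’ to ‘textual data’ can be made concrete with a minimal sketch (not taken from the paper itself): a plain-text article with a lightweight front-matter block becomes a machine-readable record, which is one small step towards the findability and reusability that FAIR asks for. The front-matter format and field names here are illustrative assumptions, not a standard the authors prescribe.

```python
def parse_article(text: str) -> dict:
    """Split a '---'-delimited front-matter block from the body and
    return both as one record, making the text queryable as data."""
    metadata = {}
    body = text
    if text.startswith("---\n"):
        header, _, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, _, value = line.partition(":")
            if key.strip():
                metadata[key.strip()] = value.strip()
    return {"metadata": metadata, "body": body.strip()}

# A hypothetical plain-text paper with minimal descriptive metadata.
article = """---
title: All papers are data papers
authors: Arthur Perret; Olivier Le Deuff
---
How do we bridge the gap between ambitious global schemes
and existing information practices?"""

record = parse_article(article)
print(record["metadata"]["title"])  # -> All papers are data papers
```

Nothing here is specific to one discipline: once the writing is plain text with explicit structure, the same record can feed a search index, a citation database, or a notebook pipeline.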


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus; an online conference was held instead.

Data for this conference were initially prepared and cleaned by May Ning.

Conference website:


Series: ADHO (15)

Organizers: ADHO