Arizona State University
Princeton University
Canyon Crest Academy
Max Planck Institute for the History of Science
The Research Software Company
University of Chicago
Max Planck Institute for the History of Science
British Library
University of Alabama
Many digital humanities projects require custom software development. The people doing that work, who may be researchers, professional software developers, or someone in between, write software to achieve a project’s goals. But who is testing and reviewing that software to confirm it works properly? No matter how experienced or well trained a programmer is, the code they produce will inevitably contain errors. A rough estimate suggests between 15 and 50 errors per 1,000 lines of code written by professional software developers (Soergel 2015). Not all errors affect the research findings based on the code, but some do, and there are well-documented cases where this has happened (see, for example, Letzner et al. 2020 or Miller 2006). Furthermore, uncaught “edge cases” could drastically affect future researchers’ results.
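To make the danger concrete, consider a small Python function of the kind common in text-analysis pipelines. The function and its flaws are invented for illustration, not drawn from any particular project:

    # A hypothetical text-analysis helper a DH project might contain:
    # compute the type-token ratio (a rough measure of vocabulary richness).
    def type_token_ratio(text: str) -> float:
        """Return the ratio of unique words (types) to total words (tokens)."""
        tokens = text.split()
        return len(set(tokens)) / len(tokens)

    # Two edge cases a reviewer would likely catch:
    # 1. Case sensitivity: "The" and "the" count as distinct types,
    #    quietly inflating the ratio for any real corpus.
    # 2. Empty input: zero tokens raises ZeroDivisionError instead of
    #    producing a meaningful result.
    def type_token_ratio_reviewed(text: str) -> float:
        """Case-insensitive type-token ratio; returns 0.0 for empty input."""
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        return len(set(tokens)) / len(tokens)

The first version runs without complaint on most inputs, which is exactly why such flaws tend to survive until a second pair of eyes reads the code.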
Code review is a widespread technique for improving software and reducing the number of flaws. In a code review, a programmer other than the original author(s) reads the source code, asks questions, and makes suggestions for improving the software. Beyond identifying and eliminating errors, code review can improve overall quality by making the source code more readable and maintainable. Furthermore, code reviews can improve not just the skills of the reviewee but also those of the reviewer. If a code author and reviewer work in the same team or on the same or related projects, code reviews can also support team cohesion and facilitate information-sharing within a team.
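The questions and suggestions involved are often simple. The following exchange, again an invented sketch rather than a real review, shows how a reviewer’s comments can turn opaque code into maintainable code:

    # Author's original version (hypothetical):
    def proc(d):
        r = []
        for x in d:
            if x[2] > 1800:
                r.append(x)
        return r

    # After the reviewer asks "What does x[2] hold, and why 1800?",
    # the author renames things and lifts the magic number into a
    # documented parameter:
    def select_records_after_year(records, cutoff_year=1800):
        """Return the records whose third field (the year) exceeds cutoff_year."""
        return [record for record in records if record[2] > cutoff_year]

The behavior is unchanged, but a future researcher reading the second version needs no archaeology to understand or adjust it.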
Code reviews are fairly easy to implement in teams of two or more developers who share a context, a technical stack, and agreed-upon conventions. In digital humanities projects, however, there is often just one “techy” person who does all the coding, with no colleague to review their code. Given the prevalence of virtual communication platforms like Slack and GitHub, there is no reason code review must happen only internally at a single lab; programmers across labs and centers can review each other’s code. At the ACH 2021 conference, a group of people organized a workshop to discuss and develop ideas and strategies for a community code review process for digital humanities. The outcome of the workshop was a working group within the ADHO SIG DHTech that meets monthly with the goal of building a community of people and a community code review infrastructure.
A community code review system would give developers writing code for digital humanities projects a way to ensure the quality of their code. Similarly, it would give researchers or developers reusing code-reviewed software some assurance that a program generates trustworthy results. It would also fill a gap in the current publishing landscape, which consists of journals like the Journal of Open Source Software (https://joss.theoj.org/) that provide ways for developers to publish about the software they create. These journals typically require software to be “feature-complete” and ideally reusable. They check that certain best practices are followed (such as providing installation instructions or API documentation), but for good reason a full-fledged code review is usually not possible. Additionally, a community code review system would give graduate students whose dissertations involve computational research but whose committees lack a technical reader, as well as researchers just beginning to use computational methods, a way to get feedback on their programming work. This proposal is for a virtual poster that describes the work of the working group and its goals.
Bibliography
Letzner, S., Güntürkün, O., and Beste, C. (2020). Retraction Notice to: How Birds Outperform Humans in Multi-Component Behavior. Current Biology. https://www.cell.com/current-biology/comments/S0960-9822(17)30960-0. Accessed December 8, 2021.
Miller, G. (2006). A Scientist’s Nightmare: Software Problem Leads to Five Retractions. Science 314 (5807): 1856–57.
Soergel, D. A. W. (2015). Rampant Software Errors May Undermine Scientific Results. F1000Research 3 (303). https://doi.org/10.12688/f1000research.5930.1.
In review
Tokyo, Japan
July 25, 2022 - July 29, 2022
361 works by 945 authors indexed
Held in Tokyo and remote (hybrid) on account of COVID-19
Conference website: https://dh2022.adho.org/
Contributors: Scott B. Weingart, James Cummings
Series: ADHO (16)
Organizers: ADHO