Establishing a Code Review Community for DH

poster / demo / art installation
Authorship
  1. Julia Damerow

    Arizona State University

  2. Rebecca Sutton Koeser

    Princeton University

  3. Andrew Gao

    Canyon Crest Academy

  4. Malte Vogl

    Max Planck Institute for the History of Science

  5. Itay Zandbank

    The Research Software Company

  6. Jeffrey Tharsen

    University of Chicago

  7. Robert Casties

    Max Planck Institute for the History of Science

  8. Kalle Westerling

    British Library

  9. Jeffrey Carver

    University of Alabama

Work text


Many digital humanities projects require custom software development. The people doing that work, who may be researchers, professional software developers, or someone in between, write software to achieve a project’s goals. But who is testing and reviewing that software to confirm it works properly? No matter how experienced or well-trained a programmer is, there will inevitably be errors in the code they produce. A rough estimate suggests that professional software developers introduce between 15 and 50 errors per 1,000 lines of code (Soergel 2015). Not all errors affect the research findings based on the code, but some do, and there are well-documented cases where this has happened (see, for example, Letzner et al. 2020 or Miller 2006). Furthermore, uncaught “edge cases” could drastically affect future researchers’ results.

Code review is a widespread technique for improving software and reducing the number of defects. In a code review, a programmer other than the original code author(s) examines the source code, asks questions, and makes suggestions for improving the software. In addition to identifying and eliminating errors, code review can improve overall quality by making the source code more readable and maintainable. Furthermore, code reviews can sharpen the skills not just of the reviewee but also of the reviewer. If a code author and reviewer work on the same team or on the same or related projects, code reviews can also support team cohesion and facilitate information-sharing within a team.
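To illustrate, consider a hypothetical Python snippet of the kind a reviewer might annotate; the function and the reviewer remarks are invented for illustration and are not drawn from the working group’s materials:

    def word_frequencies(texts, stopwords=[]):
        # reviewer: mutable default argument; a list shared across calls is a
        # common Python pitfall. Prefer stopwords=None and build a set inside.
        counts = {}
        for text in texts:
            for word in text.split():
                # reviewer: "The" and "the" are counted as different words here;
                # should words be lowercased before counting?
                if word not in stopwords:
                    counts[word] = counts.get(word, 0) + 1
        return counts

Neither issue makes the program crash, which is exactly why a second pair of eyes is valuable: such defects silently skew results rather than announcing themselves.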

Code reviews are fairly easy to implement in teams of two or more developers with a shared context, technical stack, and agreed-upon conventions. In digital humanities projects, however, there is often just one technical person who does all the coding, with no colleague to review their code. Given the prevalence of virtual communication platforms like Slack and GitHub, there is no reason code review must happen only internally at a single lab; rather, programmers across labs and centers can review each other’s code. At the ACH 2021 conference, a group of people organized a workshop to discuss and develop ideas and strategies for a community code review process for digital humanities. The outcome of the workshop was a working group within the ADHO SIG DHTech that meets monthly with the goal of building both a community of people and a community code review infrastructure.

A community code review system would provide developers writing code for digital humanities projects with a way to ensure the quality of their code. Similarly, it would give researchers and developers reusing reviewed software some assurance that a program generates trustworthy results. It would also fill a gap in the current publishing landscape, which includes journals like the Journal of Open Source Software (https://joss.theoj.org/) that let developers publish about the software they create. These journals typically require software to be “feature-complete” and ideally reusable; they check that certain best practices are followed (such as providing installation instructions or API documentation), but for good reason a full-fledged code review is usually not part of the process. Additionally, a community code review system would give graduate students whose dissertation research is computational but whose committee lacks a technical reader, as well as researchers who are beginning to use computational methods, a way to get feedback on their programming work. This proposal is for a virtual poster that describes the work of the working group and its goals.

Bibliography
Letzner, S., Güntürkün, O. and Beste, C. (2020). Retraction Notice to: How Birds Outperform Humans in Multi-Component Behavior. Current Biology. https://www.cell.com/current-biology/comments/S0960-9822(17)30960-0. Accessed December 8, 2021.

Miller, G. (2006). A Scientist’s Nightmare: Software Problem Leads to Five Retractions. Science 314 (5807): 1856–57.

Soergel, D. A. W. (2015). Rampant Software Errors May Undermine Scientific Results. F1000Research 3 (303). https://doi.org/10.12688/f1000research.5930.1.


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO