Data-First Digital Humanities: How Adopting a Data-First Strategy Fosters Research, Collaboration, Pedagogy, and Scholarly Communication in the Digital Humanities

poster / demo / art installation
  1. Todd Hughes

    Vanderbilt University

  2. Clifford Anderson

    Vanderbilt University

Work text

We propose a strategy for conducting digital humanities teaching and research that prioritizes publishing data above all other project activities. Drawing on our experience working with faculty, librarians, and graduate students on a TEI critical edition of Charles Baudelaire’s Les Fleurs du Mal, we demonstrate how adopting a data-first strategy fosters research, collaboration, pedagogy, and scholarly communication in the digital humanities.

The Corpus Baudelaire Project began at Vanderbilt University in 2013, when a hybrid group of approximately ten scholars, who had recently learned how to encode literary texts in the TEI, aspired to do something practical with their new skills. The group developed a connection with the Vanderbilt University Library’s W. T. Bandy Center for Baudelaire and Modern French Studies, which exhaustively collects Baudelaire’s works, including Les Fleurs du Mal. The work itself was published in four editions: 1857; 1861 (containing 35 additional poems and the Tableaux parisiens section, but lacking the six poems censored under the Second Empire); 1866 (including Les Épaves, or The Scraps, and the six poems missing from the 1861 edition); and the posthumous 1868 edition. Participants in the Corpus Baudelaire Project are encoding all of these editions using the TEI critical apparatus.
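Encoding several editions with the TEI critical apparatus amounts to recording, at each point of variation, an `<app>` element whose `<rdg>` children carry a `@wit` attribute pointing at the witnesses (here, the editions) that share that reading. The following is a minimal Python sketch of that grouping step, not the project's actual workflow; the witness identifiers (`editionA`, `editionB`, `editionC`) and reading texts are hypothetical placeholders, and a full edition would also declare its witnesses in a `<listWit>` and could mark a base reading with `<lem>`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def make_app(readings: dict[str, str]) -> ET.Element:
    """Build a TEI <app> from a mapping of witness id -> reading text.

    Witnesses with identical text are merged into one <rdg> whose @wit
    lists all of them as space-separated pointers.
    """
    grouped: dict[str, list[str]] = {}
    for wit_id, text in readings.items():
        grouped.setdefault(text, []).append(wit_id)

    app = ET.Element(tei("app"))
    for text, wits in grouped.items():
        rdg = ET.SubElement(app, tei("rdg"), wit=" ".join(f"#{w}" for w in wits))
        rdg.text = text
    return app


# Hypothetical witnesses and placeholder readings, not actual variants from the poems.
app = make_app({
    "editionA": "variant A",
    "editionB": "variant B",
    "editionC": "variant B",
})
print(ET.tostring(app, encoding="unicode"))
```

Running it prints a single `<app>` in which the two editions that agree share one `<rdg>` with `wit="#editionB #editionC"`. Grouping identical readings this way keeps the apparatus compact and makes agreement between editions explicit in the data itself.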

We describe our data-first approach to the Corpus Baudelaire Project, which minimizes otherwise common tasks such as developing databases or coding interfaces, and argue that it has advantages over alternative approaches in fostering collaboration, pedagogy, and new forms of publishing. We also suggest that a data-first approach may productively be generalized to any digital humanities project that develops significant quantities of data.

A data-first approach differs from other forms of digital humanities scholarship by minimizing startup costs and reducing complexity. Whereas many digital humanities projects aim above all to produce some form of online digital edition or interactive website, a data-first approach invests primarily in producing and sharing data with others. “It’s the data, stupid!” is our informal slogan.

A data-first approach to DH involves at least three steps: licensing, curating, and publishing datasets online. The last two steps are likely to be iterative and emergent.

Licensing. A data-first approach begins with the presupposition that data will be openly available and reusable by other scholars. This implies not only attaching an open license to the data, but also making certain that participants in the project understand the license and agree with its terms.
Curating. A data-first approach implies that discussions about data curation start at the beginning of the project, not at its end. How should information be encoded? How should the project decide between alternative encodings? Are there emerging best practices and converging forms of representation? Documenting the data and making any accompanying schemas available are also critical when taking a data-first approach.
Publishing. A data-first approach requires that data be published for comment, criticism, and reuse from the outset of the project. What are the best platforms for publishing digital humanities data? How can digital humanists provide access to their data and get credit for it?

By prioritizing these three activities above other forms of digital humanities work, we simultaneously lower the barriers for participants to join our project and offer them the opportunity to publish and begin receiving credit for their work almost immediately. Crucially, credit is allocated according to contributions, not by seniority or other hierarchical designations; the data bear witness directly to their creators.
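Because the three steps are meant to hold from the first release onward, a project can check them mechanically before each publication. The following is a hypothetical Python illustration, not a tool described by the project: it assumes a JSON manifest loosely modeled on the Frictionless Data Package descriptor, and the per-file `schema` and `contributors` fields, the file paths, the contributor name, and the `release_problems` helper are all invented for the example.

```python
import json

# Hypothetical manifest, loosely modeled on a Frictionless "datapackage.json".
# File paths, schema path, and contributor names are placeholders.
MANIFEST = json.loads("""
{
  "name": "corpus-baudelaire",
  "licenses": [{"name": "CC-BY-4.0"}],
  "resources": [
    {"path": "tei/poem-001.xml", "schema": "schema/corpus.rng",
     "contributors": ["Contributor A"]},
    {"path": "tei/poem-002.xml", "schema": null,
     "contributors": []}
  ]
}
""")


def release_problems(manifest: dict) -> list[str]:
    """List what blocks a release under the three data-first steps."""
    problems: list[str] = []
    # Licensing: the dataset must declare an open license.
    if not manifest.get("licenses"):
        problems.append("no license declared")
    for res in manifest.get("resources", []):
        path = res.get("path", "<unnamed>")
        # Curating: each file should point at the schema that documents it.
        if not res.get("schema"):
            problems.append(f"{path}: no schema")
        # Publishing with credit: each file names the people who encoded it.
        if not res.get("contributors"):
            problems.append(f"{path}: no contributors")
    return problems


for problem in release_problems(MANIFEST):
    print(problem)
```

Here the second file is flagged twice, because it has neither a documenting schema nor a named contributor. Recording contributors per file, rather than per project, is one way to let credit follow contributions rather than seniority.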




Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

Series: ADHO (11)

Organizers: ADHO