Processes and Practicalities in Developing and Sustaining a Text Mining Platform: Gale Digital Scholar Lab

Short paper
  1. Sarah Ketchley

    University of Washington/Gale, United States of America

  2. Jess Ludwig


Work text

Gale Digital Scholar Lab (DS Lab) was developed in 2018 in response to requests for a platform for text mining primary source documents without necessarily having to learn to code in Python or R. User interviews conducted during beta testing showed that some of the most significant barriers to entry into text-based digital humanities (DH) data mining are not knowing how or where to start building a DH project, not having time to gather a substantial data set or to clean and organize data for analysis, and having limited institutional infrastructure and support for projects that use text mining methodologies. In designing the DS Lab, the goal was to provide a scaffolded experience for users new to the digital humanities while offering extensibility for researchers with established projects. This included providing pathways for research, teaching, and learning for students, faculty, and librarians alike.
The DS Lab has been developed iteratively since its first release. Updates have included tool enhancements and support for pedagogical use of the platform and, more recently, a platform migration, workflow refinements, and improved accessibility. The DS Lab integrates six GUI-based tools for text analysis of primary source archives and user-uploaded plain-text documents: Named Entity Recognition, Sentiment Analysis, Ngrams, Parts of Speech Tagging, Topic Modeling, and Clustering. Recognizing that OCR quality is key to achieving meaningful analysis outputs, the DS Lab also offers text-cleaning options as part of the curation process. Import and export of text and metadata are also supported.
To orient users who are new to DH to the workflow and its outcomes, the platform incorporates an extensive Learning Center with contextual help documentation, including brief recorded videos, images, text, and sample projects. Similarly, teachers looking for ideas or additional support in the platform will find draft syllabi, outlined learning objectives, and downloadable project outlines.
This 10-minute talk will describe the process and challenges of developing the DS Lab interface, including balancing the often-competing demands of developer time, the scope of individual project sprints, and projected cost. It will consider which workflows were successful and which needed to be adapted or scrapped altogether. It will also discuss using personas to design the platform's features and functionality, along with the advantages and drawbacks of doing so. The Learning Center is a case in point: its development drew on a range of internal and external expertise, including academic advisors, curriculum developers, and UX designers, as well as in-house software and content engineers, metadata and content architects, and the product and archives team. This collaborative undertaking required considerable management to balance expectations and outcomes and to ensure that communication flowed clearly. These considerations are not unique to the Gale Digital Scholar Lab project; they extend to other, similar DH projects. The talk aims to show how the lessons learned by the Gale development team and external stakeholders can guide others.



Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19


Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO