Constructing Scientific Archives that Support Humanistic Research

paper, specified "long paper"
  1. Christopher Prom

    University of Illinois, Urbana-Champaign

  2. Bethany Anderson

    University of Illinois, Urbana-Champaign

  3. Thomas George Padilla

    Michigan State University

  4. Angela Jordan

    University of Illinois, Urbana-Champaign

  5. John Franch

    University of Illinois, Urbana-Champaign

  6. Andrea Thomer

    University of Illinois, Urbana-Champaign

  7. Tracy Popp

    University of Illinois, Urbana-Champaign

1. Introduction

The results of what has been termed the scientific method infuse every aspect of modern life. Science, as a contested enterprise, affects and is affected by political, economic, technological, social, cultural, religious, and ethical factors. Sensational cases illustrate that people are interested in scientific data and in the process that produces scientific knowledge.1 Scientific data, and evidence about how it was created, shaped, and interpreted, serve as resources for humanistic studies in disciplines including journalism,2 anthropology,3 history,4 medicine,5 and literary studies.6
If scientific information is to be useful for current and future scholarship, we must answer three fundamental questions: (1) Which information is preserved and made accessible? (2) What evidence does it provide? and (3) How can it be used? Future users of scientific records will provide conflicting answers: Archivists cannot anticipate the precise nature of future research, and selecting or arranging materials with the expectation of one particular use may preclude transformative uses.
If the goal of preservation is to make accessible a record of science that is amenable to disparate analyses and interpretations, information professionals must identify and preserve evidence regarding the process of scientific production, not just the results of the process. Given that each lab operates differently, preservation programs must use interdisciplinary modes of analysis and techniques, as must users of archives.
The fields of digital humanities, digital curation, data curation, and digital preservation have long been interdisciplinary, and are becoming increasingly hybrid in terms of the disparate theoretical frameworks and tools of analysis that they utilize. It may well be true that the lines between archives, libraries, digital humanities, and publishing are becoming more fluid, yet we believe that archives are well positioned to serve as centralized sites where these intersections lead to greater collaboration and knowledge dissemination.7
2. Academic Archives (AA)

Academic and research institutions have traditionally played a leading role in preserving the ‘papers’ (i.e., personal archives) of faculty, including scientists.8 Archival guides devoted to scientific and technical documentation emphasize that reports, correspondence, photographs, scrapbooks, lab records, and supplementary documentation hold as much continuing research value as formal research products (publications and processed datasets). Ideally, archivists act upon the basis of the core archival doctrine, respect des fonds: the idea that grouping records by the function or activity that led to their creation best preserves their value as evidence.9 To accomplish this, archivists draw upon concepts and techniques from other disciplines to protect the records’ provenance and original order.
3. Digital Preservation (DP)

DP’s core concepts extend archival doctrine and practice. DP establishes requirements and processes for maintaining authentic and trustworthy digital objects.10 The Open Archival Information System Reference Model (OAIS) and Trusted Digital Repositories Framework provide recommendations concerning system design, policy development, and institutional commitments.11 Research regarding digital personal papers (including faculty archives) recommends particular acquisition, arrangement, descriptive, and access practices that preserve contextual information regarding the creation and use of digital records, such as documentation about the research environment and the use of communication/dissemination technologies.12
4. Data Curation (DC)

DC techniques complement DP and help scholars manage datasets of continuing informational value, by making them preservable, reusable, and computationally reproducible;13 and by developing “high-functioning” metadata.14 While DC provides a rich data-preservation toolkit, it focuses more attention on data than on ancillary documentation, which provides evidence of the ways in which that data was created, used, or interpreted. If the goal is to document the research environment and process, DC methods are but one (albeit, crucial) element in a broader archival strategy.
5. Anthropology

Documenting scientific activity, one of many social arenas in which knowledge is constructed, is best accomplished by direct observation. In the absence of ethnography, archives play an important documentary role by providing insight into the context underlying scientific fact creation. Anthropology’s focus on praxis and its reflexive methodological and epistemological underpinnings offer meaningful points of departure for archivists seeking to capture a nuanced record of scientific processes and activities.15
One goal of science is to produce authoritative, published documents: a materialized fact.16 The archival challenge lies in documenting social factors at play from data generation through the various stages of processing and interpretation, to the final “point of stabilisation.”17 Archives can lay bare the social factors that are all too easily stripped out when the fact is reified, allowing us to read against and along the scientific grain.18 As anthropological surrogates, archives must capture the processes and events that destroy and create facts.19
6. Digital Humanities (DH)

Network analysis, text mining, and information visualization provide an algorithmic supplement to the pursuit of past, present, and future research questions, enhancing analysis of digital objects.20 Recent work with digitized historical records demonstrates how the algorithmic approach offers a lens with which to gaze upon and ascertain meaning from large and unwieldy bodies of data.21 As the corpus of materials amenable to computational analysis grows in archives throughout the world, approaches of this type are becoming indispensable.
While great promise lies in the analytic potential of DH, its practitioners are increasingly aware of the challenge of making the results of their research preservable and, in the optimal case, reusable.22 For their part, archivists' familiarity with the lifecycle of data prepares them well for conversations with digital humanists.23 Archivists can become digital humanists themselves as they use approaches like topic modeling to enhance the ways in which they, and their users, interact with archival materials.
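As a rough illustration of the network analysis mentioned above, the following minimal sketch builds a correspondence network from sender and recipient pairs and counts each person's distinct correspondents. The names, the `letters` list, and the helper functions are hypothetical placeholders, not data or tools from any project described here; in practice the pairs might be drawn from a file-level inventory of a correspondence series.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs, e.g. transcribed from folder labels
# in a correspondence series. These names are placeholders, not real data.
letters = [
    ("Scientist A", "Scientist B"),
    ("Scientist A", "Scientist C"),
    ("Scientist B", "Scientist A"),
    ("Scientist D", "Scientist A"),
    ("Scientist B", "Scientist C"),
]


def build_network(pairs):
    """Return undirected weighted edges: {frozenset({a, b}): number of letters}."""
    edges = Counter()
    for sender, recipient in pairs:
        if sender != recipient:  # ignore notes addressed to oneself
            edges[frozenset((sender, recipient))] += 1
    return edges


def degree(edges):
    """Return the number of distinct correspondents for each person."""
    neighbours = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {person: len(others) for person, others in neighbours.items()}


edges = build_network(letters)
degrees = degree(edges)

# List people from most to least connected (ties broken alphabetically).
for person, count in sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0])):
    print(person, count)
```

A degree count is only the simplest possible measure; it is shown here because it can hint at which correspondents an archivist might prioritize when drafting a processing plan.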
7. General Application

An archival processing strategy that is informed by anthropological principles and that uses DH tools will facilitate reflexivity between the archivist’s professional practices and the scholar’s interpretive possibilities. Evidence of the knowledge production process can be better preserved by integrating DP, DC, and DH concepts and tools as essential elements of a systematic archival processing workflow covering analog, born-digital, and digitized records.24 Archivists should be particularly attuned to six elements of scientific work: inscription, circumstantiality, noise, conflict, credibility, and reification.25
To document these factors, archivists must modify how they appraise, preserve, arrange, and describe scientific records. The challenges to the archival task are many, and include:
Recognizing the potential of archival sources as anthropological surrogates.
Weighing the value of records according to the insight they offer into the six elements of scientific work listed above.
Using tools that preserve authenticity, while also providing scholars the ability to understand how the scientific process played out within the social networks and environment supporting scientific research.
Controlling costs and sustaining the archives over time.
8. Case Study

The strategy being used by the University of Illinois at Urbana-Champaign is illustrated by our ongoing work with the records of Carl Woese (1928-2012), 2003 winner of the Crafoord Prize in Biosciences.26 While a full description of the project is beyond the scope of an abstract, we are integrating DP, DC, and DH concepts and tools into three particular points in our workflow.27

Professionally photograph Woese’s laboratory to document microsocial environment.
Transfer materials while maintaining original order and recording original placement in lab.
Create forensic image of Woese’s laptop to capture records in a consistent, verifiable manner and avoid unintended data modification.28
Extract user files with disk analysis reports to use as surrogate during processing and topic modeling.
Use topic modeling and network analysis tools to help develop processing plans.
Arrange analog records into functional series representing Woese’s activities.
Preserve reprint files in original order, including correspondence regarding Woese’s methods and conclusions.
Generate preservation metadata for born-digital content, including genomic datasets.29
Use open-source software to identify and remove private/confidential records, to identify documents speaking to the six factors, and to assist in generating access copies, including a topic model as an alternate access point.30
Create a summary online description for all analog and digital files, with file-level inventory.31
Use file conversion and normalization tools to create access copies of the digital and digitized files.
Provide access copies in zip format, facilitating application of data analysis tools by scholars.
Present topic-modelled view of data as alternative access point.32
Deposit preserved records in Library’s digital preservation repository (Medusa), ensuring long term integrity and accessibility.33
Undertake migration assessment and planning for at risk file formats.
The process described above is surprisingly cost effective when integrated into the “More Product, Less Process” framework for achieving archival efficiency (more information and examples will be provided at the conference and in a full paper, if selected for inclusion in the proceedings).34
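The preservation metadata and integrity steps in the workflow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling used in the project: it walks a directory of extracted files, records each file's relative path, size, and SHA-256 checksum, and later re-checks those checksums to detect modification. The function names and the sample file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Return a file-level inventory: relative path, size in bytes, checksum."""
    rows = []
    for folder, dirs, files in os.walk(root):
        dirs.sort()  # walk subfolders in a stable order
        for name in sorted(files):
            full = os.path.join(folder, name)
            rows.append({
                "path": os.path.relpath(full, root).replace(os.sep, "/"),
                "bytes": os.path.getsize(full),
                "sha256": sha256_of(full),
            })
    return rows


def verify_inventory(root, rows):
    """Return the paths that are missing or whose checksum no longer matches."""
    changed = []
    for row in rows:
        full = os.path.join(root, *row["path"].split("/"))
        if not os.path.isfile(full) or sha256_of(full) != row["sha256"]:
            changed.append(row["path"])
    return changed


# Demonstration on a throwaway directory with one hypothetical file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "notes"))
    sample = os.path.join(root, "notes", "memo.txt")
    with open(sample, "wb") as handle:
        handle.write(b"draft\n")

    inventory = build_inventory(root)
    unchanged = verify_inventory(root, inventory)  # nothing modified yet

    with open(sample, "wb") as handle:
        handle.write(b"edited\n")  # simulate an unintended modification
    flagged = verify_inventory(root, inventory)

print(inventory[0]["path"], unchanged, flagged)
```

Recording checksums at the moment of transfer, and re-verifying them before and after each processing step, is one inexpensive way to demonstrate that access copies and preserved files descend from unaltered originals.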
9. Conclusion

In the Woese project, we seek to test one means of preserving evidence about knowledge production in scientific archives. We believe that techniques like those described above can and should be applied to the archives of important scientists if we wish for those archives to support humanistic research, but--as with all areas of human knowledge--we realize that our own conclusions are subject to revision after being viewed in the bright light of experience.

1. Anthony A. Leiserowitz, Edward W. Maibach, Connie Roser-Renouf, Nicholas Smith, and Erica Dawson (2013), Climategate, Public Opinion, and the Loss of Trust, American Behavioral Scientist 57:6: 818-837.
2. Jeff Gerth and T. Christian Miller (2013), Use Only as Directed, Pro Publica, accessed October 31, 2013,
3. David Zeitlyn (2012), Anthropology in and of the Archives: Possible Futures and Contingent Pasts. Archives as Anthropological Surrogates, Annual Review of Anthropology 41: 461-80; for a popular account using scientific archives, see David Grann, The Lost City of Z: A Deadly Tale of Obsession in the Amazon (New York: Vintage Books, 2010).
4. Harry Woolf, Manuscripts and the History of Science, ISIS 53 (March 1962): 3; Lillian Hoddeson, True Genius: The Life and Science of John Bardeen, the Only Winner of Two Nobel Prizes in Physics, (Washington, D.C.: Joseph Henry Press, 2002).
5. Ezra Susser, Hans W. Hoek, Alan Brown, Neurodevelopmental Disorders After Prenatal Famine: The Story of the Dutch Famine Study, American Journal of Epidemiology 147:3 (1998): 213-216. We are indebted to Tom Nesmith for this reference.
6. Lisa Yaszek, Narrative, Archive, Database: The Digital Humanities and Science Fiction Scholarship, The Eaton Journal of Archival Research in Science Fiction 1:1 (April 2013): 8-13; Susan Haack, "Science, Literature, and The Literature of Science," The Humanities and the Sciences, American Council on Learned Societies Occasional Papers No. 47, 1999, accessed October 31, 2013,
7. Tanya Clement, Wendy Hagenmaier, and Jenny Levine Knies, Toward a Notion of the Archive of the Future: Impressions of Practice by Librarians, Archivists, and Digital Humanities Scholars, The Library Quarterly 83 (April 2013): 112-130.
8. Maynard J. Brichford, University Archives: Relationships with Faculty, The American Archivist 34, no. 2 (April 1, 1971): 173–181; William J. Maher, The Management of College and University Archives (Metuchen, N.J.: Society of American Archivists and Scarecrow Press, 1992), 27-28; Tom Hyry, Diane Kaplan, and Christine Weideman, “‘Though This Be Madness, Yet There Is Method in ’t’: Assessing the Value of Faculty Papers and Defining a Collecting Policy,” The American Archivist 65, no. 1 (April 1, 2002): 56–69; Tara Zachary Laver, “In a Class by Themselves: Faculty Papers at Research University Archives and Manuscript Repositories,” The American Archivist 66, no. 1 (April 1, 2003): 159-196.
9. James O’Toole and Richard J. Cox, Understanding Archives & Manuscripts. Archival Fundamentals Series. (Chicago, IL: Society of American Archivists, 2006), 87-131; Joan K. Haas, Helen W. Samuels, and Barbara Trippel Simmons, Appraising the Records of Modern Science and Technology: A Guide (Massachusetts Institute of Technology, 1985); Maynard J. Brichford, Scientific and Technological Documentation; Archival Evaluation and Processing of University Records Relating to Science and Technology (Urbana, IL: University of Illinois, 1969), 13-15.
10. InterPARES Project. The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project, no date.; Luciana Duranti, Preservation of the Integrity of Electronic Records. The Archivist’s Library v. 2. (Dordrecht; Boston: Kluwer Academic, 2002).
11. Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS), January 2002, accessed October 31, 2013,; RLG/OCLC Working Group on Digital Archive Attributes. Trusted Digital Repositories: Attributes and Responsibilities (Mountain View, CA: Research Libraries Group, 2002), accessed October 31, 2013,
12. Paradigm Project. Workbook on Digital Private Papers, 2005, accessed October 31, 2013,; Jeremy Leighton John with Ian Rowland, Peter Williams, and Katrina Dean, Digital Lives: Personal Digital Archives for the 21st Century: An Initial Synthesis (The British Library, 2009), accessed October 31, 2013,; AIMS Work Group. AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship, 2012, accessed October 31, 2013, Tracey P. Lauriault, Barbara L. Craig, D. R. Fraser Taylor, and Peter Pulsifier, “Today’s Data Are Part of Tomorrow’s Research: Archival Issues in the Sciences,” Archivaria 64 (Fall 2007): 123-179.
13. Esther Conway, David Giaretta, Simon Lambert, and Brian Matthews, Curating Scientific Research Data for the Long Term: A Preservation Analysis Method in Context, International Journal of Digital Curation 6, no. 2 (July 26, 2011): 38-52, accessed October 31, 2013, doi:10.2218/ijdc.v6i2.204; Anne E. Thessen and David J. Patterson, “Data Issues in the Life Sciences,” ZooKeys no. 150 (November 28, 2011): 15–51, accessed October 31, 2013, doi:10.3897/zookeys.150.1766; Victoria C. Stodden, “Reproducible Research: A Digital Curation Agenda” (2011), accessed October 31, 2013,
14. Carole L. Palmer, Nicholas M. Weber, Trevor Munoz, and Allen H. Renear, Foundations of Data Curation: The Pedagogy and Practice of ‘Purposeful Work’ with Research Data, Archive Journal no. 3 (Summer 2013), accessed October 31, 2013,
15. Elisabeth Kaplan, ‘Many Paths to Partial Truths’: Archives, Anthropology, and the Power of Representation, Archival Science 2 (2002): 209-220.
16. Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts (Princeton: Princeton University Press, 1986), 50.
17. Ibid., 175-176.
18. See Nupur Chaudhuri, Sherry J. Katz, and Mary Elizabeth Perry, eds., Contesting Archives: Finding Women in the Sources (Urbana: University of Illinois Press, 2010); Ann Laura Stoler, Along the Archival Grain: Epistemic Anxieties and Colonial Common Sense (Princeton: Princeton University Press, 2009); and Caroline B. Brettell, Archives and Informants: Reflections on Juxtaposing the Methods of Anthropology and History, Historical Methods 25, no. 1 (Winter 1992): 28-36.
19. David Zeitlyn, Anthropology in and of the Archives: Possible Futures and Contingent Pasts. Archives as Anthropological Surrogates, Annual Review of Anthropology 41 (2012): 461-80.
20. D. Sculley and Bradley Pasanek, Meaning and mining: the impact of implicit assumptions in data mining for the humanities, Literary and Linguistic Computing no. 4 (2008): 409-424, accessed October 31, 2013, doi:10.1093/llc/fqn019; David Blei, Probabilistic Topic Models, Communications of the ACM no. 4 (2012): 77-84, accessed October 31, 2013, doi:10.1145/2133806.2133826; Andrew Goldstone and Ted Underwood, What Can Topic Models of PMLA Teach Us About the History of Literary Scholarship? Journal of Digital Humanities no. 1 (2012), accessed November 1, 2013,
21. Robert Nelson, Mining the Dispatch, University of Richmond; Elijah Meeks and Karl Grossner, “Developing Kindred Britain,” Stanford University.
22. Julia Flanders, Trevor Munoz, An Introduction to Humanities Data Curation.
23. Alex H. Poole, Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities, Digital Humanities Quarterly 7 (2013), accessed March 5, 2014,
24. J. Gordon Daines III, Processing Digital Records and Manuscripts, in Archival Arrangement and Description, eds. Christopher J. Prom and Thomas J. Frusciano. Trends in Archives Practice Series (Chicago: Society of American Archivists, 2013), 87-144.
25. Latour and Woolgar (1986), 236-244.
26. Carl Woese, Wikipedia, accessed October 29, 2013,
27. See Christopher Prom, Making Digital Curation a Systematic Institutional Function, International Journal of Digital Curation 6, no. 1 (August 3, 2011): 139-152, accessed October 31, 2013, doi:10.2218/ijdc.v6i1.178; links from Staff Resources, University of Illinois Archives website,
28. Forensic Toolkit Imager (FTK Imager):
29. NARA File Analyzer and Metadata Harvester:; DROID:; Treesize Pro:; Karen’s Directory Printer:
30. Thomas Padilla, Topic Modeling Archival Materials, Practical E-Records Blog, accessed November 1, 2013,; Archon:
31. Our current descriptive catalog/software is Archon:; we are migrating to ArchivesSpace,; a description of the Woese Papers is available at
32. David Blei, Andrew Ng, and Michael Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research (2003): 993-1022; Robert Nelson, University of Richmond, Mining the Dispatch, accessed October 31, 2013,; Lisa Rhody, Topic Modeling and Figurative Language, Journal of Digital Humanities. no. 1 (2012), accessed October 31, 2013,
34. Mark Greene and Dennis Meissner, More Product, Less Process: Revamping Traditional Archival Processing, The American Archivist 68, no. 2 (Fall/Winter 2005): 208-263.

Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO