Tasks vs. Roles: A Center Perspective on Data Curation Needs in the Humanities

Trevor Muñoz; Virgil Varvel; Allen H. Renear; Kevin Trainor; Molly Dolan

Authorship

1. Trevor Muñoz

Center for Informatics Research in Science and Scholarship - University of Illinois, Urbana-Champaign
2. Virgil Varvel

Center for Informatics Research in Science and Scholarship - University of Illinois, Urbana-Champaign
3. Allen H. Renear

Brown University, Center for Informatics Research in Science and Scholarship - University of Illinois, Urbana-Champaign
4. Kevin Trainor

Center for Informatics Research in Science and Scholarship - University of Illinois, Urbana-Champaign
5. Molly Dolan

West Virginia University

Muñoz, Trevor, Center for Informatics Research in Science and Scholarship, University of Illinois, Urbana-Champaign, USA, munoz14@illinois.edu
Varvel, Virgil, Center for Informatics Research in Science and Scholarship, University of Illinois, Urbana-Champaign, USA, vvarvel@illinois.edu
Renear, Allen H., Center for Informatics Research in Science and Scholarship, University of Illinois, Urbana-Champaign, USA, renear@illinois.edu
Trainor, Kevin, Center for Informatics Research in Science and Scholarship, University of Illinois, Urbana-Champaign, USA, trainor1@illinois.edu
Dolan, Molly, West Virginia University, USA, molly.dolan@mail.wvu.edu
Abstract
To support the development of curricular content for the Data Curation Education Program (DCEP) at the Graduate School of Library and Information Science (GSLIS), University of Illinois at Urbana-Champaign, a needs analysis case study focusing on digital humanities centers was carried out in late 2010. Collectively, the results show how directors and key staff perceive humanities data curation needs. Several results were contrary to what we anticipated; for instance, there was only modest agreement on the critical areas of expertise needed to sustain meaningful access to digital humanities scholarship over time. Most importantly, one result, if confirmed, could have a substantial impact on the design of data curation education programs: in the humanities, center directors and managers appear to resist the notion that a particular staff role, that of data curator, is specifically needed. They prefer instead to develop distributed expertise and responsibilities as part of existing staff roles, calling on institutional resources as needed. This suggests, among other things, that the standing recommendation to place data curation professionals “upstream” in projects may need to be re-envisioned in this context.

Introduction
Data curation has been described as “the active and ongoing management of data throughout its entire lifecycle of interest and usefulness to scholarship” (Cragin et al., 2007). Curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time. Originally conceptualized as an e-Science problem precipitated by large amounts of data in digital formats, data curation is an emerging problem for the humanities as well, as both data and analytical practices become increasingly digital (Renear et al., 2009; Crane, Babeu, & Bamman, 2007).

GSLIS received a grant from the Institute of Museum and Library Services (IMLS) to extend the existing data curation specialization within the school's ALA-accredited master's program to include humanities data as well as science data (Renear et al., 2009). Among the activities carried out under the grant was a study of data management and curation practices in the digital humanities. The study was designed and directed by social science researchers from the Center for Informatics Research in Science and Scholarship (CIRSS). The main goals were to assess the levels and types of data curation expertise needed by researchers actively engaged in digital humanities projects and to better understand the potential roles that information professionals trained to meet the unique challenges of working with humanities data in digital formats might fulfill.

Methods
To develop a rich picture of data curation practices in the humanities, we employed a case study method, taking digital humanities centers as our case. We chose to focus on established digital humanities centers in preference to libraries, repositories, or individual research teams or scholars for two reasons. First, many significant early projects were likely to be located at or affiliated with centers, meaning these centers have experience handling data over longer time scales (Daigle, 2005). Second, centers bring together faculty and staff, and we believe this makes them sites where the most sophisticated curation practices are likely to be found (Zorich, 2008).

We interviewed directors and upper-level staff members at 14 digital humanities centers located in the United States, the United Kingdom, Europe, and Australia, conducting one interview per site. We chose high-profile, established centers that were available and willing to participate in the research. Most of these centers were located at large research universities, but their size ranged from several staff members to large units. Our intuition was that researchers working at the level of a single project might see data curation as something inextricably tied to their specific job and thus not be able to envision it as a stand-alone function. We therefore chose upper-level staff with responsibilities for hiring and for coordination between projects, in order to elicit views of the curator position from that overarching perspective. Participants were asked about a range of topics related to data management, including formats and standards, data storage, security and redundancy, staff roles and background, and significant unsolved problems.

Study participants completed a pre-interview worksheet, and their responses were used to guide and focus semi-structured interviews. The pre-interview worksheet also included a series of questions asking participants to rate various categories of skills on a Likert scale ranging from “very important” to “not important at all” for curating humanities data. From these ratings we were able to develop an overview of researchers' views of data curation in the context of multi-project digital humanities centers. From the interviews, we were able to capture more in-depth information, such as complex discussions of tradeoffs and rationales, that could not be adequately represented in a simple survey. We are therefore able to report both quantitative and qualitative results from our case study.

Results
Variability and Convergence of Skills
When asked to rank the importance of various areas of expertise needed for a data management professional to be effective working with humanities data, participants revealed an unexpectedly high degree of variability in their answers.

From among a list of 30 kinds of expertise provided in the pre-interview worksheet, at least one study participant gave each category of knowledge the highest score, indicating it was “very important.” While we can rank order the areas of expertise according to an average score, the differences between rankings are not statistically significant by Chi-squared analysis. However, we observe that a handful of skills were both highly ranked on average and showed ratings skewed toward the high end of the scale: every respondent rated expertise in areas such as interoperability, markup, database design, and metadata as being of moderate importance or higher. Project management also had a high rank order. One surprising result among our findings was the strong emphasis on skills related to teaching and training. This may be due to staff at digital humanities centers being tasked as consultants to scholarly projects, or it may simply be due to the expectation within the community that skills will be developed through peer-to-peer training in the course of carrying out job duties. Overall, our worksheet results, together with the interview data, did not reveal a consensus as to the most relevant areas of expertise needed by staff engaged in humanities data curation.
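The rank ordering and significance check described above can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis: the three skill names are borrowed from the text, but the ratings, the 5-point coding, the three-way binning, and all function names are hypothetical assumptions. With so few respondents the expected cell counts are small, so the Chi-squared approximation is rough at best.

```python
# Hypothetical 5-point ratings (5 = "very important"); NOT data from the study.
RATINGS = {
    "metadata": [5, 5, 4, 4, 4, 5, 3, 4, 5, 4, 3, 5, 4, 4],
    "markup":   [4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 3, 4, 5, 4],
    "teaching": [5, 3, 4, 2, 5, 4, 3, 4, 2, 5, 4, 3, 4, 5],
}

# Standard Chi-squared critical value for alpha = 0.05 with 4 degrees of freedom.
CRITICAL_VALUE_DF4 = 9.488


def mean_scores(ratings):
    """Return (skill, mean rating) pairs ordered from highest to lowest mean."""
    means = {skill: sum(scores) / len(scores) for skill, scores in ratings.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def to_bin(score):
    """Collapse a 1-5 rating into 0 (low), 1 (middle), or 2 (high)."""
    if score <= 2:
        return 0
    if score == 3:
        return 1
    return 2


def contingency_table(ratings):
    """Count low/middle/high ratings per skill (one row per skill)."""
    table = []
    for scores in ratings.values():
        row = [0, 0, 0]
        for score in scores:
            row[to_bin(score)] += 1
        table.append(row)
    return table


def chi_squared(table):
    """Pearson Chi-squared statistic and degrees of freedom.

    Assumes every row and every column contains at least one response.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


ranking = [skill for skill, _ in mean_scores(RATINGS)]
stat, dof = chi_squared(contingency_table(RATINGS))
significant = stat > CRITICAL_VALUE_DF4
```

With these made-up ratings the skills can still be ordered by mean score, yet the statistic stays below the critical value, which mirrors the situation the paragraph describes: a rank order exists, but the differences behind it are not statistically significant.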

Organizational and Management Trends
Our interviews reveal a picture of current practice in which the work of data management and curation at digital humanities centers is parceled out among multiple staff members at multiple levels in the organizational hierarchy. Important data curation tasks may be left for scholars or managers of projects to decide individually, they may be handled by staff who work on multiple projects for a center, or they may be outsourced to other campus units above the level of the center.

The staff who did have responsibility for data management and curation at the centers we studied were often either those with programming, systems administration, or other IT training, or were people who had received advanced training in a humanities discipline and had taught themselves technical skills.

In keeping with the emphasis on interoperability noted in our quantitative results and also perhaps in response to a changing funding environment and newly available services, we observed a trend in which efforts were being made to move data management expertise from the staff member who had developed it in the course of his or her duties to a center-wide or perhaps institution-wide level where it would be a part of documentation and institutional memory rather than personal memory.

However, our interviews with managers also suggest that even though data management and curation are beginning to be elevated to a higher position in organizations, there is skepticism about the potential role for a data curator at digital humanities centers. Participants in our study were interested in adding skills relevant to data curation to their organizations, but rather than doing so through a dedicated position for an information professional, they appeared to be looking for staff with computing or disciplinary skills who also had some training in data curation.

This finding is consistent with another trend we observed. Just as digital humanities centers are already using outside groups such as campus IT or vendors for certain aspects of data management, we noted an increasing orientation to and interest in working with campus-wide services such as institutional repositories to curate humanities data.

Discussion
Since effective curation, management, and preservation of data in digital formats involve intervention at every stage of the data lifecycle from creation onwards, it has been a common belief in the data curation community that information professionals trained in curation will need to work “upstream” in scientific labs and digital humanities centers (Swan & Brown, 2008). The current resistance of directors of digital humanities centers to such dedicated data curation staff must be taken seriously, as it undoubtedly reflects relevant experience and judgment, and their sense of the sorts of arrangements that are likely to succeed. However, our case study in combination with prior work on the role of information work in scientific research suggests that models of provisioning data curation expertise may need to be more nuanced.

As the humanities become increasingly “data-rich,” information science research on data management in the natural sciences becomes increasingly relevant (Choudhury & Stinson, 2007; Renear, Muñoz, & Trainor, 2010). For example, intensive case studies in neuroscience suggest that information services for researchers are likely to be most effective at project stages when information work is most routine or when it is highly speculative, as is often the case with new interdisciplinary research questions or in emerging collaborations (Palmer, 2006; Palmer, Cragin, & Hogan, 2007). In the humanities we have also seen that conceptions of what constitutes information or support work and what constitutes professional work within disciplines change in response to the introduction of digital methodologies (Flanders, 2005; Bradley, 2008; McCarty, 2009). While the distribution of curation activities may not follow the same types of (re)arrangements we are seeing in the sciences, we still believe that some data curation work will be most effective upstream and integrated into scholars' research endeavors, such as at decision points about project planning and re-use value.

As digital curation practices evolve, libraries and institutional repositories will likely take on a larger role in curating humanities data. The results of our study can serve as a useful point of comparison for future work in this area. In addition to having someone who owns data curation problems and manages solutions at the research-center level, institutions may explore both how to provide services from a central organization (such as the university library) and how to increase the formal, in-service training available to researchers in the digital humanities.

Acknowledgments
This work was funded by a grant from the Institute of Museum and Library Services (RE-05-08-0062-08). We have benefited from the expertise of Melissa Cragin, Carole Palmer, and other staff from the Center for Informatics Research in Science and Scholarship in designing and carrying out this research.

References:
Bradley, J. 2008 “What the Developer Saw: An Outsider’s View of Annotation, Interpretation and Scholarship,” Digital Studies / Le champ numérique, 1(1)

Choudhury, G.S., & Stinson, T. 2007 “The virtual observatory and the Roman de la rose: Unexpected relationships and the collaborative imperative,” Academic Commons

Cragin, M. H., Heidorn, P. B., Palmer, C. L., & Smith, L. C. 2007 “An Educational Program on Data Curation,” American Library Association Conference, Science and Technology Section, Washington, D.C., June 25, 2007

Cragin, M. H., Palmer, C. L., Varvel, V., Collie, A., & Dolan, M. 2009 “Analyzing Data Curation Job Descriptions,” 5th International Digital Curation Conference, London, U.K., December 2-4, 2009

Crane, G., Babeu, A., & Bamman, D. 2007 “eScience and the Humanities,” International Journal on Digital Libraries, 7, 117-122

Daigle, B. J. 2005 “How Do We Sustain Digital Scholarship?,” Free Culture and the Digital Library Symposium Proceedings, Martin Halbert, Metascholar Initiative, Atlanta, GA, October 14, 2005

Flanders, J. 2005 “Detailism, Digital Texts, and the Problem of Pedantry,” Text Technology, 14(2), 41-70

McCarty, W. 2009 “Literary enquiry and experimental method: What has happened? What might?,” Storia della Scienza e Linguistica Computazionale: Sconfinamenti Possibili, Liborio Dibattista, Francoangeli, Milan, 32-54

Nowviskie, B., & Porter, D. 2010 “The Graceful Degradation Survey: Managing Digital Humanities Projects Through Times of Transition and Decline,” Digital Humanities, London, U.K., July 7-10, 2010

Palmer, C. L. 2006 “Weak Information Work and ‘Doable’ Problems in Interdisciplinary Science,” Proceedings of the American Society for Information Science and Technology, 43(1), 1-16

Palmer, C. L., Cragin, M. H., & Hogan, T. P. 2007 “Weak information work in scientific discovery,” Information Processing and Management, 43, 808-820

Palmer, C. L., Renear, A. H., & Cragin, M. H. 2008 “Purposeful Curation: Research and Education for a Future with Working Data,” 4th International Digital Curation Conference, Edinburgh, Scotland, December 1-3, 2008

Renear, A. H., Dolan, M., Trainor, K., & Muñoz, T. 2010 “Extending an LIS Data Curation Curriculum to the Humanities: Selected Activities and Observations,” iSchools Conference, Champaign-Urbana, IL, February 3-6, 2010

Renear, A. H., Muñoz, T., & Trainor, K. 2010 “Data Curation Education for the Humanities: Principles & Challenges,” 5th Annual Chicago Colloquium on Digital Humanities and Computer Science, Northwestern University, Evanston, IL, November 21-22, 2010

Renear, A. H., Teffeau, L. C., Hswe, P., Dolan, M., Palmer, C. L., Cragin, M. H., & Unsworth, J. 2009 “Extending an LIS Data Curation Curriculum to Include Humanities Data,” DigCCurr Conference, Chapel Hill, N.C., April 1-3, 2009

Smith, A. 2003 New-Model Scholarship: How Will It Survive?, Council on Library and Information Resources, Washington, D.C.

Swan, A., & Brown, S. 2008 The Skills, Role and Career Structure of Data Scientists and Curators: An Assessment of Current Practice and Future Needs, JISC

Zorich, D. 2008 A Survey of Digital Humanities Centers in the United States, Council on Library and Information Resources, Washington, D.C.

Conference Info

ADHO - 2011

"Big Tent Digital Humanities"

Hosted at Stanford University

Stanford, California, United States

June 19, 2011 - June 22, 2011

Conference website: https://dh2011.stanford.edu/

Series: ADHO (6)

Organizers: ADHO
