Rice University
Among the most popular platforms for digital humanities development projects is GitHub.
This work emerges out of Rice University’s John E. Sawyer Seminar on Platforms of Knowledge in a Wide Web of Worlds, supported by the Andrew W. Mellon Foundation.
GitHub provides web-based hosting for coding and other collaborative projects, building on the Git version control system. Founded in 2008, GitHub now hosts approximately 31 million repositories and 12 million users, making it “the largest online storage space of collaborative works that exists in the world” (GitHub, 2016; Orsini, 2013). Using GitHub, developers can fork (copy) a public repository to their own account, change the code, and submit a pull request to share the modifications with the repository owner, who can then “merge” the code with the original. GitHub also enables users to create profiles, “follow” others, “star” projects and “watch” them evolve, serving as a social network for developers (Brown, 2014).
Digital humanities (DH) researchers are drawn by GitHub’s support for version control and collaboration, as well as by its free hosting for publicly available projects. A range of digital humanities projects use GitHub, including:
Software
Writing projects
Taxonomies and community documentation
Websites
Datasets
Course materials
Syllabi
Research notes
Which digital humanities researchers are adopting GitHub, and why? What are the benefits and risks of employing GitHub for digital humanities work?
Academia’s growing reliance on GitHub requires careful consideration. As a for-profit company, GitHub does not necessarily operate with the interests of academics at heart. Yet it provides services that would be difficult for scholars to secure themselves, and it enables collaboration. To develop an informed view of GitHub and similar services, we need to establish clear criteria for evaluating platforms. In choosing a platform for a web project, Quinn Dombrowski recommends considering functionality, familiarity, community, support and cost (Dombrowski, 2013). We would add support for openness and sustainability as core criteria for digital humanities platforms.
Through an initial case study of GitHub, we will examine these criteria for evaluating DH platforms:
Functionality: GitHub offers several features that make it attractive to researchers, particularly open science advocates. Karthik Ram suggests that Git (and by extension GitHub) supports open science by providing decentralized version control; attributing changes to authors; supporting distributed backup of data; enabling projects to branch in new directions; collecting feedback through issue trackers; and facilitating reuse through forking (Ram, 2013). Likewise, Konrad Lawson touts the power of GitHub in facilitating “collaboration without collaboration” (easily modifying someone else’s code through forking, and contributing that code back through a pull request) and in providing detailed credit for contributions (Lawson, 2013c; Lawson, 2013a). While GitHub can be used for a range of texts, from syllabi to code, it is not necessarily well suited for all uses. For example, Mark Sample notes the significant labor and potentially low rewards of putting syllabi into GitHub (Sample, 2012a). Lawson observes that using GitHub for writing projects requires overcoming a fairly steep learning curve, using plain text (or a converter), creating short documents, and dealing with limited support for non-textual files (Lawson, 2013d).
Familiarity/ease of use: As Lincoln Mullen points out, GitHub’s learning curve poses a barrier to entry for some potential collaborators (Mullen, 2012). Indeed, participants discussing how to make it easier for women to contribute to Programming Historian identified the publication’s reliance on GitHub as an obstacle (Crymble et al., 2015). To what extent do the challenges of using GitHub limit its adoption in the humanities?
Community: As more digital humanists adopt GitHub, it becomes even more attractive, since you are more likely to find collaborators and to gain recognition for your work. Yet the wide adoption of GitHub may reduce diversity and increase dependency on a commercial platform. Not everyone wants to participate in this community. Lawson points to several social and cultural obstacles to GitHub enabling richer academic collaboration, including reluctance to embrace “forking” as a means of building on another’s work; fears of plagiarism; concerns that the original voice of the author will be lost; anxiety that transparency will reveal one’s scholarly flaws; and worry that ideas will be stolen or misused (Lawson, 2013b).
Support: With such a large community, new users can turn to a number of resources to learn how to use GitHub. Some university IT groups offer limited support for GitHub, but in general users are left to secure their own support.
Cost/business model: GitHub uses a “Freemium” business model in which it hosts public repositories for free and charges companies for private repositories (Brown, 2014). It also offers up to five free private repositories to academic researchers and twenty to research groups. While free holds appeal, should the digital humanities community be concerned about becoming dependent on a platform developed by a for-profit company? As Sample warns, “History suggests that relying too much on a commercial service with interests that do not necessarily align with our own is no way to sustain the work of the humanities” (Sample, 2012b). GitHub has attracted $350 million in venture capital and is now valued at about $2 billion, so it faces pressure to generate a profit (Gage, 2015). Klint Finley argues that GitHub’s business interests may work against its open source mission, pointing to SourceForge as an example of an open source software site that went astray (Finley, 2015). After SourceForge was acquired, it began to display low-quality third-party ads that misled people into downloading malicious software, prompting projects such as GIMP and VLC to leave. While GitHub is not funded through ad revenue, its business model could change under pressure from investors.
Support for openness: Compared to some web platforms that claim user-produced content as their own, GitHub articulates an open approach to intellectual property: “Your profile and materials uploaded remain yours” (GitHub, 2015). But should we be concerned about clauses reserving the right to remove content and requiring users to defend and indemnify GitHub against suits alleging that their content violates the law? Does GitHub’s model of providing free public repositories lead some users to share work that they otherwise would keep private?
Sustainability: GitHub is not meant to be a preservation repository, and it is easy to delete a public repository (Bergman, 2012). However, Git’s distributed, decentralized approach to versioning provides protection against data loss, since everyone who contributes to a GitHub project has a local copy of the code (Finley, 2015).
In addition to these criteria, we will also consider the significance of factors such as accessibility and multilingualism.
In performing this research, we are first identifying digital humanities users by 1) searching for publicly available GitHub accounts associated with presenters at the last three Digital Humanities conferences and 2) searching for GitHub accounts associated with Digital Humanities centers listed on CenterNet. To understand patterns of collaboration and code reuse, we will analyze publicly available statistics for selected users, such as the number of commits, branches, releases and contributors, as well as the networks connecting users. We will survey GitHub DH users to understand how and why they use GitHub, its strengths, and its weaknesses. We will also conduct interviews with selected GitHub users. Where possible, we will use GitHub to share ongoing work about this project: https://github.com/lms4w/githubproject
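The abstract does not specify how the "networks connecting users" will be constructed. One possible approach, sketched here under stated assumptions, is to treat two accounts as linked whenever both appear in the contributor list of the same public repository, weighting each link by the number of repositories they share. The contributor lists would come from GitHub's public REST endpoint GET /repos/{owner}/{repo}/contributors; the helper names (fetch_contributors, co_contribution_network, degree) and the repository names in the demo are hypothetical placeholders, not part of the project described above.

```python
"""Hypothetical sketch: a co-contribution network from GitHub contributor lists."""
import json
import urllib.request
from collections import Counter
from itertools import combinations

API_ROOT = "https://api.github.com"


def fetch_contributors(owner, repo, token=None):
    """Return contributor logins for one public repository (hypothetical helper).

    Calls GET /repos/{owner}/{repo}/contributors and reads only the first
    page (up to 100 entries); a full crawl would follow pagination links
    and respect the API's rate limits, which are low without a token.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/contributors?per_page=100"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return [entry["login"] for entry in json.load(response)]


def co_contribution_network(contributors_by_repo):
    """Project a repository -> contributors mapping onto a user-user network.

    Each pair of users gains one unit of edge weight for every repository
    that both of them contributed to.
    """
    edges = Counter()
    for logins in contributors_by_repo.values():
        # Deduplicate and sort so each unordered pair is keyed consistently.
        for a, b in combinations(sorted(set(logins)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Count how many distinct collaborators each user is linked to."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


if __name__ == "__main__":
    # Invented placeholder data; a real run would build this mapping by
    # calling fetch_contributors() for each repository of interest.
    sample = {
        "center-a/toolkit": ["alice", "bob", "carol"],
        "center-b/corpus": ["bob", "carol"],
        "lab-c/site": ["dave"],
    }
    network = co_contribution_network(sample)
    print(network.most_common(1))  # [(('bob', 'carol'), 2)]
    print(dict(degree(network)))
```

Projecting the repository-contributor relation onto users in this way is only one of several defensible choices; forks, stars or follows could define the network instead, and solo repositories (like "lab-c/site" above) contribute no links at all.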
By analyzing public GitHub statistics and gathering insights and information from users, we will illustrate how GitHub is being used in the digital humanities community and develop principles for evaluating platforms.
Bibliography
Bergman, C. (2012). On the Preservation of Published Bioinformatics Code on Github. An Assembly of Fragments. Available at: https://caseybergman.wordpress.com/2012/11/08/on-the-preservation-of-published-bioinformatics-code-on-github/ (accessed 1 November 2015).
Brown, M. (2014). GitHub - Cracking the Code to GitHub’s Growth. GrowthHackers. Available at: https://growthhackers.com/growth-studies/github (accessed 13 October 2015).
Crymble, A., Posner, M., et al. (2015). How Can We Make The PH More Friendly For Women To Contribute? Issue #152. Programming Historian. Available at: https://github.com/programminghistorian/jekyll/issues/152 (accessed 12 February 2016).
Dombrowski, Q. (2013). Choosing a platform for your project website. Berkeley Digital Humanities. Available at: http://digitalhumanities.berkeley.edu/blog/13/12/04/choosing-platform-your-project-website (accessed 14 October 2015).
Finley, K. (2015). The Problem With Putting All the World’s Code in GitHub. WIRED. Available at: http://www.wired.com/2015/06/problem-putting-worlds-code-github/ (accessed 13 October 2015).
Gage, D. (2015). GitHub Raises $250 Million at $2 Billion Valuation; Capital Raise Puts Company’s Total Funding at $350 Million. Wall Street Journal (Online), 29 July.
GitHub. (2015). Github Terms Of Service - User Documentation. Help.github.com. Available at: https://help.github.com/articles/github-terms-of-service/ (accessed 28 October 2015).
GitHub. (2016). Press. Available at: https://github.com/about/press (accessed 5 March 2016).
Lawson, K. M. (2013a). File and Repository History in GitHub. The Chronicle of Higher Education Blogs: ProfHacker. Available at: http://chronicle.com/blogs/profhacker/file-and-repository-history-in-github/48047 (accessed 30 October 2015).
Lawson, K. M. (2013b). Fork the Academy. The Chronicle of Higher Education Blogs: ProfHacker. Available at: http://chronicle.com/blogs/profhacker/fork-the-academy/48935 (accessed 30 October 2015).
Lawson, K. M. (2013c). Getting Started With a GitHub Repository. The Chronicle of Higher Education Blogs: ProfHacker. Available at: http://chronicle.com/blogs/profhacker/getting-started-with-a-github-repository/47393 (accessed 30 October 2015).
Lawson, K. M. (2013d). The Limitations of GitHub for Writers. The Chronicle of Higher Education Blogs: ProfHacker. Available at: http://chronicle.com/blogs/profhacker/the-limitations-of-github-for-writers/48299 (accessed 30 October 2015).
Mullen, L. (2012). How ready are DHers to use GitHub for non-code projects? Digital Humanities Questions and Answers. Available at: http://digitalhumanities.org/answers/topic/how-ready-are-dhers-to-use-github-for-non-code-projects (accessed 30 October 2015).
Orsini, L. (2013). GitHub For Beginners: Don’t Get Scared, Get Started. ReadWrite. Available at: http://readwrite.com/2013/09/30/understanding-github-a-journey-for-beginners-part-1 (accessed 13 October 2015).
Ram, K. (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8(1): 7.
Sample, M. (2012a). Git a Fork in My Syllabus, It’s Done. The Chronicle of Higher Education Blogs: ProfHacker. Available at: http://chronicle.com/blogs/profhacker/git-a-fork-in-my-syllabus-its-done/40331 (accessed 30 October 2015).
Sample, M. (2012b). GitHub Fever. Digital Culture Week, 1(3). Available at: http://www.digitalculture.org/2012/06/08/dcw-volume-1-issue-3-distant-and-familiar/ (accessed 31 October 2015).
Hosted at Jagiellonian University, Pedagogical University of Krakow
Kraków, Poland
July 11, 2016 - July 16, 2016
454 works by 1072 authors indexed
Conference website: https://dh2016.adho.org/
Series: ADHO (11)
Organizers: ADHO