McGill University
University of Alberta
Context
Voyant Tools (voyant-tools.org) is a web-based reading and analysis environment for digital texts. Users can create their own corpus of texts to work with by pointing to URLs or uploading files in a variety of formats (plain text, XML, HTML, PDF, MS Word, RTF, etc.). Voyant allows users to navigate between macro views of the corpus (e.g. a word cloud visualization of the entire corpus) and micro views (e.g. a reading individual occurrences of a specific term in context). The default interface provides access to eight tools for reading texts and studying frequency and distribution data, and many more tools are available in various pre-defined or user-defined "skins" (a layout of tools that interact).
Using the hosted version of Voyant has several advantages. For instance, Voyant can be accessed immediately through most modern browsers and no software installation or configuration is required (some of the tools need plugins for Flash or Java, but not the ones in the default skin). Similarly, users are always accessing the most recent stable version of the web application, no actions are required by the user to update the software (typically, new builds with bug fixes and new functionality are released several times a month). A third benefit relates to Voyant's design as a networked tool where users can generate persistent URLs for their corpus and tools that can be stored, shared and embedded in remote sites.
The accessibility and convenience of the hosted version of Voyant also has disadvantages, however. Users often tell us they are wary of uploading texts to a server out of concern for possible copyright infringement or privacy issues. Consistent with its overall design priority of keeping things simple, Voyant does not currently have any user or access management system, so technically any uploaded content could be accessed by anyone (in practice, each uploaded corpus has a generated ID that it would be extremely unlikely to guess – so corpora are as hidden as users choose them to be). Still, with some material (protected responses to a survey that has required ethics approval, for instance), uploading content to a public, non-secure (non-SSL) site is not an acceptable option. A second potential source of frustration with the hosted version of Voyant is performance, especially with larger-scale text collections. The hosted server is configured to timeout after about 30 seconds, which can be insufficient when uploading, pre-processing and indexing a large text collection (the issue in some cases is not so much the scalability of Voyant as an application, but the constraints of a web application configured to support multiple concurrent users). Performance can also be an issue when running a workshop or tutorial with Voyant: 30 users hitting the same button at once can overwhelm the server (the current hosted deployment of Voyant runs in a ComputeCanada High Performance Computing facility but does not run as a cloud service that could smoothly expand capacity as on-demand). A final disadvantage with the hosted version of Voyant is that it requires reliable internet connectivity, which may be a mere nuisance at 30,000 feet, but a true problem at, say, a conference workshop with slow or unstable wireless. Performance, privacy, connectivity: three reasons among others to explore alternatives to using the hosted version of Voyant.
Workshop Outline
1) What is Voyant? (1 hour)
We will begin with a general introduction to Voyant within the broader landscape of digital text analysis. We will provide context and resources for working with digital texts. We will provide a brief overview of Voyant's user interface and discussion its strengths and weaknesses. We will suggest some other tools and techniques that may be of interest to workshop participants. This component of the workshop is like a mini-workshop on Voyant that will ensure a common base of familiarity with text analysis in general and Voyant in particular.
2) My Voyant (1 hour)
We will summarize some of the pros and cons of working with the hosted version of Voyant compared to running a local, stand-alone version. We will describe how to acquire a version of Voyant that can be run locally and then guide participants through running Voyant as a simple click-and-run desktop-style application. We will point out some of the most important technical aspects to be aware of, including how to understand the versioning system, where to find locally stored corpora, and how to respond to common errors.
3) Tweaking My Voyant (1 hour)
The nature of a web-application will be deliberately underemphasized in the previous component in favour of presenting My Voyant as a simple desktop-style application. This third component of the workshop will revisit the nature of My Voyant and explore ways that the web application can be tweaked to improve performance (memory settings in Java, server timeout settings, etc.) or otherwise fine-tuned (such as changing the default server port). The content will still be oriented around what's most relevant for a typical user of Voyant (not, say, a unix system administration guru). Finally, we will describe some strategies for deploying multiple instances of My Voyant for teaching/training purposes.
In most cases the hosted version of Voyant is probably still best in terms of convenience and sharing of work, but there are times where a local instance of Voyant may be preferable, especially with respect to performance, privacy and off-line access. This workshop, focused on using (not managing) My Voyant locally, will serve to expand the possibilities of doing digital text analysis.
Workshop Leaders:
Stéfan Sinclair, sgsinclair@gmail.com, is an Associate Professor in Digital Humanities at McGill University. His research focuses primarily on the design, development and theorization of tools for the digital humanities, especially for text analysis and visualization. He has led or contributed significantly to projects such as Voyant Tools, the Text Analysis Portal for Research (TAPoR), and BonPatron. Other professional activities include serving as associate editor of Digital Humanities Quarterly, as well as serving on the executive boards of CSDH/SCHN, ACH, ADHO, and centerNET.
Geoffrey Rockwell, grockwel@ualberta.ca, is a Professor of Philosophy and Humanities Computing at the University of Alberta, Canada. He has published and presented papers in the area of philosophical dialogue, textual visualization and analysis, humanities computing, instructional technology, computer games and multimedia. He was the project leader for the CFI (Canada Foundation for Innovation) funded project TAPoR, a Text Analysis Portal for Research, which has developed a text tool portal for researchers who work with electronic texts. He is the author of "Defining Dialogue: From Socrates to the Internet with Humanity Books."
Target Audience: a wide range of DH practitioners interested in text analysis, particularly for research, teaching, or technical support. Voyant Tools workshops are typically fully-subscribed; we prefer to limit registration to about 30 people to allow us to help participants as needed.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Complete
Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne
Lausanne, Switzerland
July 7, 2014 - July 12, 2014
377 works by 898 authors indexed
XML available from https://github.com/elliewix/DHAnalysis (needs to replace plaintext)
Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/
Attendance: 750 delegates according to Nyhan 2016
Series: ADHO (9)
Organizers: ADHO