New York University
Abstract
The modeling of mass quantities of cultural objects has led the humanities in new and sometimes discomforting directions. Digital humanists have come to realize the stakes of such practices: emerging paths of scholarship that supplement but may also fundamentally alter the research methodologies and outputs of the humanities. Matthew Wilkens writes that the necessary "decay" of critical techniques that pay close attention to those objects is "a negative in itself only if we mistakenly equate literary and cultural analysis with their current working method" (Wilkens, 2012). One question that remains is if the quantitative working method of such large analyses is compatible with that "current working method" - in this case, the individual interpretation and critique of texts. In the years since Franco Moretti's "distant reading" paradigm became a commonplace, scholars have tested this useful if problematic dichotomy of "close" and "distant" reading. In 2013, for instance, Andrew Piper writes of the "topology" resultant from the dispersive techniques of programmatic text analysis. Setting the lexical components of texts in relation to one another, Piper envisions a tactic of "focalization" that can allow "readers" of such deconstructed texts to understand the relationships between those components (or characteristics) at multiple scales (Piper, 2013: 375). And yet there is always a "between" those multiple scales: "for every unity there can always be something between it and that which it succeeds" (381). Subsequently, there also exist important relationships between these scales of information that inform one another. When macroanalysis brings us to a point where we must return our close attention to our objects of study, we need be reminded of the model that brought us to that perspective in the first place. In focalizing, we are best served to maintain multiple foci. It is a natural tendency to want to confirm macroanalytic results by reading texts and by paying attention to the details of our mathematical models. But how might we do so while keeping the complex relationships of those models in mind?
Figure 1. “Topic Words in Context”: an in-browser tool for exploring the scales of data in a topic model
When digital humanists use topic models to explore large corpora of texts, they do so at an inherent disadvantage. Typically presented with flat files listing topics and topic weights, they are left to interpret these lists and figures separate from the texts that have just been modeled. Several significant tools have been developed to help scholars visually navigate the textual relationships in topic models. However, in the past few years I have been working on a practical, critical methodology for understanding topic models and the relations between their outputs and that "current working method" of the humanities: human-guided close and contextual reading. For this talk, I will take attendees on a visual exploration of a topic model of Emily Dickinson's poetry using a highly interactive and playful data visualization I developed for my Master's thesis, called "Topic Words in Context" - or "TWiC", for short. TWiC is a multi-paneled environment for web browsers that allows users to explore and juxtapose multiple scales of data in topic models of digital corpora. It uses shapes, colors, and cross-panel highlighting to get users of topic models from "big" data to "small" and back. Importantly, it also provides an alternate "publication" view that resituates modeled texts back into their original publication contexts (e.g. texts split for modeling purposes or texts within a collection). Recalling Piper's topological concepts, TWiC brings our focus simultaneously to these many textual and statistical relationships at play within a topic model. From corpus-wide topic distributions to texts to the topics themselves, each scale of the model when set against each other can reveal hierarchical qualities that enrich and move beyond the semantic/linguistic relationships frequently associated with the word lists of topic models. (Documentation and color screenshots of TWiC are available at in the README.md file at github.com/jarmoza/twic.) Of the many analytical techniques TWiC makes possible, I will demonstrate how we can produce expressive, critical comparisons between our close readings of texts and the smallest of quantitative scales in a topic model: individual texts and individual topics. We will look at different weighting schemes for topic and topic word distributions, how to quantitatively characterize and visualize them, and then how to compare them to traditional literary criticism. As it turns out, the expressiveness of a topic model functions differently depending on the context in which we depict its data. To show this I will turn our attention to the literary-bibliographic focus of my Master's thesis, Emily Dickinson's "fascicle" books of poetry.
Figure 2. The topics of several Dickinson poems, displayed proportionately and in the order of a fascicle
Emily Dickinson died in 1886, leaving behind in a small, wooden box an unpublished trove of poetry numbering near 2,000 individual works. Many of those poems were bound, hand-sewn into tiny books that have come to be known as her "fascicles." While her poems were being prepared to be published, a family feud arose that split the collection of her manuscripts, and also resulted in the fascicle ordering of her poems being lost for years. A long-studied, now-canonical poet, Dickinson is considered a proto-modernist, someone whose style influenced many of the American poets of the early twentieth century. However, given the size of the task, a comprehensive assessment of every piece of her writing has rarely been attempted. It would not be until the midtwentieth century that the painstaking effort to rediscover those original orderings was made by bibliographers, notably R.W. Franklin. With that work completed, Dickinson scholars like Eleanor Elson
Heginbotham, Sharon Cameron, and others have provided assessments of a select few of these orderings using all of the critical tools at their disposal honed by years of reading Dickinson: interpretations that pay attention to style, context, biography, history, textual studies, and more. Even so it just may not be humanly possible to provide a comprehensive perspective of her writing via such individual attentiveness. Dickinson's poems and their bibliographic history therefore present a fortuitous and somewhat unique set of circumstances for digital humanists. Her oeuvre as a poet is large enough in size to be mathematically modeled. There is a known ordering to much of her works. And her words are "truth told slant" enough in their polysemy to problematize a topic model's expectations of the relationships between them - even at the level of the individual line, let alone across several works.
Figure 3. A Dickinson poem viewed in a composite weighting scheme that utilizes both topic model weights and the considerations of a literary critic
While digital humanists still want to closely consider our objects of study away from computation, we also want to consider them from these new perspectives that digital modeling methods provide. We tend to bounce between considerations of model-induced order and the contextualizing work done by human-imposed orderings. This talk will provide a case study of how those differing worlds of order relate, how they sometimes either complement or contrast with one another. By combining the outputs of a topic model with the context of Dickinson's fascicle orderings, for instance, one can make quick comparisons between the qualitative discursive relationships of her poems and the quantitative relationships established by a topic model of them. I will introduce TWiC and its publication view to
attendees as a means of exploring topic models, Humanities. http://dhdebates.gc.cuny.edu/de-
showing how it can be used to facilitate close readings bates/text/17.
of texts that focus on model data as well as on prior
humanities criticism of those texts. This exploration
will take us from the patterned, colorful shapes of
TWiC to several types of analytic visuals that unweave
the probabilistic threads of a topic model. By the
conclusion, we will be able to proportionally compare
the interpretations of a Dickinson critic and a topic
model of Dickinson's poems as they are situated in the
fascicle books.
Bibliography
Armoza, J. (2014). “Topic Words in Context.” https://github.com/jarmoza/twic.
Armoza, J. (2016). “Topic Words in Context - Dickinson, the Fascicle and the Topic Model.” https://github.com/jarmoza/masters_thesis.
Cameron, S. (1992). Choosing Not Choosing: Dickinson's Fascicles. Chicago, IL: University of Chicago Press.
Dickinson, E. (2013). “Emily Dickinson Archive.” http://www.edickinson.org.
Dickinson, E. (1981). R.W. Franklin (ed), The Manuscript Books of Emily Dickinson. Cambridge, MA: Belknap
Press.
Franklin, R.W. (1967). The Editing of Emily Dickinson: A Reconsideration. Madison, WI: University of Wisconsin Press.
Heginbotham, E. (2003). Reading the Fascicles of Emily Dickinson: Dwelling in Possibilities. Columbus, OH: Ohio State University Press.
Jockers, M. (2013). Macroanalysis: Digital Methods and Literary History. University of Illinois Press.
McCallum, A. (2002). MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu.
Mimno, D. (2015). MALLET: A Machine Learning for Language Toolkit. https://github.com/mimno/Mallet.
Moretti, F. (2007). Graphs, Maps, Trees: Abstract Models for Literary History. Verso.
Piper, A. (2013). "Reading's Refrain: From Bibliography to Topology." ELH. 80(2): 373-399.
Wilkens, M. (2012). "Canons, Close Reading, and the Evolution of Method." In Gold, M. (ed), Debates in the Digital
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
Complete
Hosted at McGill University, Université de Montréal
Montréal, Canada
Aug. 8, 2017 - Aug. 11, 2017
438 works by 962 authors indexed
Conference website: https://dh2017.adho.org/
References: http://web.archive.org/web/20170802132745/https://www.conftool.pro/dh2017/sessions.php
Series: ADHO (12)
Organizers: ADHO