Digital approaches to understanding the geographies in literary and historical texts

paper, specified "long paper"
Authorship
  1. Ian Gregory

    University of Lancaster

  2. Chris Donaldson

    University of Lancaster

  3. Patricia Murrieta-Flores

    University of Lancaster

  4. C.J. Rupp

    University of Lancaster

  5. Alistair Baron

    University of Lancaster

  6. Andrew Hardie

    University of Lancaster

  7. Paul Rayson

    University of Lancaster

Work text

This paper reports on recent research that explores how geographical information systems (GIS) and related technologies can be used to understand texts, drawing on both literary and historical examples. In Graphs, Maps, Trees, F. Moretti identifies mapping as one tool that facilitates distant reading. Other researchers have subsequently demonstrated that GIS can be used to implement this. The research presented here illustrates that the potential for GIS and related technologies in the humanities goes beyond both mapping and distant reading. Specifically, we identify three general ways that geographical technologies can enrich our understanding of texts: first, distant reading using Geographical Text Analysis (GTA), a combination of techniques from GIS-based spatial analysis and from corpus linguistics; second, enhanced close reading based on using place-names or maps as query tools; and third, geographical analyses of the texts using techniques such as network analysis and route reconstruction. The aim of these three approaches is to go beyond simply producing visualisations, and instead to improve our understanding of the text with an emphasis on its geographies.

Fig. 1: Cholera in the Registrar General’s reports showing (a) locations of cholera instances and (b) instances of terms associated with the water supply.

Fig. 2: Time series of cholera instances from the Registrar General’s reports. The 1868 spike in instances was not matched by a corresponding rise in deaths.
Distant reading through GTA effectively allows us to ask two basic questions: what places is the corpus talking about, and what places does the corpus relate to a particular theme? This involves more than simple mapping. First, place-names have to be identified using automated techniques. Once this has been done, spatial analysis and corpus linguistics techniques allow the geographies within the text to be investigated either in an exploratory way that asks ‘where is the corpus talking about?’ and ‘what is it saying about these places?’, or in a more thematic way that asks ‘where is the corpus talking about in relation to my theme?’ and ‘what else is being said about these places?’. As all of the place-names are georeferenced, we are also able to integrate them with other georeferenced sources. To illustrate the potential of this we use the Registrar General’s Reports, which document mortality and disease in England and Wales from 1851 to 1911. Using GTA to explore the Registrar General’s reporting of cholera showed a number of interesting things: first, that he was particularly interested in cholera in London (figure 1a); second, that the discourse on cholera in London was strongly associated with potential causes, particularly the water supply (figure 1b), whereas in other parts of the country he tended simply to acknowledge that cholera was occurring, increasing or declining; third, that the emphasis on London could not be justified either by the numbers of deaths from cholera in London or by London’s death rate from the disease; and fourth, that whereas early spikes in instances of cholera in the Reports correspond with known cholera epidemics, the last large spike, in 1868 (figure 2), was largely associated with the fear of an epidemic spreading to Britain. Given that there were relatively few cholera-related deaths reported in 1868, we have concluded that the improved understanding of the disease had led to improved measures to prevent it.
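The thematic form of GTA described above can be illustrated with a minimal sketch. It is not the pipeline used in this research: place-name identification is reduced here to a lookup in a tiny hypothetical gazetteer, whereas the real task requires automated geoparsing and disambiguation. The gazetteer entries, the approximate coordinates, the sample text, the function names and the window size are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: place-name -> approximate (latitude, longitude).
GAZETTEER = {
    "london": (51.507, -0.128),
    "liverpool": (53.408, -2.992),
    "newcastle": (54.978, -1.618),
}

# Invented sample text, standing in for a passage from a corpus.
SAMPLE = (
    "Cholera was fatal in London, where the water supply was impure. "
    "In Liverpool cholera declined. The harvest was good this season "
    "and trade remained brisk in Newcastle."
)


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def place_cooccurrences(text, theme_terms, window=5):
    """Count gazetteer place-names found within `window` tokens of a theme term."""
    tokens = tokenize(text)
    theme_positions = [i for i, tok in enumerate(tokens) if tok in theme_terms]
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER and any(abs(i - j) <= window for j in theme_positions):
            counts[tok] += 1
    return counts


def georeference(counts):
    """Attach coordinates to each counted place-name so the counts can be mapped."""
    return [(name, *GAZETTEER[name], n) for name, n in sorted(counts.items())]


if __name__ == "__main__":
    hits = place_cooccurrences(SAMPLE, {"cholera"}, window=5)
    for name, lat, lon, n in georeference(hits):
        print(name, lat, lon, n)
```

Because every hit carries coordinates, the resulting table could be loaded into a GIS as a point layer and compared with other georeferenced data, such as mortality counts.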
These types of distant reading techniques can also be applied to literary texts. Using a corpus of writing about the English Lake District we can show that whereas William Wordsworth was associated with a few central parts of the region in the Romantic period, Victorian readers associated him with sites throughout the Lakes. Using digital images from Flickr, furthermore, we can show that this trend has been reversed in the 20th century.
These top-down, automated techniques are valuable because they allow us to understand large corpora quickly, but they do so at the expense of losing much of the subtlety and nuance that close reading can offer. It is frequently argued that one of the key advantages of digital texts is that they can be read in a non-linear manner. A weakness of this is that it is not always clear how to structure non-linear reading. Place offers one way in.
The decline in mortality, particularly among infants (aged under one), started in the nineteenth century but is poorly understood. Much of the research that has been done focusses on the problems and solutions of large urban centres such as London. This is despite the fact that quantitative evidence shows that mortality in some rural areas started to decline far earlier than in urban centres and at much faster rates. There could also be major variations between nearby rural areas with apparently similar quantitative characteristics. To explore this further, three neighbouring districts in rural Suffolk (Sudbury, Samford and Risbridge) were analysed. Sudbury and Samford both had relatively high infant mortality rates in the 1850s, the earliest decade for which data are available, but showed rapid improvements thereafter. Risbridge, by contrast, started with low rates, but showed only slight improvements through the rest of the century. In order to explain these variations we first had to identify all place-names within these districts. This was done using a GIS of the boundaries and a gazetteer. These place-names were then used to query the British Library’s Nineteenth Century Newspaper corpus, which contains text from over two million newspaper pages. Additional search terms thought to be relevant to infant mortality decline were used to narrow the searches, and this list was refined as the research progressed.
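The place-name selection step described above can be sketched as a point-in-polygon test followed by query construction. This is a minimal illustration under stated assumptions, not the procedure actually used: the square boundary, the village names and coordinates, and the `"place" AND "term"` query syntax are all hypothetical, and a real study would use the GIS boundary data and the search interface of the newspaper collection.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def places_in_district(gazetteer, boundary):
    """Return the sorted gazetteer place-names whose point falls inside the boundary."""
    return sorted(
        name for name, (x, y) in gazetteer.items() if point_in_polygon(x, y, boundary)
    )


def build_queries(place_names, theme_terms):
    """Pair every place-name with every theme term (hypothetical query syntax)."""
    return [f'"{place}" AND "{term}"' for place in place_names for term in theme_terms]


# Hypothetical district boundary and gazetteer in arbitrary map units.
BOUNDARY = [(0, 0), (10, 0), (10, 10), (0, 10)]
TOY_GAZETTEER = {"Village A": (2, 3), "Village B": (12, 5), "Village C": (9, 9)}

if __name__ == "__main__":
    places = places_in_district(TOY_GAZETTEER, BOUNDARY)
    for query in build_queries(places, ["drainage", "housing"]):
        print(query)
```

Keeping the theme terms as a separate list mirrors the way the search terms were refined as the research progressed: the list can be edited and the queries regenerated without touching the spatial step.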
Based on the articles found through these queries, we have concluded that the system of local government in Risbridge was far less effective than the systems in the other two districts. Despite many calls to improve drainage, housing and a range of other features that have well established links to infant deaths, little action was taken by Risbridge’s authorities. This can clearly be contrasted to the situation in the other two districts, where the local authorities took extensive action. Although this is not a definite causal link, it does provide strong evidence that local government played an important role in reducing rural mortality rates, something that has previously only been identified at the national level or for major urban centres.
Again, similar techniques can be used in literature. We demonstrate this using map-based queries rather than place-names. A system was created that uses a Google Map to show every place mentioned in our corpus of Lake District writing as a point. Each point was linked to web-pages that include the full text. Clicking on a point on the map presents the reader with a keyword-in-context (or place-name-in-context) list of all of the references to that place, and hyperlinks can then be used to follow from these to the appropriate location in the full text. This allows the reader to query not only what is being said about a particular place, but also what is being said about nearby places.
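The two operations behind such a map-based query, building a place-name-in-context list and finding nearby places, can be sketched as follows. This is an illustrative sketch rather than the system described above: the function names, the sample sentence, the approximate coordinates and the 5 km radius are assumptions, and the map interface itself is left out.

```python
import math
import re


def kwic(text, place, width=30):
    """Return (left context, match, right context) for each mention of `place`."""
    pattern = r"\b" + re.escape(place) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearby_places(gazetteer, clicked, radius_km):
    """Return the sorted names of other places within `radius_km` of the clicked place."""
    origin = gazetteer[clicked]
    return sorted(
        name
        for name, coords in gazetteer.items()
        if name != clicked and haversine_km(origin, coords) <= radius_km
    )


# Approximate, illustrative coordinates for three Lake District places.
LAKES = {
    "Grasmere": (54.459, -3.024),
    "Rydal": (54.447, -2.982),
    "Keswick": (54.601, -3.135),
}

# Invented sample sentence, not a quotation from the corpus.
PASSAGE = "We walked from Grasmere to Rydal. Grasmere lay still."

if __name__ == "__main__":
    for left, keyword, right in kwic(PASSAGE, "Grasmere", width=15):
        print(f"{left:>15} [{keyword}] {right}")
    print(nearby_places(LAKES, "Grasmere", radius_km=5))
```

In a full system each concordance line would also carry a link to its position in the source text, so that the reader can move from the summary back to close reading.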

Fig. 3: Network analysis of Norman Nicholson’s work. The diagram on the left shows the number of letters sent by Nicholson, with thicker lines indicating more letters. The map on the right shows where recipients living in Britain lived.

Fig. 4: Cost surface analysis showing the combined estimated routes of Arthur Young (1770), Thomas Gray (1775) and Thomas Pennant (1771 and 1776). Reds and yellows indicate frequented routes.
Finally, geographical technologies can also be used to enhance texts in a number of ways. One way, shown in figure 3, is network analysis, which can be used to explore, for example, networks of correspondence. We have used this to explore the correspondence networks of Lake District writers such as Norman Nicholson, combining diagrams that show who he was corresponding with and in what volumes with maps that show where his correspondents lived. A different approach allows us to move beyond seeing places within texts as isolated points and instead to explore them as parts of journeys. This was done using a number of accounts of journeys through the Lake District. First, the texts were close-read to identify the order in which the place-names mentioned were visited. These were mapped as points, which were then used as the input into a technique called cost-surface analysis, which estimates the most likely route between points. This has been shown to be particularly effective in upland areas such as the Lake District (figure 4). It allows us to estimate and map the routes the writers are likely to have taken, and to explore the geographies of silence concerning the places which writers are likely to have visited but have not mentioned.
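The core idea of cost-surface analysis can be sketched as a least-cost path search over a grid. The sketch below uses Dijkstra's algorithm with four-directional movement, which is one common way to do this and not necessarily the algorithm or GIS software used in this research. The 3 x 3 cost grid is hypothetical; in practice the cost values would be derived from terrain data, and the waypoints would be the place-names in the order the writer visited them.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra over a grid of per-cell entry costs.

    Returns (total_cost, path), where path is a list of (row, col) cells.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_dist, (nr, nc)))
    if goal not in best:
        raise ValueError("goal is not reachable from start")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return best[goal], path


def route_through(cost, waypoints):
    """Chain least-cost legs between consecutive waypoints, in visiting order."""
    total, route = 0.0, [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        leg_cost, leg = least_cost_path(cost, a, b)
        total += leg_cost
        route.extend(leg[1:])  # skip the first cell, already in the route
    return total, route


# Hypothetical cost surface: the 9s stand for terrain that is hard to cross.
COST = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 9, 1],
]

if __name__ == "__main__":
    total, route = route_through(COST, [(2, 0), (2, 2)])
    print(total, route)
```

Cells that lie on the estimated route but are not among the waypoints are candidates for the geographies of silence: places the writer probably passed through without mentioning them.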
In conclusion, the use of geographical technologies in understanding texts is potentially multi-faceted and goes far beyond producing maps. These technologies are instead useful tools for understanding and enhancing texts: they can produce the abstract summaries required for distant reading, select the parts of a text that require close reading, and enable new forms of analysis that help us to understand the geographies within texts.
Acknowledgements

The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant “Spatial Humanities: Texts, GIS, places” (agreement number 283850). We are also grateful to Sarah Hastings for her work on Suffolk while on an internship from Mt Holyoke College hosted in the Department of History, Lancaster University.
References

F. Moretti, Graphs, Maps, Trees (London, 2005).
I.N. Gregory and A. Hardie, “Visual GISting: Bringing together corpus linguistics and Geographical Information Systems”, Literary and Linguistic Computing, 26 (2011), 297-314.
C. Grover, R. Tobin, K. Byrne, M. Woollard, J. Reid, S. Dunn, and J. Ball, “Use of the Edinburgh Geoparser for georeferencing digitized historical collections”, Philosophical Transactions of the Royal Society A, 368 (2008), 3875-3889.
These are taken from the Histpop collection, see http://www.histpop.org.
D.J. Cohen and R. Rosenzweig, Digital History (Philadelphia, 2006).
I.N. Gregory, “Different places, different stories: Infant mortality decline in England & Wales, 1851-1911”, Annals of the Association of American Geographers, 98 (2008), 773-794.
http://www.bl.uk/reshelp/findhelprestype/news/newspdigproj/database/
S. Szreter, “The importance of social intervention in Britain’s mortality decline c. 1850-1914: a re-interpretation of the role of public health”, Social History of Medicine, 1 (1988), 1-37.

Conference Info

Complete

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO