The social pleasure of the text: Applying digital humanities methods to reception studies

Long paper

Anouk Lang, University of Strathclyde


How do readers use social media to express the value and the pleasure that the experience of reading holds for them? And, given the rapidity with which corpora gathered from social media are growing, what kinds of methods are most useful for analysing this kind of (big) data so as to cast light on the phenomenology of reading experiences? This paper seeks to answer these questions by presenting the findings of a project on developing methods for analysing and evaluating literary engagement in digital contexts, funded by the Arts and Humanities Research Council under the auspices of the Cultural Value Project.1 It will report on what can be learnt from the large amount of user-generated data available on microblogging services and social network sites about the value that reading brings to the lives of individuals and communities, and will offer an evaluation of the various analytical tools and methods available to scholars working on reading and reception studies who wish to include born-digital data in their research.

Work in reception studies increasingly focuses on how an understanding of the significance of individual reading experiences can be enriched by attending to occasions when readers join with others to express opinions about a text and work together to construct its meaning. Scholars have argued that it is in fact in these acts of public negotiation of meaning – for example book group discussions – that readers can be observed doing the private cognitive work of textual engagement, as their interpretations change in the act of articulating their response in a social context.2 The rich textual data available on social media is often generated by readers in conversation with friends or acquaintances, in contexts quite different to interviews with researchers or questionnaires, which might prompt a higher level of self-editing; this makes it even more compelling to work with.3 The obvious advantage of working with this sort of born-digital material is that it lends itself to analysis using the growing number of tools and methods being developed within digital humanities, which have the power to integrate textual and geospatial information, and to identify lexical trends in time-stamped data. Such computational methods offer scholars the opportunity not only to analyse much larger bodies of text than individual researchers can ordinarily examine through close reading, but also to draw on, and discover patterns in, temporal and geospatial metadata.

Data for this project was gathered from two different social media platforms, the microblogging platform Twitter and the book collection website LibraryThing.4 For the Twitter data, searches were performed for literary prizes (for example Man Booker Prize and Nobel), author names (for example [Eleanor] Catton and [Alice] Munro), and hashtags commonly used to signal reading-related tweets (for example #goodreads and #mustread). For the LibraryThing data, the results of the Twitter searches were used to suggest particular books to investigate, so as to enable a comparison of the way readers discussed books on the two platforms. The numerical review scores and the text of user reviews of these books were stored in a database, along with metadata about the user. While some interesting work on literary value has already been done by scraping data from Amazon,5 LibraryThing was selected for this project as it is a platform where readers gather primarily to share information voluntarily about books in ways not (directly) linked to commercial activity. Moreover, it is also possible to link some of this information to users’ reported geographic location, something which cannot be done with Amazon data.

Various digital methods were then applied to the resulting datasets: thematic analysis using methods from corpus linguistics, analysis of trends in word usage over time using a burst detection algorithm, and geospatial analysis.

1) Thematic analysis
Analytical techniques from corpus linguistics were employed to identify patterns of unusually prominent words, phrases and grammatical constructions. The textual data gathered were tagged with the CLAWS part-of-speech tagger,6 and the concordance program AntConc7 was then used to identify the most frequent words, determine their statistical significance as compared to a reference corpus, find the terms that most commonly collocated with them, and carry out other analytical procedures. Sub-corpora were separated out by hashtag and geographical location, and analysed individually.
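
The keyword and collocation steps described above can be sketched in a few lines of Python. This is an illustration only, not the project's workflow: the abstract does not say which keyness statistic was selected in AntConc, so log-likelihood is assumed here as one widely used option, and the function names and toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness of one word: a, b are its counts in the target and
    reference corpora; c, d are the total token counts of each."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


def collocates(tokens, node, window=4, top_n=10):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start, stop = max(0, i - window), i + window + 1
            counts.update(
                t for j, t in enumerate(tokens[start:stop], start=start) if j != i
            )
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical toy data standing in for a hashtag sub-corpus and a reference corpus.
    tweets = "loved this book loved the ending".split()
    reference = "the market fell and the bank closed the deal".split()
    print(keywords(tweets, reference, top_n=3))
    print(collocates(tweets, "loved", window=1))
```

Restricting the ranking to words that are proportionally more frequent in the target corpus mirrors the usual distinction between positive and negative keywords; a real analysis would also apply a significance threshold and a minimum frequency.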

2) Temporal analysis
As all the Twitter data and a significant proportion of the LibraryThing data is time-stamped, it presented an opportunity to analyse trends over time using burst detection analysis, which gauges how influential particular words or hashtags have been during a given period.8 The Sci2 tool9 was used to perform burst detection, and to visualise the results as temporal bar graphs. Terms that “burst” into prominence were then fed back into the corpus linguistic analysis, for example in order to examine the collocation patterns around them, and to attend to the context in which they initially appeared.
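
To make the idea of a "burst" concrete, the following is a minimal sketch of a two-state burst detector in the spirit of the model Kleinberg describes for batched, time-stamped data. It is not the Sci2 implementation: the function name, the default values of `s` and `gamma`, and the toy counts are assumptions made for illustration. For each time window the function takes the number of posts containing a term and the total number of posts, and returns 0 (baseline) or 1 (burst) for that window.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each time window 0 (baseline) or 1 (burst).

    relevant[t]: posts containing the term in window t
    totals[t]:   all posts in window t
    s:           how much higher the burst rate is than the baseline rate
    gamma:       scales the cost of moving from baseline into a burst
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)  # baseline rate across the whole period
    if not 0 < p0 < 1:
        return [0] * n  # term absent or present everywhere: nothing to detect
    p1 = min(s * p0, 0.9999)  # elevated rate for the burst state
    probs = (p0, p1)
    up_cost = gamma * math.log(n)  # entering a burst is penalised; leaving is free

    def fit_cost(p, r, d):
        # Negative binomial log-likelihood; the binomial coefficient is
        # omitted because it is identical for both states.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    cost = [0.0, float("inf")]  # the sequence starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(probs[j], r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the final window.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    return states[::-1]


if __name__ == "__main__":
    # Hypothetical daily counts for one hashtag out of ten sampled tweets per day.
    print(burst_states([1, 1, 1, 9, 9, 1, 1], [10] * 7))
```

Raising `gamma` makes the detector more reluctant to declare a burst, while raising `s` requires a sharper rise above the baseline rate before a window counts as bursting.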

3) Geospatial analysis
The software package ArcGIS was used to create a GIS database including layers derived from the Twitter and LibraryThing data, to see where particular geographical patterns in the search terms and hashtags occurred. (While not all tweets or contributions to LibraryThing have georeferences attached to them, enough do to make this form of analysis worthwhile.) These data were then layered against census data (such as level of educational attainment or socioeconomic status) aggregated at the output area level, so that semantic patterns in the articulation of reading-related tweets and posts could be considered alongside the demographic features of the places where they were articulated.
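
The layering step amounts to a spatial join: each georeferenced post is assigned to the area polygon that contains it, and the per-area counts are then read alongside a census attribute. The sketch below illustrates that logic with the Python standard library only; it is not how ArcGIS performs the operation, and the area codes, square polygons, coordinates and census values are all hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_posts_by_area(posts, areas):
    """Count georeferenced posts falling inside each area polygon."""
    counts = Counter()
    for x, y in posts:
        for code, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[code] += 1
                break  # a post is assigned to at most one area
    return counts


if __name__ == "__main__":
    # Hypothetical output areas (unit squares) and a hypothetical census attribute.
    areas = {
        "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    }
    share_with_degree = {"A": 0.42, "B": 0.18}
    posts = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (5.0, 5.0)]  # last point lies outside both
    counts = count_posts_by_area(posts, areas)
    for code in areas:
        print(code, counts[code], share_with_degree[code])
```

With real data the polygons would come from census boundary files and the points from post metadata, and a spatial index would replace the nested loop once the number of areas grows.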

The paper will set out the advantages offered by thematic, temporal and geospatial analyses, and suggest the components of cultural value which are best addressed by each, while also considering how these different forms of analysis may be productively combined.


2. Daniel Allington and Bethan Benwell (2012), Reading the Reading Experience: An Ethnomethodological Approach to ‘Booktalk’, in From Codex to Hypertext: Reading at the Turn of the Twenty-first Century, ed. by Anouk Lang (Amherst, MA: University of Massachusetts Press, 2012), pp. 217–233.

3. Rhiannon Bury, Ruth Deller and Adam Greenwood (2013), From Usenet to Tumblr: The Changing Role of Social Media, Participations 10, 299–318.


5. Ed Finn (2011), Becoming Yourself: The Afterlife of Reception, Pamphlets of the Stanford Literary Lab 3, 15 Sept 2011 (accessed 1 Nov 2013).



8. Jon Kleinberg (2002), Bursty and Hierarchical Structure in Streams, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02 (New York: ACM, 2002), pp. 91–101.



Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

377 works by 898 authors indexed

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO