Dealing with the complex data challenges posed by newspapers has never been easy, and for years historians struggled to delve deeper into archives that had received only minimal metadata: title and date. Since the advent of digital archives, however, historical research on newspapers has become easier than ever before, and for that these archives deserve full credit. Yet when archives change their medium or storage, certain aspects of the documents they hold inevitably get lost. Similarly, when modes of access change, certain aspects of a source, particularly those more complex than the interface makes easily accessible, drift from the mind (Mussell, 2017). With a physical newspaper, the researcher would become intimately familiar with the spatiality of the texts they read. Front page or tucked in the back, set apart or clustered with articles on similar topics: these were all aspects a researcher could use to gauge the importance the paper’s editor attached to an article. Not so with a digital archive. Keyword searches deliver the historian right to the text they requested, dropped directly on target without needing to consider the text’s surroundings.
To address the question of textual spatiality, we developed a tool that visualises, and optionally compares, the location of articles on newspaper pages as a set of heatmaps. It relies on the metadata that archives already use to deliver the researcher directly to the requested article: bounding boxes that map each article to coordinates on the page image. The coordinates of the area covered by these bounding boxes can be extracted in large volumes, tidied, and plotted on a page-by-page basis. This improves upon existing tools, which estimate article size from word count (Beals, 2018).
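The tidying and plotting steps described above might look roughly like the following. This is a minimal sketch and not the tool's actual code: it assumes each article's bounding box is available as an (x, y, width, height) tuple in pixels together with the page dimensions, and the names `normalise_box` and `accumulate` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumed record layout: (x, y, width, height), in pixels of the page scan.
Box = Tuple[float, float, float, float]


def normalise_box(box: Box, page_width: float, page_height: float) -> Box:
    """Rescale a pixel box to the unit square so pages of any size align."""
    x, y, w, h = box
    return (x / page_width, y / page_height, w / page_width, h / page_height)


def _cell_range(start: float, length: float, n_cells: int) -> range:
    """Indices of the grid cells overlapped by [start, start + length]."""
    first = min(int(start * n_cells), n_cells - 1)
    # ceil(...) - 1 keeps a box that ends exactly on a cell edge out of the next cell.
    last = min(math.ceil((start + length) * n_cells) - 1, n_cells - 1)
    return range(first, max(first, last) + 1)


def accumulate(boxes: List[Box], rows: int, cols: int) -> List[List[int]]:
    """Count, per grid cell, how many normalised boxes overlap it."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y, w, h in boxes:
        for r in _cell_range(y, h, rows):
            for c in _cell_range(x, w, cols):
                grid[r][c] += 1
    return grid
```

The resulting grid of counts can be handed to any plotting library to draw one heatmap per page number.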
These visualisations allow researchers to contextualise the space in which relevant articles appeared, by showing what a ‘typical’ newspaper’s pages looked like and how the articles under investigation fit into that model. This paper demonstrates the technique in a case study of Imperial sentiment. By investigating the spatiality of the articles within the paper as a whole, it is possible to see the context in which British readers encountered the Empire, compare it to the context in which foreign news appeared, and explore change over time. For example, we noted a prevalence of Imperial keywords on page 7, which persists throughout the 35 years under review, a total of 1800 issues. Closer reading of the paper reveals this to be the section for adverts and notices. Even this brief application shows how a transformative tool can be employed in conjunction with more traditional research methods.
These visualisations allow not only individual article sets to be plotted, but also a relative comparison between two sets of newspaper pages. The placement of articles mentioning Calcutta or Australia within Reynolds’s Newspaper, a national weekly paper, can easily be compared with the placement of articles in, say, the Birmingham Daily Post, which was distributed only regionally. Such a comparison shows that both papers place articles on the Empire in very similar contexts, with patterns of appearance closer to those of national news than to those of international or foreign news. Additionally, the visualisations can be integrated with other textual analysis tools, such as topic modelling, to allow a more intuitive understanding of the contents of each topic, or even a comparative analysis of the places within a paper where certain topics occur. This offers an extra handle during the arduous process of discovering the semantic meaning of a topic.
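One possible way to make two heatmaps comparable is sketched below. It assumes each heatmap is a grid of counts with the same shape, converts both to proportions so that a paper with more issues does not dominate, and subtracts them cell by cell. The function names are hypothetical and the tool itself may compare article sets differently.

```python
from typing import List

Grid = List[List[float]]


def to_proportions(grid: Grid) -> Grid:
    """Scale a grid of counts so that all of its cells sum to 1."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        # An empty article set yields an all-zero grid instead of dividing by zero.
        return [[0.0 for _ in row] for row in grid]
    return [[cell / total for cell in row] for row in grid]


def difference(a: Grid, b: Grid) -> Grid:
    """Cell-wise difference of proportions; positive where set `a` is denser."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("heatmaps must have the same shape")
    pa, pb = to_proportions(a), to_proportions(b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(pa, pb)]
```

Positive cells then mark areas of the page where the first set of articles is relatively over-represented, negative cells where the second set is.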
Reducing the complex data to a two-dimensional graphic is non-trivial, as a multitude of aspects need to be considered, such as scaling pages to the same size, binning per pixel or per column, and rough versus precise binning. The tool also needs to draw data from various parts and indices of the archive to produce the kinds of meaningful visualisations that facilitate new and more substantive research questions. This kind of project underlines the need for digital archives to give researchers access to the ‘raw’ data, rather than only limited access through a web interface (Fyfe, 2018).
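The difference between rough and precise binning can be illustrated for the per-column case. The sketch below rests on one possible reading of those terms, which is an assumption: precise binning weights each column by the fraction of its width that an article covers, while rough binning counts every column the article touches in full. The horizontal extent is assumed to be already scaled to the range 0 to 1, and the function names are hypothetical.

```python
from typing import List


def precise_column_bins(x0: float, x1: float, n_columns: int) -> List[float]:
    """Fraction of each column's width covered by the span [x0, x1]."""
    bins = []
    for i in range(n_columns):
        left = i / n_columns
        right = (i + 1) / n_columns
        overlap = max(0.0, min(x1, right) - max(x0, left))
        bins.append(overlap * n_columns)  # rescale so a fully covered column is 1.0
    return bins


def rough_column_bins(x0: float, x1: float, n_columns: int) -> List[float]:
    """1.0 for every column the span touches at all, otherwise 0.0."""
    return [1.0 if share > 0 else 0.0
            for share in precise_column_bins(x0, x1, n_columns)]
```

For an article spanning one and a half columns of a four-column page, the precise variant yields weights of 1.0 and 0.5 for the first two columns, whereas the rough variant counts both columns fully.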
In the context of newspaper research, these digital tools open up the study of newspapers as objects, making the spatiality of articles accessible for research on a larger scale than previously possible. In the context of the Digital Humanities, they show how digital tools can drive methodological innovation in other fields, and how computers, while disconnecting us from the complexities of our sources, can also serve as a vehicle for making those complexities understandable.
Beals, M. H. (2018). Close Readings of Big Data: Triangulating Patterns of Textual Reappearance and Attribution in the Caledonian Mercury, 1820-40. Victorian Periodicals Review, 51(4): 616-39.
Fyfe, P. (2018). Access, Computational Analysis, and Fair Use in the Digitized Nineteenth-Century Press. Victorian Periodicals Review, 51(4): 716-37.
Mussell, J. (2017). Beyond the ‘Great Index’: Digital Resources and Actual Copies. In Shattock, J. (ed.), Journalism and the Periodical Press in Nineteenth-Century Britain. Cambridge University Press, pp. 17-30.
Hosted at Utrecht University
July 9, 2019 - July 12, 2019
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
Series: ADHO (14)