The GeoNewsMiner: An interactive spatial humanities tool to visualize geographical references in historical newspapers
The GeoNewsMiner (GNM) is an interactive tool that maps and visualizes geographical references in historical newspapers. As a use case, we analysed Italian immigrant newspapers published in the United States from 1898 to 1920, as collected in the corpus ChroniclItaly (Viola 2018). Immigrant newspapers form a rich source that adds a historical dimension to the study of both the migration of the past century and the migratory experiences of migrant communities (Viola and Verheul 2019). For instance, they enable researchers to compare references to the homeland and the host land (Vellon 2010; Forlenza and Thomassen 2016), thus offering an indication of the way diasporic media negotiate processes of assimilation and ethnic identification (Park 1922; Rhodes 2010; Viola and Musolff 2019; Viola and Verheul 2019), a topic that bears great relevance in the global age of satellite dishes and internet connectivity (Dhoest et al. 2012; Hickerson and Gustafson 2016; Parks 2005; Matsaganis, Katz, and Ball-Rokeach 2011; Appadurai 2008). In order to offer new perspectives on the geographies of the past, we employed a state-of-the-art deep learning method to extract and disambiguate place names from historical newspapers. Deep learning outperforms previous state-of-the-art approaches to place name extraction and disambiguation, which rely on static lists in gazetteers or ensembles of NER tools (Canale, Lisena, and Troncy 2018; Won, Murrieta-Flores, and Martins 2018; Mariona Coll Ardanuy and Sporleder 2017; Maria Coll Ardanuy 2017; Yadav and Bethard 2019). Its two major advantages lie in its potential for text enrichment: 1) the models can be based on the historical context of a historical corpus; 2) they are able to recognize toponyms in a dynamic way, for instance as a geographical concept (Viola and Verheul 2020).
For the development of the GNM, we used the deep learning sequence tagging tool developed by Riedl and Padó (2018). The sequence tagging retrieved 1,369 unique locations, which occurred 214,110 times throughout the whole corpus. Because each individual document is timestamped, it was possible to quantify the number of references to each location at any given time within the timeframe of ChroniclItaly, that is, 1898-1920. Afterwards, locations were geocoded using the Google API, which identifies a place as it is stored in the Google Places database and in Google Maps. The tagged version of ChroniclItaly is available as an open access resource (ChroniclItaly 2.0, Viola 2019). Finally, to visualise and explore the data, we developed the GNM App (Figure 1). Unique to this tool is the possibility to aggregate the data according to a wide range of parameters (time; newspaper’s title; least/most mentioned places; absolute or relative frequency; aggregation on national, regional or city level). It is also possible to overlay historical maps that show the borders of selected years (1880, 1914, 1920, 1994), and to download and share the data and results (Figure 2). This allows users to analyse the results in an intuitive, interactive, and reproducible way, and provides great flexibility to researchers working in spatial humanities, particularly from a historical perspective. One potential application of GNM is, for example, to reconstruct the “geographical agenda” of historical newspapers by analysing the changing geographical bias of the press, a pressing issue for fields such as media studies, cultural history and international relations (McCombs 2014; Craine 2014; Reese and Lee 2012; Wanta, Golan, and Lee 2004; Gans 2004; Beaudoin and Thorson 2001; Ginneken 1998; Gitlin 2003).
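To illustrate the kind of aggregation described above (references per location per year, expressed as absolute or relative frequency), here is a minimal Python sketch. It is not the GNM implementation. The record layout, the function names `count_by_year` and `relative_frequencies`, the newspaper titles and all of the sample data are assumptions invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical tagged output: (publication year, newspaper title, place name).
# These records are invented for illustration and are not ChroniclItaly data.
mentions = [
    (1898, "Paper A", "Roma"),
    (1898, "Paper A", "New York"),
    (1898, "Paper B", "Roma"),
    (1915, "Paper A", "Roma"),
    (1915, "Paper B", "Trieste"),
    (1915, "Paper B", "Trieste"),
    (1915, "Paper B", "New York"),
]


def count_by_year(records):
    """Return {year: Counter({place: absolute frequency})}."""
    counts = defaultdict(Counter)
    for year, _title, place in records:
        counts[year][place] += 1
    return dict(counts)


def relative_frequencies(counter):
    """Convert absolute counts to shares of all mentions in the same year."""
    total = sum(counter.values())
    return {place: n / total for place, n in counter.items()}


absolute = count_by_year(mentions)
relative = {year: relative_frequencies(c) for year, c in absolute.items()}

# Print each place per year with its absolute count and relative share.
for year in sorted(absolute):
    for place, n in absolute[year].most_common():
        print(year, place, n, round(relative[year][place], 2))
```

Relative frequency normalises by the total number of mentions in a year, so years with more surviving issues do not dominate a comparison over time. Filtering the records by title before counting would give the per-newspaper view.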
As a preliminary data exploration, for instance, the tool shows that references to geographical locations in both Italy and the United States stay remarkably stable over the period that includes the First World War. The full documentation of GNM is made available to the research community to facilitate transparency, reproducibility and replicability (Viola 2020). The app has much to recommend it, particularly to humanities scholars, who are increasingly confronted with the challenge of exploring collections that are larger than ever before and in digital format.
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
Series: ADHO (15)