Capturing the Geography of 1900s Britain as Text: Findings from the GB1900 Crowd-Sourced Gazetteer Project

paper, specified "long paper"
Authorship
  1. 1. Humphrey Southall

    University of Portsmouth

  2. 2. Paula Aucott

    University of Portsmouth

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Introduction
The purpose of the GB1900 project was to transcribe all text strings appearing on the second edition of the Ordnance Survey’s County Edition six inch-to-one mile maps (1:10,650 scale) published between 1888 and 1914 and covering the whole of Great Britain. The only exclusions were primarily numerical data easily obtained from modern digital mapping, i.e. spot heights, the depths of lakes and distances on road signs. The scale of this task, and the limited effectiveness of optical character recognition when applied to text on maps, made crowd-sourcing the most appropriate methodology.

Figure 1: Excerpt from County Edition Six Inch map (blue dots mark GB1900 transcriptions)

The project grew out of Cymru1900, a collaboration between the National Library of Wales, the University of Wales Centre for Advanced Welsh and Celtic Studies, the Royal Commission on the Ancient and Historical Monuments of Wales and People’s Collection Wales, with funding from the Welsh Assembly. Cymru1900 launched in October 2013 and remained live until it was replaced by GB1900, but was much more successful at obtaining initial transcriptions than the matching confirmatory transcriptions required to finalize each string.
GB1900 involved additional partners at the National Library of Scotland, supplying a digital map mosaic covering the whole of Great Britain, and the University of Portsmouth, providing additional development time to revise the software to encourage confirmations, then hosting the revised system. GB1900 launched in September 2016, incorporating all existing transcriptions and user registrations from Cymru1900, and was closed down at the end of January 2018, by when it was very hard to find new text to transcribe. The overall project history, sources and software is described in Southall et al (2017), while the crowd-sourcing process and the motivations of volunteers are explored in Aucott et al (2019).

The GB1900 Datasets
Following some manual work to resolve c. 30,000 problematic transcriptions, final datasets were made available for download in July 2018, from Portsmouth’s web site
A Vision of Britain through Time:

http://www.visionofbritain.org.uk/data/#tabgb1900

Three distinct datasets were created. Firstly, a “raw dump” consisting of all the uncleaned tables from the MongoDB database behind GB1900, excluding only the table holding user registration details and including all the different transcriptions of each string; this is offered under a CC0 license, enabling anyone to do what they like with it. Secondly, the main “cleaned” dataset, containing just one “agreed” transcription of each of 2.55m. strings, together with OSGB and WGS84 geographical coordinates. Thirdly, an “abridged” version from which the commonest strings judged not be place names have been removed. The latter two datasets are offered under CC-BY-SA licenses.
The abridged dataset is presented as a gazetteer, meaning an inventory of the names by which people identified particular places: towns, villages, hamlets, woods and so on. As such, it is possibly the most detailed gazetteer of Britain ever created, and certainly the most detailed specifically historical gazetteer. It is further described in Aucott and Southall (forthcoming).
However, more than half the transcriptions are not place names but are still of very considerable interest, and our main focus here. The three most commonly occurring names in the abridged dataset are “Manor House” (1,617 occurrences), “Manor Farm” (1,496) and “Mount Pleasant” (838). Conversely, the most common unabridged terms are “F.P.”, meaning a footpath (306,583 occurrences), “W”, meaning a well (190,979) and “P”, meaning a pump (115,877). The original justification for including these in the transcription process was the difficulty of defining “place names” with enough precision and clarity to be really consistently applied by the volunteers.

Figure 2: Locations and types of windmill, from GB1900

The complete and abridged datasets are currently made available simply as CSV files, for easy uploading into databases or viewing in spreadsheets. Although we have written elsewhere about the importance of presenting gazetteers as linked data (Southall et al, 2011), the datasets consist primarily just of strings and an associated coordinate, plus the names of containing modern local authorities and parishes added via point in polygon look-ups, so these data are not linked. However, we are exploring how the GB1900 data may be linked to the DEEP gazetteer created from the Survey of English Place Names (Ell et al, 2016), which has a complex semantic structure but currently contains locations for only 4.4 per cent of the included places.

Capturing geography in words
In some senses, the County Series maps are not large scale maps but small scale plans: they contain no symbology or key, just the outlines of buildings and other features; and a great deal of text. As a result, transcribing the text, and recording the locations where it appears, captures most of the maps’ meaning, their semantic content. Capturing all those detailed outlines from the maps would not greatly extend our geographical knowledge, while greatly complicating both the transcription software and the volunteers’ task. It would however have been desirable to capture the fonts and sizes used for each text string, as that does incorporate some additional meaning.

Figure 3: Chalk Pits in south east England, from GB1900

We can learn much about the detailed geography of Britain’s people over the last two centuries from the census. However, sources for the broader study of the evolving cultural landscape are more limited. We have worked extensively with the Land Utilisation Survey of Great Britain from the 1930s, but this provides only a very broad brush overview of land use, and little about what makes particular places special (Baily et al, 2011). Figures 2 and 3 provide two different examples from GB1900 data of how humans both used and shaped the landscape, making particular places distinct. Figure 2 shows how windmills were a common feature of the flat landscapes of eastern England, and give the lie to those who see modern windfarms as a new intrusion. Figure 3 of course partly just shows the distribution of chalk in the underlying geology, but the concentrations, especially into the Hampshire-Surrey borders, also tell us something of agricultural improvement.

How GB1900 is being used
The data are already being used by project partners. The National Library of Scotland have implemented a search facility using it on their open access online map collection for their six inch to the mile maps. Searching by place names from GB1900 finds the location on the six inch map and the visitor can then choose to view another map for the same location:

https://maps.nls.uk/geo/explore/

The Royal Commission on the Ancient and Historical Monuments of Wales have a
 List of Historic Place Names in Wales which must be consulted for all new developments in Wales. Currently using data from Cymru1900, it will be updated to include GB1900:

https://historicplacenames.rcahmw.gov.uk/

The authors are incorporating a GB1900 search facility into the Vision of Britain system. This links GB1900 data into the existing historical information structure and for the first time will offer greater detail for places within parishes:

http://www.visionofbritain.org.uk/expertsearch#gb1900/

Researchers independent of the project have also been experimenting with the dataset, including Jim Clifford, a historian at the University of Saskatchewan who looked at breweries, distilleries and pubs:

GB1900 Works Website

Joe Rose (B.C. Canada) has been working on industrial sites and has produced a 33 volume listing of Scottish mills, linking them to other locational information. His dataset is being used by a current Glasgow PhD student. He is also working on a desktop application. This link shows an early extraction of quarries:

https://geo.nls.uk/maps/gb1900quarry/

Conclusion
The County Series maps are the largest scale at which the Ordnance Survey mapped the whole of Great Britain, larger scales existing for towns and for more densely populated rural areas. Most of their meaning is in the text that GB1900 has extracted, and while it would be hard to claim this is a literary work, it is a remarkably detailed record of both physical and cultural landscapes, which we can now present in summary form.

Bibliography

Aucott, P., Southall, H. and Ekinsmyth, C. (2019). Citizen Science through Old Maps: Volunteer Motivations in the GB1900 Gazetteer-Building Project.
Historical Methods: A Journal of Quantitative and Interdisciplinary History. DOI: 10.1080/01615440.2018.1559779.

Aucott, P. and Southall, H. (forthcoming). Locating Past Places in Britain: Creating and evaluating the GB1900 Gazetteer.

Baily, B., Riley, M., Aucott, P. and Southall, H. (2011). Extracting digital data from the First Land Utilisation Survey of Great Britain – methods, issues and potential.
Applied Geography, 31: 959-968.

Ell, P., Hughes, L. and Southall, H. (2016). Digitally exposing the place names of England and Wales. In Berman, M., Mostern, R. and Southall, H. (eds.)
Placing Names: Enriching and Integrating Gazetteers. Bloomington: Indiana University Press, pp. 146-62.

Southall, H.R., Mostern, R. and Berman, M. (2011). On historical gazetteers.
International Journal of Humanities and Arts Computing, 5: 127-145.

Southall, H., Aucott, P., Fleet, C., Pert, T., and Stoner, M. (2017). GB1900: Engaging the Public in Very Large Scale Gazetteer Construction from the Ordnance Survey “County Series” 1:10,560 Mapping of Great Britain.
Journal of Map and Geography Libraries, 13: 7-28.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.