Universität Basel (University of Basel)
At the end of the seventeenth century, so-called intelligence newspapers (intelligencers) emerged in large European cities such as Paris and London, and they sprang up all over the continent in the course of the eighteenth century. Instead of political news, they mainly contained classified ads intended to connect people offering something with people seeking something. These advertisements covered a wide range: real estate, work and travel opportunities, specific services, information, and all kinds of things, such as items for rent and for sale, lost and found items, second-hand goods, newly invented or well-known medical products, and imported goods like coffee and tea, to name just a few examples. Many of these ads were placed by non- or semi-professional sellers, but professional suppliers such as craftsmen, traders and shops also used this new, and soon well-established, communication platform to inform the reading public about their products and to find customers. These intelligence newspapers are therefore, on the one hand, a particularly interesting source for examining the micromechanics of local markets and, on the other hand, helpful for analyzing connections between local, transregional and increasingly global markets for goods in the early modern period. Given the sheer mass of data, however, they are also quite unmanageable for a single (and analog) research endeavor.
Our newly started project takes one particular intelligencer, the Basler Avis-Blatt, as a case study for applying different digital tools and methods to this source type. Up to now, intelligence newspapers have not been analyzed as a whole beyond text recognition, and even then mostly only in small samples. This is due to the overwhelming quantity and diversity of the ads, as the intelligencers appeared periodically for years or even decades. The source presented here, which is preserved in its entirety, appeared in the city of Basel between 1729 and 1844, first weekly and later even daily; this amounts to 6,391 issues with about 50,000 pages and over 750,000 individual ads. By using computational methods and digital tools, we want to facilitate an extensive and comprehensive analysis of intelligence newspapers that combines quantitative with qualitative approaches.
After building a digital collection that uses IIIF mechanisms for presentation and annotation, we made the whole corpus available as full text (automated text recognition with a trained HTR+ model in Transkribus) and then enhanced the individual pages with PAGE XML after automated layout recognition (page segmentation). Every page is segmented into individual ads, matching text with layout units. As a result, data and text mining can now focus on the smallest and most important entity of the corpus, the ad itself, which is a major step compared to analyzing unsegmented full text.
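The link between segmented text and page image described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes that each ad is stored as one PAGE XML `TextRegion` (2013-07-15 schema namespace) with a region-level `TextEquiv`, and it turns each region's polygon into a bounding box for an IIIF Image API 2.x crop URL (API 3.0 uses `max` instead of `full` as the size keyword). The sample page, the IIIF base URL and the helper names `bbox` and `ads_from_page` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE XML 2013-07-15 schema; other schema versions use a different namespace URI.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Invented two-ad page for illustration only (not taken from the Avis-Blatt).
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,200 900,200 900,450 100,450"/>
      <TextEquiv><Unicode>Sample ad text one</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="100,460 900,460 900,700 100,700"/>
      <TextEquiv><Unicode>Sample ad text two</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def bbox(points: str) -> tuple[int, int, int, int]:
    """Axis-aligned bounding box (x, y, w, h) of a PAGE 'points' polygon."""
    pairs = [tuple(map(int, p.split(","))) for p in points.split()]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def ads_from_page(xml_text: str, iiif_base: str) -> list[dict]:
    """Return one record per TextRegion: id, recognized text and an IIIF crop URL."""
    root = ET.fromstring(xml_text)
    ads = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        x, y, w, h = bbox(region.find("pc:Coords", NS).get("points"))
        unicode_el = region.find("pc:TextEquiv/pc:Unicode", NS)
        text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
        ads.append({
            "id": region.get("id"),
            "text": text,
            # IIIF Image API 2.x: {base}/{region}/{size}/{rotation}/{quality}.{format}
            "image": f"{iiif_base}/{x},{y},{w},{h}/full/0/default.jpg",
        })
    return ads


for ad in ads_from_page(SAMPLE_PAGE, "https://iiif.example.org/iiif/2/page_0001"):
    print(ad["id"], ad["image"], ad["text"])
```

Keeping the crop URL next to the recognized text means every ad record can be checked against its image, which is useful when correcting HTR errors or segmentation mistakes.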
To generate structured data for a comprehensive database for further analysis, the segmented ads are classified into ad types (buy, sell, loan, …) and content types (real estate, work, food, …). So far, the classification has been done manually, which already allows first quantitative results that point towards content-related research questions. Once a ground truth of classified ads has been established, supervised machine learning will be used to test automated or semi-automated ad classification. Unsupervised machine learning will serve as a sensor for patterns that have slipped through and as a corrective measure for questioning the constructed annotation types and evaluating the categories.
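A supervised baseline for ad-type classification, trained on a manually labeled ground truth, could look like the sketch below. The abstract does not name a classifier; multinomial naive Bayes with add-one smoothing is our illustrative choice, and the class name `NaiveBayesAdClassifier` and the English toy training ads are invented. Real HTR output in early modern German would additionally need normalization of historical spellings before tokenizing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (no spelling normalization in this sketch)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesAdClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesAdClassifier":
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def log_score(self, text: str, label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over known words."""
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokenize(text):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda lab: self.log_score(text, lab))


# Invented toy ground truth, for illustration only.
TRAIN = [
    ("house for sale near the cathedral", "sell"),
    ("good wine for sale by the barrel", "sell"),
    ("someone wishes to buy a used carriage", "buy"),
    ("wanted to buy old books", "buy"),
    ("money to lend against security", "loan"),
    ("a sum of money is offered on loan", "loan"),
]

clf = NaiveBayesAdClassifier().fit([t for t, _ in TRAIN], [lab for _, lab in TRAIN])
print(clf.predict("barrel of wine for sale"))  # sell
```

For semi-automated use, the per-class scores from `log_score` could be compared so that only ads with a clear margin are labeled automatically and the rest go back to manual annotation.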
The procedure of cascading classification makes it possible to handle the advertisements according to different areas of interest and to preselect those that are to be classified further for more specialized research questions, e.g. subdividing the category “animals” into “dog”, “duck”, or “donkey”. As the classification of the ads is a nearly endless process, collaboration during the project as well as after its completion is a central aspect. Since we aim to publish the digital collection and the underlying (ad-linked) text and to make the database accessible, the intelligence newspaper in question will be available to other researchers for a variety of possible research questions.
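The cascading idea can be sketched as a preselection step followed by a specialized sub-classification. In this minimal sketch, labels are stored as paths (`animals/dog`), and only ads already carrying the parent label are passed on to the next level. Simple keyword matching stands in for whatever method assigns the subcategories (the abstract describes manual classification so far); the `Ad` record, the `cascade` function and the keyword rules are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Ad:
    ad_id: str
    text: str
    labels: list[str] = field(default_factory=list)


# Hypothetical second-level keyword rules for the first-level category "animals".
ANIMAL_RULES = {"dog": {"dog", "hound"}, "duck": {"duck"}, "donkey": {"donkey"}}


def cascade(ads: list[Ad], parent: str, rules: dict[str, set[str]]) -> list[Ad]:
    """Add 'parent/sublabel' to every ad already labeled `parent` that matches a rule."""
    for ad in ads:
        if parent not in ad.labels:
            continue  # preselection: only ads of the parent category are classified further
        words = set(re.findall(r"\w+", ad.text.lower()))
        for sublabel, keywords in rules.items():
            if words & keywords:
                ad.labels.append(f"{parent}/{sublabel}")
    return ads


# Invented example ads.
ads = [
    Ad("a1", "A young dog has run away", ["animals"]),
    Ad("a2", "A donkey for sale", ["animals"]),
    Ad("a3", "A dog collar has been found", ["lost_and_found"]),
]
for ad in cascade(ads, "animals", ANIMAL_RULES):
    print(ad.ad_id, ad.labels)
```

Ad `a3` mentions a dog but is not relabeled, because it was never preselected as “animals”. Because labels are paths, a further level could be added by calling `cascade` again with a parent such as `animals/dog`, which lets different research questions extend the scheme independently.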
The proposed poster will present the different approaches to handling and evaluating this large source and the derived data sets. It shows the workflows established in the initial stage of the project, first results obtained with text mining, and impressions of the advantages of combining a variety of digital tools and computational methods with content-related questions in the analysis of early modern newspapers. It also presents perspectives on further research possibilities and the research questions emerging from the project.
Hosted at Utrecht University
Utrecht, Netherlands
July 9, 2019 - July 12, 2019
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html
Series: ADHO (14)
Organizers: ADHO