John Venn and the Alumni Cantabrigienses - an example of applied natural language processing techniques in enhancing the utility of a historical database

Emma Barker

Authorship

1. Emma Barker

University of Sheffield

Work text


The poster provides a general summary of my thesis. This can be broadly divided into three interrelated areas:
1. The historical background to this biographical dictionary.
2. This background is of great significance to the project, since it adds to our understanding of what is shown to be a unique source structure.
3. Automatic indexing provides a flexible method of access to this structure, which is largely under-exploited.

1. The life of John Venn, the Cambridge logician and antiquarian, provides important background to the evolution of the Alumni idea. Previous biographies have separated the various phases of his life. My research has shown, however, that these strands were intimately related. Descended from a professional, clerical elite, he was concerned with the logic of taxonomy and the measurement of empirical data. In later life he applied these techniques in the construction of his college directories and, eventually, the Alumni Cantabrigienses, a biographical dictionary of Cambridge men. He is in many ways one of the first information scientists.

2. The Alumni records are compiled from thousands of sources. The records and fields are therefore not semantically equivalent to one another. However, it is preferable to view these records as a catalogue of Cambridge men that has been constructed in a scientific manner. This allows for individual variation within the records, while also recognising their logical consistencies. The most important fields are the dynamic career structures, which allow us to view examples of multi-dimensional professional activity and of inter-university and intercollegiate careers.

3. Alumni data in the printed volumes have previously been analysed in very general, quantitative studies. These have forced the rich career data into two-dimensional tables and broad social taxonomies. Much of the information is therefore under-exploited. To maximise the utility of the source, the classification of the fields identified in the structural analysis needs to be made explicit so that indices into the records can be created. The fields can be tagged manually, but this is costly and labour-intensive. Information extraction systems provide a sophisticated tool for automatically indexing fields within unstructured electronic data, and so offer a solution for parsing fields from OCR edited text.
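
As a rough illustration of what automatically indexing fields can mean in practice, the following is a minimal sketch in Python. It uses simple rule-based pattern matching, which is only the most basic form of information extraction and is not the system used in the thesis. The sample entry, the field names and the patterns are all invented for this example; the entry imitates the abbreviated style of a printed biographical dictionary but is not a real record from the Alumni Cantabrigienses.

```python
import re

# Invented sample entry (assumption): NOT a real Alumni Cantabrigienses record.
SAMPLE = (
    "DOE, JOHN. Adm. at TRINITY, 1850. B.A. 1854; M.A. 1857. "
    "Ordained deacon 1855; priest 1856. V. of Exampleton, 1860-72. Died 1880."
)

# Hypothetical field patterns; a real source would need many more rules
# and would have to cope with OCR noise and inconsistent abbreviations.
PATTERNS = {
    "admission": re.compile(r"Adm\. at (?P<college>[A-Z][A-Z ]+), (?P<year>\d{4})"),
    "degree": re.compile(r"(?P<degree>B\.A\.|M\.A\.) (?P<year>\d{4})"),
    "ordination": re.compile(r"(?P<order>deacon|priest) (?P<year>\d{4})"),
    "benefice": re.compile(
        r"(?P<office>V\.|R\.|C\.) of (?P<place>[A-Z][a-z]+), "
        r"(?P<start>\d{4})-(?P<end>\d{2,4})"
    ),
    "death": re.compile(r"Died (?P<year>\d{4})"),
}


def extract_fields(entry):
    """Return (field_type, start_offset, attributes) tuples in text order."""
    found = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(entry):
            found.append((name, match.start(), match.groupdict()))
    # Sorting by offset preserves the sequence of career events in the entry.
    found.sort(key=lambda item: item[1])
    return found


def build_index(entries):
    """Map each field type to the ids of the entries that contain it."""
    index = {}
    for entry_id, text in entries.items():
        for name, _offset, _attrs in extract_fields(text):
            index.setdefault(name, set()).add(entry_id)
    return index


if __name__ == "__main__":
    for name, offset, attrs in extract_fields(SAMPLE):
        print(name, offset, attrs)
```

Keeping the extracted fields in text order, rather than collapsing them into one row per person, is one way to retain the sequence of a career instead of forcing it into a two-dimensional table.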



Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2000

Hosted at University of Glasgow

Glasgow, Scotland, United Kingdom

July 21, 2000 - July 25, 2000

Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/

Series: ALLC/EADH (27), ACH/ICCH (20), ACH/ALLC (12)

Organizers: ACH, ALLC
