University of Sheffield
The poster provides a general summary of my thesis, which can be broadly divided into three interrelated areas:
1. The historical background to the Alumni Cantabrigienses, a biographical dictionary of Cambridge men. This is of great significance to the project, since it adds to our understanding of the source.
2. The structure of the source, which is shown to be unique.
3. Automatic indexing, which provides a flexible method of access to this largely underexploited structure.
1. The life of John Venn, the Cambridge logician and antiquarian, provides important background to the evolution of the Alumni idea. Previous biographies have separated the various phases of his life. My research has shown, however, that these strands were intimately related. Descended from a professional, clerical elite, he was concerned with the logic of taxonomy and the measurement of empirical data. In later life he applied these techniques in the construction of his college directories and eventually the Alumni Cantabrigienses, a biographical dictionary of Cambridge men. He is in many ways one of the first information scientists.
2. The Alumni records are compiled from thousands of sources. The records and fields are therefore not semantically equivalent to each other. However, it is preferable to view these records as a catalogue of Cambridge men which has been constructed in a scientific manner. This view allows for individual variation within the records, while recognising that there are also logical consistencies. The most important fields are the dynamic career structures, which allow us to view examples of multi-dimensional professional activity and of inter-university and intercollegiate careers.
3. Alumni data in the printed volumes has previously been analysed in very general, quantitative studies. These have forced the rich career data into two-dimensional tables and broad social taxonomies, so much of the information is underexploited. To maximise the utility of the source, the classification of the fields identified in the structural analysis needs to be made explicit, so that indices into the records can be created. The fields can be tagged manually, but this is costly and labour-intensive. Information extraction systems provide a sophisticated tool for automatically indexing fields within unstructured electronic data, and offer a solution for parsing fields from OCR-edited text.
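The idea of automatically indexing fields can be illustrated with a minimal sketch. It is not the information extraction system used in the thesis: it swaps in simple regular expressions, and the sample record, the field tags and the patterns are all hypothetical, invented here only to show how tagged career fields might be pulled out of unstructured text and ordered by year.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record text, invented for illustration; NOT a real Alumni entry.
SAMPLE_RECORD = (
    "SMITH, JOHN. Adm. at TRINITY, 1850. B.A. 1854; M.A. 1857. "
    "Ordained deacon, 1855. Vicar of Exampleton, 1860-75."
)

# Assumed layout: "SURNAME, FORENAME." opens each record.
HEADWORD = re.compile(r"^([A-Z]+), ([A-Z]+)\.")

# Assumed field tags and patterns; each captures (value, four-digit year).
FIELD_PATTERNS = {
    "admission": re.compile(r"Adm\. at ([A-Z]+), (\d{4})"),
    "degree": re.compile(r"\b([BM]\.A\.) (\d{4})"),
    "career": re.compile(r"\b(Ordained \w+|Vicar of \w+), (\d{4})"),
}


@dataclass
class IndexedRecord:
    surname: str
    forename: str
    # Each entry is a (tag, value, year) tuple.
    fields: list = field(default_factory=list)


def index_record(text):
    """Tag the fields found in one record and sort them by year."""
    head = HEADWORD.match(text)
    if head is None:
        raise ValueError("no headword found")
    record = IndexedRecord(head.group(1), head.group(2))
    for tag, pattern in FIELD_PATTERNS.items():
        for match in pattern.finditer(text):
            record.fields.append((tag, match.group(1), int(match.group(2))))
    # Ordering by year exposes the career as a sequence of dated events.
    record.fields.sort(key=lambda entry: entry[2])
    return record


if __name__ == "__main__":
    for tag, value, year in index_record(SAMPLE_RECORD).fields:
        print(year, tag, value)
```

Keeping each field as a tagged, dated tuple, rather than forcing it into a fixed table column, is one way to preserve the individual variation between records while still allowing an index to be built over them.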
Hosted at University of Glasgow
Glasgow, Scotland, United Kingdom
July 21, 2000 - July 25, 2000
Conference website: https://web.archive.org/web/20190421230852/https://www.arts.gla.ac.uk/allcach2k/