University of Toronto
Despite the crucial importance of corpus-building to the interpretation of text-mining research, it is often extremely difficult to know what is in a corpus. Even large institutional resources used by many scholars provide little context for their choices of what to include or exclude. These hidden choices are particularly problematic when historical selection factors might have led to the creation of corpora which re-create social inequalities. I examine six corpora which are used as the basis of most eighteenth-century distant reading. I manually evaluate each corpus’s holdings for a very narrow selection of texts, works published in England in 1789-99, to answer a series of bibliographical questions, including: how many titles are by men, by women, or unsigned? What broad categories of writing are represented: novels, plays, poetry, pamphlets, songs, sermons, ephemera, others? Analyzing the differences, I ask: do the most invested-in resources underrepresent women?
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
475 works by 1078 authors indexed
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
Series: ADHO (15)