Linked Literary History, or An Ontology of One’s Own: The Canadian Writing Research Collaboratory Ontology

paper, specified "long paper"
  1. 1. Susan Brown

    University of Guelph

  2. 2. Joel Cummings

    University of Guelph

  3. 3. Jasmine Drudge-Willson

    University of Guelph

  4. 4. Abigel Lemak

    University of Guelph

  5. 5. Kim Martin

    University of Guelph

  6. 6. Alliyya Mo

    University of Guelph

  7. 7. Deb Stacey

    University of Guelph

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Creating an ontology for feminist literary history with a view to its extensibility to other literary and cultural work involves significant decisions. The ontology described here supports the linked open data (LOD) strategy of the
Canadian Writing Research Collaboratory (CWRC)’s online platform. CWRC launched in 2016 as “An online infrastructure for literary research in and about Canada designed to meet the challenges and embrace the opportunities of the digital turn” (About CWRC, 2018). It pursues this mission in part through contributing LOD to the Semantic Web initiative to make Web resources more discoverable, shareable, and interoperable by making them machine-readable (Oldman, Doerr, and Gradmann, 2016; Smith, 2012). It promotes linking amongst CWRC-hosted projects through use of LOD identifiers both in metadata and within XML documents through an online XML editor and Web Annotation creator, CWRC-Writer (Brown et al., 2019). The next step is to push its LOD extracted from CWRC content into the Web. The
Orlando Project (Brown, Clements, and Grundy, 2019) is our pilot dataset for extracting more complex LOD.

The Orlando Project has published since 2006
Orlando: Women’s Writing in the British Isles from the Beginnings to the Present (Brown, Clements, and Grundy, 2006-2019), an online textbase for the research and discovery of women’s writing. Its 8 million words of richly-encoded text describe writers’ lives, careers, and contexts using a bespoke XML schema. The textbase has been acclaimed as initiating a new breed of information resource and for having “changed the parameters of the scholarship and teaching of British women’s writing” (Bowers, 2012). However, its utility is impeded by its data structure and paywall. LOD offers a means to make much of the knowledge embedded in Orlando more accessible and interoperable.

The first requirement was an ontology. No single extant ontology covers the range of biographical and literary relationships that CWRC needs to represent. We therefore sought to adapt as much as we could from elsewhere while filling gaps through an ontology of our own (to echo Virginia Woolf [1929], whose ground-breaking feminist analysis inspired the Orlando project’s name). We embrace Gruber’s widely accepted definition of a computational ontology as “A specification of a representational vocabulary for a shared domain of discourse”; an “explicit specification of a conceptualization”, where a conceptualization is understood as “an abstract, simplified view of the world that we wish to represent for some purpose” (Gruber, 1993). We outline here some key aspects of our ontological strategy, many of which are more fully described in the published preamble to the ontology itself (Brown et al, 2019).

Ontology Design Process
Our process was pragmatic: we aimed to mobilize the data as soon as feasible, adopting where we could in accordance with best practices and extending or creating where we could not. In using other ontologies we sometimes importing (e.g. the Organization Ontology [Reynolds, 2014]), sometimes cherry-pick terms or relationships, and sometimes base definitions on external ones and cite them (e.g.
Getty) so we can relate terms to each other within our structure. The ontologies from which CWRC draws include
Web Annotation,
Dublin Core,
PROV, and
CIDOC-CRM, as well as the
SKOS vocabulary.

The ontology is designed to support open-world datasets like Orlando‘s: we develop terms on an as-needed basis rather than trying to be exhaustive. We are not trying to map the world but to represent what we have. The ontology is dynamic, expanding as needed, and the development process iterative. In some respects, this ontology strategy resembles lean startup and agile development processes (Cummings and Stacey, 2018).
We here outline the major principles and decisions that have informed the work thus far.

Antifoundationalism and representationality
The ontology reflects what Alan Liu calls a “lightly antifoundationalist” epistemology; it emerges from feminist theory and science and technology studies, with an emphasis on situated knowing and a wariness of the consequences of classification (Liu, 2016; Haraway, 1991; Braidotti, 2006; Bowker and Star, 1999; see also Smithies, 2014). Ontologies can be understood as approximating the world and “thus capturing some truth about it, without enjoying a one-to-one correspondence with categories of entities as they exist completely independently of human languages or human practices” (Alcoff, 2000). Rather than viewing ontologies as involving “reality representation” (Smith, 2004), we understand the CWRC ontology as deeply representational. Structuring it with the Web Ontology Language (OWL) and devising Shape Constraint Language (SHACL) rules for our data will provide our data with the tactical benefits of ontological rules including intelligibility, error detection, processability, inferencing, and interoperability.
This work aims to intervene, with experimental models, in the knowledge structures of our time. We therefore need to accommodate some aspects of those structures in order to make an argument and to be intelligible, even if the aim is to move beyond some categories. So we take as our touchstone the core concerns of the datasets themselves. Some priorities were covered by existing ontologies, but sometimes there was nothing to approximate what we needed, so we invested significant resources there.

Deferred upper-level ontology
The team considered and debated adopting an upper-level ontology that lays out categories and relationships at the most general level. After reviewing leading candidates, we held off because none align closely with our approach. DOLCE Light is closest, but orphaned, and the CIDOC-CRM is very event-oriented, whereas much of our data is not. We are therefore deferring the question to see whether it seems necessary and until we have a wider range of use-cases, including ones related to space and time, so as to better evaluate the implications (cf. Posner, 2015 on Cartesian representation).

Provenance and citation
Every LOD assertion links back to the encoded source (in this pilot instance, the Orlando textbase), connecting it to nuanced prose as well as the specific XML markup from which the relationship was extracted. The data is also designed to point to the specific scholarly sources from which claims within the source text are derived.

The Web Annotation Data Model (Sanderson et al, 2017) links identifications of entities to their source, as noted above. We also use the WA model to characterize the properties associated with a writer as descriptions of that person, refuting positivist claims. Framing claims as annotations motivated by description is important, given that the creators of the markup did not anticipate this use of the data, and given that only dates have any kind of certainty value attached to them.

Ambiguity, diversity, and nuance
Rather than being disambiguated, leaky cultural categories are generously documented and represented as mutually constitutive with specific discursive frameworks or “Context” classes of annotations (Brown et al., 2017). “Textual labels” group together related terms emerging from different discourses. Rather than mash complex and contested terms into a single identity, the ontology retains distinctions made between them within the encoding and groups them together with these labels – so, for instance, the jewishLabel instance and different constructions of Jewishness are linked using the relationship “represents”.

Linking to legacy terms
The labelling strategy and “represents” predicate help handle problematic legacy terms, along with another custom predicate called hasFunctionalRelation, which signifies that a term has served in a parallel way in other datasets, but has no semantic commensurability with our term. Gender and sex are definitively different, so the OWL sameAs relationship does not work to link terms for sex to womanLabel and manLabel. However, sex can map (inadequately) to gender terms in existing datasets, which often employ the ISO 5218 categories for sex (“ISO/IEC”). The “hasFunctionalRelation” predicate creates a complex but still processable bridge between Orlando data and datasets of women’s writing that employ ISO 5218 values.

We conclude by demonstrating a couple of challenges arising from this work with visualizations of extracted data using the HuViz LOD explorer (described in a parallel paper). First is the tension between complexity and nuance, on the one hand, and readability and processability on the other. As Rob Sanderson and David Newbury stress, we need Linked Open Usable Data (2017). The Orlando LOD dataset is verbose and complex, raising concerns that it may be incomprehensible and unwieldy despite the ontology. Second is the tension between standards and bespoke terms. Staying close to the Orlando data has meant, for instance, devising a genre ontology (CWRC Genre Ontology) rather than adopting a partially suitable one, making CWRC data more of an island in the LOD stream than it might otherwise be. However, standards can also have unfortunate consequences: adopting the Web Annotation framework means that our assertions are now less direct and easily comprehensible by non-specialists than in our initial model for our data.
The CWRC ontology aligns with the strain of critical DH that applies humanities epistemologies to create digital representations: the CWRC ontology works not with data but with capta (Drucker, 2011). The millions of triples in the Orlando Project’s British Women’s Writing Dataset will advance feminist digital literary history, and experiments with the CWRC ontology will help refine strategies for writing feminist literary history, among other complicated stories, into the Web.

Works Cited

About CWRC (2018).
Canadian Writing Research Collaboratory.

Alcoff, Linda Martín (2000). Who’s Afraid of Identity Politics? In Moya, Paula M. L. and Hames-García
, Michael R. (eds.),
Reclaiming Identity: Realist Theory and the Predicament of Postmodernism. University of Califormia Press, pp. 312–44.

BIBFRAME (n.d.). Bibliographic Framework Initiative. Library of Congress.

Bowers, Toni (2012). Exploring the Richardson Circle Using the Orlando Database.
The Scriblerian 44-45 (2,1): 56–58.

Bowker, Geoffrey C., and Susan Leigh Star (1999).
Sorting Things Out Classification and Its Consequences. MIT Press.

Braidotti, Rosi (2006). Posthuman, All Too Human: Towards a New Process Ontology.
Theory, Culture & Society 23(8): 197–208. doi:10.1177/0263276406069232.

Brown, Susan, et al. (2019). CWRC Ontology Preamble. Canadian Writing Research Collaboratory.

Brown, Susan, James Chartrand, Mihaela Ilovan, Andrew McDonald, Jeffrey Antoniuk, and Michael Brundin (2019). CWRC-Writer.
Canadian Writing Research Collaboratory.

Brown, Susan, Patricia Clements, and Isobel Grundy (2019). The Orlando Project: Feminist Literary History and Digital Humanities.

Brown, Susan, Patricia Clements, and Isobel Grundy (2006-2019).
Orlando: Women’s Writing in the British Isles from the Beginnings to the Present. Cambridge University Press.

Brown, Susan, Abigel Lemak, Colin Faulkner, Kim Martin, and Rob Warren (2017). Cultural (Re-)Formations: Structuring a Linked Data Ontology for Intersectional Identities. In
Digital Humanities 2017: Conference Abstracts. McGill University.

CIDOC-CRM (2011). Martin Doerr, Matthew Stiff, Nick Crofts, Stephen Stead, and Tony Gill. CIDOC Conceptual Reference Model V. 5.0.4.

Cummings, Joel, and Deborah Stacey (2018). Lean Ontology Development: An Ontology Development Paradigm Based on Continuous Innovation. In 
Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, pp. 365-372. DOI: 10.5220/0006963003670374

CWRC Ontology (2019). Susan Brown et al. Canadian Writing Research Collaboratory.

CWRC Literary Genre Ontology (2019). Susan Brown et al. Canadian Writing Research Collaboratory.

DBpedia (2017). Leipzig, University, Univerity of Mannheim, and Openlink Software. DBPedia. Wikipedia2.

DCMI (2012). Board, DCMI Usage. Dublin Core Metadata Initiative. Dublin Core Metadata Initiative.

DOLCE (2006). Claudio Masolo, Stefano Borgo, Aldo Gangemi, Nicola Guarino, Alessandro Oltramari. Wonder Web. Laboratory for Applied Ontology. Internet Archive.

Drucker, Johanna (2011). Humanities Approaches to Graphical Display.
Digital Humanities Quarterly 5(1).

FOAF (2014). Dan Brickley and Libby Miller. FOAF Vocabulary Specification 0.99. Friend of a Friend.

Getty (2018). Getty Vocabularies. The J. Paul Getty Trust.

Gruber, Thomas R. (1993). A Translation Approach to Portable Ontology Specifications.
Knowledge Acquisition 5(2): 199–220.

Haraway, Donna (1991). A Cyborg Manifesto : Science, Technology, and Socialist- Feminism in the Late Twentieth Century. In
Simians, Cyborgs and Women: The Reinvention of Nature. Routledge, pp. 149–81.

Liu, Alan (2016). Drafts for
Against the Cultural Singularity (book in progress). Alan Liu.

Oldman, Dominic, Martin Doerr, and Stefan Gradmann (2016). Zen and the Art of Linked Data: New Strategies for a Semantic Web of Humanist Knowledge. In Schreibman, Susan, Siemens, Ray and Unsworth, John (eds.),
A New Companion to Digital Humanities. Wiley-Blackwell, pp. 251-273.

Organization (2014). Dave Reynolds. Organization Ontology. W3C.

Orlando British Women's Writing Dataset (2018). The Canadian Writing Research Collaboratory.

Posner, Miriam (2015). What’s Next: The Radical, Unrealized Potential of Digital Humanities.
Miriam Posner’s Blog.

PROV (2013). Timothy Lebo, Satya Sahoo, and Deborah McGuinness. PROV-O: The PROV Ontology. W3C.

Sanderson, Robert, and David Newbury (2017). Linked.Art & Vocabularies: Linked Open Usable Data. J. Paul Getty Trust.

SKOS (2009). SKOS Simple Knowledge Organization System Reference. Alistair Miles and Sean Bechhofer. W3C.

Smith, Barry (2004). Beyond Concepts: Ontology as Reality Representation. In Varzi, Achille C. and Vieu, Laure (eds.), Formal Ontology in Information Systems. OIS Press, pp. 1-12.

Smith, Barry (2012). On Classifying Material Entities in Basic Formal Ontology. In Interdiscipinary Ontology. Proceedings of the Third Interdisciplinary Ontology Meeting. Keio University Press, pp. 1–13.

Smithies, James (2014). Digital Humanities, Postfoundationalism, Postindustrial Culture.
Digital Humanities Quarterly 8 (1).

Web Annotation (2017). Robert Sanderson, Paolo Ciccarese, and Benjamin Young. Web Annotation Data Model. W3C.

Woolf, Virginia (1929).
A Rooms of One’s Own. Harcourt, Brace, and Company.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Series: ADHO (14)

Organizers: ADHO