Database Design and Identity: A Compromised Infrastructure

paper, specified "long paper"
  1. Amy Earhart

    Texas A&M University

This paper will address issues of representing identity in databases, with a particular focus on Asian identities. Identity, as the conference theme suggests, is often far more diverse and fluid than a binary technology can represent. Drawing on the Database of African American and Predominantly White American Literature Anthologies (DAAPWALA), a database covering 100 years of American and African American literature anthologies that was constructed to investigate questions of identity and representation, this paper calls for theory-based, historically aware, transparent approaches to encoding identity in dh tools like the database.
Dh projects are increasingly interested in representing identity across time, whether through analysis of census data, historical records, or literary history. Much of this work is completed using databases, with human bodies turned into binary data points contained within rigid columns, a method that is resistant to contemporary theorization of identity. This project is informed by the growing wealth of scholarship produced by algorithmic bias scholars who have been attentive to the nuances of race and gender within technological infrastructures (Noble 2018, Benjamin 2019, McIlwain 2020, Brock 2020, Steele 2021, Womack 2022), working across issues ranging from social justice and algorithmic bias to surveillance and social media. Information studies also offers analysis that we might use to address bias (Drabinski 2013, Block 2020). While DAAPWALA provides interesting results, it is, in many ways, designed as a project to test how a database might use categories of identity in transparent ways, modeling both problems and possibilities within a system that is insistent upon the binary. As Tara McPherson has argued, the best digital database projects “…wield technology against its positivist self, foregrounding the work of the interface and refusing an easy transparency and corporate tenets of ‘good’ design via the template” (2015, 495). It is this uneasiness that I hope to explore in the paper, which is designed to begin a conversation within the digital humanities community about best practices for encoding identity.
To interpret canonicity and the inclusion or exclusion of authors in literary canons using a database, authors and editors must be categorized in a manner that is consistently understood throughout the body of the data, which makes static any category into which individuals are encoded. For example, DAAPWALA follows “good” database design practices: it encodes a designated vocabulary in drop-down menus to regularize the selection of everything from author names to identities, and it adds VIAF (Virtual International Authority File) numbers to further control identity and ensure interoperability. Such methods do regularize the data, something that needed to happen for comparative purposes and to ensure consistency, particularly as numerous individuals were inputting data over time. But these encoding methods are opposed to contemporary theoretical models of identity, a conundrum that will be discussed further.
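The controlled-vocabulary approach described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch under stated assumptions, not DAAPWALA's actual schema: the table names, column names, the placeholder record, and the source notes are all hypothetical. It shows how a lookup table behind a drop-down menu regularizes identity terms, and how rigidly such a design rejects anything outside the list.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come from DAAPWALA.
SCHEMA = """
CREATE TABLE identity_term (
    term       TEXT PRIMARY KEY,   -- the only values a drop-down menu would offer
    scope_note TEXT NOT NULL       -- documents how the project defines the term
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    viaf_id      TEXT UNIQUE        -- authority identifier, left empty if unknown
);
CREATE TABLE person_identity (
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    term        TEXT NOT NULL REFERENCES identity_term(term),
    source_note TEXT NOT NULL,      -- records why the term was assigned
    PRIMARY KEY (person_id, term)   -- a person may hold more than one term
);
"""


def build_db():
    """Create an in-memory database with one term and one placeholder person."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce these by default
    conn.executescript(SCHEMA)
    with conn:
        conn.execute(
            "INSERT INTO identity_term VALUES (?, ?)",
            ("Asian", "East Asian, South Asian, Southeast Asian, "
                      "Central Asian and Pacific Islander"),
        )
        conn.execute(
            "INSERT INTO person VALUES (?, ?, ?)",
            (1, "Placeholder Author", None),
        )
    return conn


def assign_identity(conn, person_id, term, source_note):
    """Return True if the term is in the controlled list, False if it is rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO person_identity VALUES (?, ?, ?)",
                (person_id, term, source_note),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
print(assign_identity(conn, 1, "Asian", "hypothetical source note"))
print(assign_identity(conn, 1, "A term typed by hand", "hypothetical source note"))
```

Allowing several rows per person and requiring a source note for each assignment is one possible way to make categorization choices visible, although the terms themselves remain fixed by the vocabulary table.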
The paper will discuss the choices made in identifying authors and editors as “Asian.” As the call for papers notes, dh has a long history in Asia. The conference theme is designed to center Asian dh scholars and their work, resisting the displacement of Asian dh work that often occurs at ADHO conferences held in Europe and the Americas. A similar representation of identity, decentering western whiteness, is central to encoding identity within DAAPWALA. For the purposes of DAAPWALA I use the term Asian to include East Asian, South Asian, Southeast Asian, Central Asian and Pacific Islander. I will discuss the historical and cultural components that are used for categorizations, highlighting best practices used in TEI/XML encoding and library metadata standards. I argue that while we must stress interoperability, we must also account for the development of data as a representation of the local environment. In the case of the United States, and the literary anthologies I am studying, identity terms are not developed in a vacuum. Historic and contemporary racism and government policy have shaped the identification of groups, which slip in and out of identities. I reject the use of census categories because the long history of the U.S. census is problematic and its categories do not meet the goals of DAAPWALA. I will also discuss the importance of working across national and cultural divides while attending to the particularities of the local environment. Asian as defined in Asian American literary anthologies does not have the same cultural markers as Asian as defined by the conference call. Further, I will discuss the importance of database design that considers the differing ways that cultures regularize identity markers, such as names. While databases developed in the United States, for example, use one field for first name and one field for last name, this structure is not consistent across all cultures, and broader naming conventions are needed.
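The point about name fields can be illustrated with a small Python sketch. It is one possible model, not a description of DAAPWALA: the class names, the part labels, and the example names are all invented for illustration. Instead of fixed first-name and last-name columns, it stores the name as written, plus optional labelled parts and an optional explicit sort key.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamePart:
    kind: str   # hypothetical labels such as "family", "given", "mononym"
    value: str


@dataclass
class PersonName:
    display: str                                  # the name as written, in its own order
    parts: List[NamePart] = field(default_factory=list)
    sort_key: Optional[str] = None                # set by an editor when rules are unclear

    def sort_value(self) -> str:
        """Prefer an explicit sort key, then a family name, then the full display form."""
        if self.sort_key:
            return self.sort_key
        family = [p.value for p in self.parts if p.kind == "family"]
        return family[0] if family else self.display


# Invented example names, used only to exercise the structure.
family_first = PersonName(
    display="Kim Minji",
    parts=[NamePart("family", "Kim"), NamePart("given", "Minji")],
)
single_name = PersonName(display="Aroha", parts=[NamePart("mononym", "Aroha")])

for name in (family_first, single_name):
    print(name.display, "|", name.sort_value())
```

Keeping the display form separate from the parts means a record never has to be forced into a given-name-then-family-name order, while the optional parts still allow sorting and matching where that information is known.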
Emphasizing the need to balance interoperability against cultural specificity, this paper asks the dh community to begin conversations about better data design.


Benjamin, R. (2019). Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge: Polity Press.

Block, S. (2020). “Erasure, Misrepresentation and Confusion: Investigating JSTOR Topics on Women’s and Race Histories.” DHQ: Digital Humanities Quarterly 14, no. 1.

Brock, A. (2020). Distributed Blackness: African American Cybercultures. New York: New York University Press.

Drabinski, E. (2013). “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly 83, no. 2: pp. 94–111.

Gallon, K. (2016). “Making a Case for the Black Digital Humanities.” In Debates in the Digital Humanities 2016, Matthew K. Gold and Lauren F. Klein (eds.). Minneapolis: U Minnesota P.

McIlwain, C. (2020). Black Software: The Internet & Racial Justice, from the AfroNet to Black Lives Matter. New York: Oxford UP.

McPherson, T. (2015). “Post-Archive: The Humanities, the Archive, and the Database.” In Between Humanities and the Digital, Patrik Svensson and David Theo Goldberg (eds.), pp. 483–502. Cambridge, Mass: MIT Press.

Noble, SU. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. New York: NYU Press.

Steele, CK. (2021). Digital Black Feminism. New York: NYU Press.

Womack, A. (2022). The Matter of Black Living: The Aesthetic Experiment of Racial Data, 1880–1930. Chicago: U of Chicago P.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO