Modelling Gender Diversity – Research Data Representation Beyond the Binary

Long paper

  1. Viktor J. Illmer

    Freie Universität Berlin (FU Berlin)

  2. Lisa Poggel

    Freie Universität Berlin (FU Berlin)

  3. Franziska Diehr

    Robert Koch Institute, Germany

  4. Lindsey Drury

    Freie Universität Berlin (FU Berlin)



How may we model gender in a way that accounts for its diversity while remaining simple enough to implement and query?

We address why gender diversity needs to be represented in databases, especially when confronted with historical sources. Analysing examples of gender modelling in established metadata schemata and descriptive data models, we propose a model that balances an atomic and flexible ontology with the need to return valid results even for naïve data queries. We introduce the use of gender qualifiers, which allow nuanced statements on how the gender information was formulated. Use of the proposed modelling strategies is demonstrated following the Wikibase data model.

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy in the context of the Cluster of Excellence Temporal Communities: Doing Literature in a Global Perspective – EXC 2020 – Project ID 390608380.

Archiving the colonial concept of the gender binary

Historical research shows the gender binary to be a modern and colonial organising principle. Scholars have traced transgender identities throughout European history (Betancourt, 2020; Feinberg, 1996; Mauriello, 2019; Moyer, 2015) and examined how colonialism has affected Indigenous gender systems. Various communities in North America, Asia and Africa did not adhere to a gender binary before the imposition of colonial knowledge regimes (Amadiume, 2015; Hinchy, 2020; Neil & Garcia, 2009). Many Indigenous societies accepted genders beyond male and female (Cleves, 2014; Herdt, 1994; Mirandé, 2016; Roscoe, 1998; Slater & Yarbrough, 2011). Archives carry the imprint of colonial and heteronormative gender regimes (Arondekar, 2009; Ćosić et al., 2014). A critical modelling practice should therefore reflect the constructed nature of gender in the archival record and its limited temporal and geographical scope (see also Flanders & Jannidis, 2015, pp. 14–15).

Status quo: Representation of gender in established standards

Established metadata schemes do not adequately model the diversity of gender. Most standards use binary models, which only allow for the expressions male and female. The German National Library’s Integrated Authority File, for example, includes the concepts male, female and not known (Deutsche Nationalbibliothek, 2019a, 2019b). In this way, it is similar to the system standardised by ISO 5218 (International Organization for Standardization, 2004). This vocabulary is inadequate even on the assumption that it only catalogues purported biological sex, which is itself more diverse than a binary allows (e.g. intersexuality). Moreover, post-structural feminists argue that a distinction between sex and gender cannot be maintained (Butler, 2006).

Some binary models include an ‘other’ category, such as the specification for vCard (Perreault, 2011). TEI guidelines include vCard and ISO 5218 vocabularies as examples to use for values of the element (Text Encoding Initiative Consortium, 2021). However, being identified with a category of other can be a stigmatising experience as well as one of othering (Kronk et al., 2021; Puckett et al., 2020). Othering also occurs in Wikidata’s model, where a cis person’s gender is recorded as female, while a trans person’s gender is transgender female, thus drawing a distinction in gender between cis and trans individuals.

Introducing gender qualifiers

By distinguishing gender (with expressions such as non-binary, and more) from gender modality, a concept proposed by Ashley (pre-published) as an umbrella term that includes but is not limited to transgender and cisgender modalities, we make it possible to record more precise information on a person’s gender. Nonetheless, we avoid using the term gender identity, because historical sources seldom include gender self-identification and are instead based on presumed or ascribed gender. In order to create a reliable data basis, we believe it indispensable to make explicit how the gender-specific information was formulated. We therefore propose extending gender information with qualifying statements. The terminology assembled in the GSSO ontology (Kronk & Dexheimer, 2020), which includes assumed, experienced (we propose using the term self-identified instead), lived and recorded gender, appears especially suitable to this task. The terms are defined as follows:

assumed gender: ‘The assumption of a person’s gender based on any predefined characteristics and biases.’

experienced gender: ‘The gender with which a person identifies.’

lived gender: ‘The gender in which an individual lives their everyday life. This is usually a combination of performance of expected gender roles and gender expression, and may not represent a person’s actual gender.’

recorded gender: ‘Gender as recorded by any of various institutions, organisations, writings, etc.’

However, while the GSSO lists non-Western genders such as hijra, muxe and X-gender as ‘culturally specific’ non-binary gender identities (Kronk & Dexheimer, 2020), we propose to instead place all gender instances on one level, countering the Eurocentrism in subsuming all genders under the concepts of non-binary and transgender (Kravitz, 2021).
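The separation of gender, gender qualifier and gender modality can be illustrated with a small data structure. The following Python sketch is an illustration only: the class and field names (Qualifier, GenderStatement, Person, modalities) are hypothetical and are not Wikibase or GSSO identifiers. It assumes that one gender statement may carry several qualifiers and that an empty list of modalities stands for an undetermined gender modality.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List


class Qualifier(Enum):
    """How a piece of gender information was formulated."""
    ASSUMED = "assumed gender"
    SELF_IDENTIFIED = "self-identified gender"
    LIVED = "lived gender"
    RECORDED = "recorded gender"


@dataclass(frozen=True)
class GenderStatement:
    """A gender value together with the qualifiers attached to it."""
    gender: str                        # all genders are plain values on one level
    qualifiers: FrozenSet[Qualifier]   # a statement may carry several qualifiers


@dataclass
class Person:
    name: str
    genders: List[GenderStatement] = field(default_factory=list)
    modalities: List[str] = field(default_factory=list)  # empty: undetermined


def describe(person: Person) -> str:
    """Render every gender statement of a person with its qualifiers."""
    parts = []
    for statement in person.genders:
        quals = ", ".join(sorted(q.value for q in statement.qualifiers))
        parts.append(f"{statement.gender} ({quals})")
    modality = ", ".join(person.modalities) or "undetermined"
    return f"{person.name}: {'; '.join(parts)}; gender modality: {modality}"


# Example values follow the cases discussed in this paper.
watase = Person(
    "Yuu Watase",
    [GenderStatement("X-gender", frozenset({Qualifier.SELF_IDENTIFIED}))],
)
hall = Person(
    "Murray Hall",
    [
        GenderStatement("male", frozenset({Qualifier.LIVED, Qualifier.SELF_IDENTIFIED})),
        GenderStatement("female", frozenset({Qualifier.RECORDED})),
    ],
)

print(describe(watase))
print(describe(hall))
```

Keeping the modality on the person rather than on the individual gender statement is a design choice of this sketch; other arrangements are equally conceivable.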


The qualifier value self-identified gender should only be used if there is evidence of the gender identity given by the respective persons themselves. This is more often the case with contemporary, non-historical individuals (Figure 1). But historical exceptions exist: the 18th-century preacher Public Universal Friend (Figure 2), for instance, publicly identified as being reborn genderless and requested to be referred to without pronouns from 1776 on (Moyer, 2015, pp. 12–13).

Where there is no personal record available, the self-identified gender cannot be determined. Diné weaver and healer Hastiin Tł’a (1867–1937), for instance, was a nádleehi, a Diné Two-Spirit person. According to Naruszewicz (2016, pp. 2–3), weaving and the ceremonial duties of healers were gendered activities within the Diné tribe, and only nádleehi were allowed to follow both. The fact that Tł’a was both a weaver and a healer suggests that Tł’a’s gender can be assumed as nádleehi (Figure 3). We suggest using assumed gender as a qualifier in this case and whenever only weak indicators (e.g. gendered pronouns, titles, occupations) are available. This might also be the case when we are confronted with gender information derived from authority files, because there are no indicators available on how the information was determined in the first place.

Sometimes a gender may be recorded, but the reliability of the record is questionable or sources are contradictory. Throughout his life, 19th-century politician Murray Hall (Figure 4) presented as male, and he registered a male name upon migrating to the US. When he died, his body was examined and his gender was posthumously legally determined to be female (Nelson, 2014, pp. 137–145). National media ridiculed him as a ‘passing woman,’ a term that carried an accusation of deception and trickery. In a case like this, we suggest users of the model settle on an approach that allows marking some statements as preferred over others. In the Wikibase context, so-called ranks may be used to emphasise or de-emphasise statements of the same property: the statement of recorded gender being female should be assigned a rank of deprecated to highlight that lived gender and self-identified gender are more credible sources than the official record. Colonial records should be treated in the same fashion, especially when dealing with terms employed by colonial administrations, missionaries or ethnographers. Records from Christian missionaries, for instance, do not ascribe Tł’a a nádleehi gender but rather that of ‘berdache,’ a derogatory colonial term (Naruszewicz, 2016, p. 12; Figure 3).
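The ranking decision described for Murray Hall can be sketched in a few lines of Python. This is a hedged illustration, not a description of Wikibase internals: the Statement class and the function deprecate_contradicted_records are hypothetical, and applying the rule automatically is an assumption of the sketch, since in practice such a decision would be an editorial one made case by case. Only the three rank names (preferred, normal, deprecated) are taken from Wikibase.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List

# Qualifiers treated here as more credible than an official record.
CREDIBLE = frozenset({"lived gender", "self-identified gender"})
RECORD_ONLY = frozenset({"recorded gender"})


@dataclass(frozen=True)
class Statement:
    gender: str
    qualifiers: FrozenSet[str]
    rank: str = "normal"  # one of: "preferred", "normal", "deprecated"


def deprecate_contradicted_records(statements: List[Statement]) -> List[Statement]:
    """Deprecate statements that rest only on a record and contradict
    a lived or self-identified gender of the same person."""
    credible_genders = {s.gender for s in statements if s.qualifiers & CREDIBLE}
    ranked = []
    for s in statements:
        contradicted = (
            s.qualifiers == RECORD_ONLY
            and bool(credible_genders)
            and s.gender not in credible_genders
        )
        ranked.append(replace(s, rank="deprecated") if contradicted else s)
    return ranked


hall = [
    Statement("male", frozenset({"lived gender", "self-identified gender"})),
    Statement("female", frozenset({"recorded gender"})),
]

for statement in deprecate_contradicted_records(hall):
    print(statement.gender, sorted(statement.qualifiers), statement.rank)
```

A recorded gender that agrees with the lived or self-identified gender keeps its rank in this sketch, because it does not contradict the more credible information.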

Figure 1. Modelling examples for Laxmi Narayan Tripathi and Yuu Watase. Controversial hijra rights activist Tripathi has identified as transgender as well as hijra, which she has described as a kind of transgender identity. Due to its flexibility, our model allows entering both hijra and transgender as her gender modality. Manga artist Watase self-identifies as X-gender, but there is no information available on their gender modality.

Figure 2. Modelling example for Public Universal Friend. Before the announcement, the Friend’s gender was recorded as female. However, the example is somewhat exceptional: the Friend’s announcement is not retroactive in scope (i.e. the Friend has not always identified as genderless), but is valid from 1776 onward. The flexibility of the model allows us to add date qualifiers to the statement and thereby accurately record this special case.
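The effect of a date qualifier can be shown with a minimal Python sketch. The names DatedStatement, start_year and valid_in are hypothetical and chosen for illustration only. The sketch assumes that a statement without a start date applies without temporal restriction; whether the recorded statement should additionally receive an end date is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatedStatement:
    gender: str
    qualifier: str
    start_year: Optional[int] = None  # None: no start date attached


# Values follow the Public Universal Friend example.
friend = [
    DatedStatement("female", "recorded gender"),
    DatedStatement("genderless", "self-identified gender", start_year=1776),
]


def valid_in(statements: List[DatedStatement], year: int) -> List[str]:
    """Return the genders of all statements that already apply in the given year."""
    return [
        s.gender
        for s in statements
        if s.start_year is None or s.start_year <= year
    ]


print(valid_in(friend, 1770))
print(valid_in(friend, 1780))
```

Queried for a year before 1776, the sketch returns only the recorded statement; from 1776 on, the self-identified statement is returned as well, so the announcement is not applied retroactively.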

Figure 3. Modelling example for Hastiin Tł’a. A qualifier specifies nádleehi as assumed gender and berdache as recorded gender. A Wikibase rank is used to mark the inaccurate and offensive nature of the colonial term. Whether it is appropriate to enter the term at all is subject to discussion. With no information on how Tł’a self-identified, it is impossible to determine gender modality.

Figure 4. Modelling example for Murray Hall. Our entry for Murray Hall identifies both female and male as values of gender. A qualifier specifies female to be a recorded gender, taking into account the fact that Hall was recorded as female upon death. The male value carries two gender qualifier values: lived and self-identified gender. This gives credit to the fact that census and naturalisation records contain information provided by Hall in person. Given the hostile ridicule that his dead body was subjected to, it is impossible to determine whether Hall was, in fact, trans, leaving gender modality undefined.


One advantage of this approach is that a SPARQL query for persons and the value of their gender property returns valid and readable results. Using Wikibase, ranks may be applied to indicate the reliability of statements. For instance, statements based on self-identification should be designated as preferred compared to statements that are based on records or assumptions. Queries would then, by default, only return the highest-ranked statements. Those wishing to go further in their analysis may explicitly query the gender qualifiers attached to a statement and receive the full range of nuances that the model is able to represent (Table 1). Although we chose Wikibase as a suitable environment for this task, we wish to highlight that our approach may also be adapted to other systems.
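The two levels of querying can be imitated in plain Python. The sketch below is not a SPARQL query and does not call any Wikibase API; the names Statement, naive_query and detailed_query are hypothetical. It assumes the best-rank behaviour of Wikibase's simple ("truthy") property values: deprecated statements are never returned, and normal-rank statements are returned only if the person has no preferred statement. The ranks assigned to the example statements are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


@dataclass(frozen=True)
class Statement:
    person: str
    gender: str
    qualifier: str
    rank: str = NORMAL


# Example data; the ranks are assumptions of this sketch.
STATEMENTS = [
    Statement("Public Universal Friend", "genderless", "self-identified gender", PREFERRED),
    Statement("Public Universal Friend", "female", "recorded gender"),
    Statement("Hastiin Tł'a", "nádleehi", "assumed gender"),
    Statement("Hastiin Tł'a", "berdache", "recorded gender", DEPRECATED),
]


def naive_query(statements: List[Statement]) -> Dict[str, List[str]]:
    """Per person, return only the genders of the best-ranked statements."""
    by_person: Dict[str, List[Statement]] = {}
    for s in statements:
        if s.rank != DEPRECATED:
            by_person.setdefault(s.person, []).append(s)
    result = {}
    for person, candidates in by_person.items():
        has_preferred = any(s.rank == PREFERRED for s in candidates)
        best = PREFERRED if has_preferred else NORMAL
        result[person] = [s.gender for s in candidates if s.rank == best]
    return result


def detailed_query(statements: List[Statement]) -> List[Tuple[str, str, str, str]]:
    """Return every statement with its qualifier and rank, deprecated ones included."""
    return [(s.person, s.gender, s.qualifier, s.rank) for s in statements]


print(naive_query(STATEMENTS))
for row in detailed_query(STATEMENTS):
    print(row)
```

The naïve query yields one readable value per person, while the detailed query exposes every statement together with its qualifier and rank, including the deprecated colonial term.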


Person                     Gender qualifier          Gender modality
Laxmi Narayan Tripathi     self-identified gender
Yuu Watase                 self-identified gender
Public Universal Friend    self-identified gender
Public Universal Friend    recorded gender
Hastiin Tł'a               assumed gender
Hastiin Tł'a               recorded gender
Murray Hall                lived gender
Murray Hall                self-identified gender
Murray Hall                recorded gender

Table 1. SPARQL query results for all examples made thus far, including all gender statements and qualifiers. Sorted by order of appearance. Note the inclusion of deprecated gender statements, which would be withheld automatically when only querying for the gender property.

Closing remarks and prospects

In order to represent gender in the context of formalised data environments, we propose a modelling strategy that allows one to make explicit how the gender information was formulated in the first place. This enables researchers to make more informed choices. We hope that by distinguishing between different qualities of gendered information, our model can challenge essentialised notions and make cultural, geographical and temporal positions on gender explicit. It can thereby function as a tool that sparks discussion, uncertainty and scholarly self-reflection.


References

Amadiume, I. (2015). Male Daughters, Female Husbands: Gender and Sex in an African Society. 2nd ed. (Critique Influence Change). London: Zed Books.

Arondekar, A. (2009). For the Record: On Sexuality and the Colonial Archive in India. (Next Wave: New Directions in Women’s Studies). Durham: Duke University Press.

Ashley, F. ‘Trans’ is my gender modality: A modest terminological proposal. In Erickson-Schroth, L. (ed), Trans Bodies, Trans Selves. 2nd ed. Oxford University Press. [Preprint].

Betancourt, R. (2020). Byzantine Intersectionality: Sexuality, Gender, and Race in the Middle Ages. Princeton, NJ: Princeton University Press.

Butler, J. (2006). Gender Trouble: Feminism and the Subversion of Identity. (Routledge Classics). New York: Routledge.

Cleves, R. H. (2014). Beyond the binaries in early America: Special issue introduction. Early American Studies, 12(3). University of Pennsylvania Press: 459–68.

Ćosić, M., Dollinger, J., Isop, U. and Leibetseder, D. (2014). Gegenkulturelle Archive jenseits von Familie und Geschlecht. In Guggenheimer, J., Isop, U., Leibetseder, D. and Mertlitsch, K. (eds), »When we were gender...« – Geschlechter erinnern und vergessen, vol. 5. Bielefeld: transcript Verlag, pp. 245–72.

Deutsche Nationalbibliothek (2019a). GND Gender.

Deutsche Nationalbibliothek (2019b). GND Ontology.

Feinberg, L. (1996). Transgender Warriors: Making History from Joan of Arc to RuPaul. Boston, MA: Beacon Press.

Flanders, J. and Jannidis, F. (2015). Knowledge organization and data modeling in the humanities.

Herdt, G. H. (ed). (1994). Third Sex, Third Gender: Beyond Sexual Dimorphism in Culture and History. New York: Zone Books.

Hinchy, J. (2020). Governing Gender and Sexuality in Colonial India: The Hijra, c. 1850–1900.

International Organization for Standardization (2004). ISO/IEC 5218:2004.

Kravitz, M. (2021). Are pre-colonial genders inherently ‘nonbinary’ or ‘transgender’? An Injustice!

Kronk, C. A. and Dexheimer, J. W. (2020). Development of the Gender, Sex, and Sexual Orientation ontology: Evaluation and workflow. Journal of the American Medical Informatics Association, 27(7): 1110–15.

Kronk, C. A., Everhart, A. R., Ashley, F., Thompson, H. M., Schall, T. E., Goetz, T. G., Hiatt, L., et al. (2021). Transgender data collection in the electronic health record: Current concepts and issues. Journal of the American Medical Informatics Association.

Mauriello, M. (2019). Corpi dissonanti: note su gender variance e sessualità. Il caso dei femminielli napoletani. Archivio antropologico mediterraneo. Dipartimento Culture e Società - Università di Palermo.

Mirandé, A. (2016). Hombres mujeres: An indigenous third gender. Men and Masculinities, 19(4). Los Angeles, CA: SAGE Publications: 384–409.

Moyer, P. B. (2015). The Public Universal Friend: Jemima Wilkinson and Religious Enthusiasm in Revolutionary America. Ithaca, N.Y: Cornell University Press.

Naruszewicz, C. J. (2016). Beyond binary: Navajo alternative genders throughout history.

Neil, C. and Garcia, J. (2009). Philippine Gay Culture: Binabae to Bakla, Silahis to MSM. Hong Kong: Hong Kong University Press.

Nelson, L. (2014). Reanimating archiving/archival corporealities: Deploying ‘big ears’ in de rigueur mortis intervention. QED: A Journal in GLBTQ Worldmaking, 1(2). Michigan State University Press: 132–59.

Perreault, S. (2011). VCard Format Specification. (Request for Comments). RFC Editor.

Puckett, J. A., Brown, N. C., Dunn, T., Mustanski, B. and Newcomb, M. E. (2020). Perspectives from transgender and gender diverse people on how to ask about gender. LGBT Health, 7(6): 305–11.

Roscoe, W. (1998). Changing Ones: Third and Fourth Genders in Native North America. 1st ed. New York: St. Martin’s Press.

(2016). Why I chose to become a hijra: Laxmi in her own words. Text.

Slater, S. and Yarbrough, F. A. (eds). (2011). Gender and Sexuality in Indigenous North America, 1400–1850. Columbia, S.C: University of South Carolina Press.

Text Encoding Initiative Consortium (2021). TEI element. P5: Guidelines for Electronic Text Encoding and Interchange.


Conference Info


ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022


Held in Tokyo and remote (hybrid) on account of COVID-19


Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO