Arizona State University
This study examines a “pulp” science fiction corpus (1930–1965) through corpus linguistic analysis in order to digitally reconstruct the gendered occupational identities created by its authors and the culture they represent, a culture that perpetuated a stereotype of “the scientist” and shaped how women in professional scientific roles were characterized. I created “occupational archetypes” based on the linguistic analysis of collocates, clusters, and textual examples of science, technology, engineering, and math (STEM) career keywords in order to investigate the culturally informed gender roles reflected in modern stereotypes of the scientist. I chose to study pulp science fiction, a sub-genre of science fiction literature published from roughly 1930 to 1965, because it enjoyed a wide readership during the formative decades of the scientific industrial complex in pre- and post-war America. One way to get at the culture that created and then maintained our national scientific industrial complex is to examine the stories about science that people of that time produced. Stories, and even more simply language, transmit social and cultural values, especially when it comes to gender roles (Rey, 2001). The affordability, accessibility, and engaging style of the genre gained it a wide audience, and “the pulps,” as they came to be called, quickly became a feature of American life. These stories, and indeed the genre at large, represent popular conceptions of “appropriate” gender identities and reinforce the occupational stereotypes that play such a key role in the lives of women scientists.
The aim of corpus linguistics is to study patterns of language at their most fundamental level of words and phrases, thereby revealing patterns of meaning and making the implicit explicit (Biber, 1998; Biber, 2009; Stubbs, 2001; McEnery, 2001; Lakoff, 2008; Kennedy, 2014; Hettel, 2013). Since patterns of meaning are precisely what I wished to investigate with respect to gendered occupational stereotypes, this method served as the basis for my study. Corpus linguistics harnesses the power of Moretti's distant reading approach (Moretti, 2013) to uncover the scope and nature of the literature, while also providing clues as to which specific pieces within a corpus merit a close reading.
The corpus I constructed for this project consists of 560 full-text copies of pulp science fiction stories from 1930 to 1965 (totaling just over 6 million words), published in magazines like Astounding Stories, Amazing Stories, Analog Science Fact and Fiction, Planet Stories, and If Worlds of Science Fiction. These full texts were obtained from public repositories, principally Project Gutenberg (https://www.gutenberg.org/) and The Internet Archive's Pulp Magazine Archive (https://archive.org/details/pulpmagazinearchive). While some of these stories were already available as plain text files, others were scanned copies of the original pulp magazine pages stored as image files. The latter I converted into plain text with Tesseract, an open-source optical character recognition (OCR) program. I then organized these stories according to their date of publication in the magazines, stratifying by five-year periods: 1930-34, 1935-39, etc. Each of these five-year periods contains 80 stories, coming to 560 in total. This stratification supports representativeness by ensuring that all five-year periods are weighted proportionally over time (Sinclair, 2004).
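The binning step described above can be illustrated with a short Python sketch. It is not the procedure used to build the corpus: the function names are hypothetical, and folding 1965 into the final bin (so that 1930 to 1965 yields seven periods) is an assumption made only so that the example agrees with the seven periods of 80 stories reported here.

```python
from collections import defaultdict

START_YEAR = 1930   # first publication year covered by the corpus
END_YEAR = 1965     # last publication year covered by the corpus
BIN_WIDTH = 5       # width of each stratum in years


def period_label(year):
    """Return a label such as '1930-34' for a publication year."""
    if not START_YEAR <= year <= END_YEAR:
        raise ValueError(f"year {year} is outside {START_YEAR}-{END_YEAR}")
    # Assumption: 1965 is folded into the last bin, giving seven bins in all.
    last_start = START_YEAR + ((END_YEAR - 1 - START_YEAR) // BIN_WIDTH) * BIN_WIDTH
    start = START_YEAR + ((year - START_YEAR) // BIN_WIDTH) * BIN_WIDTH
    start = min(start, last_start)
    end = END_YEAR if start == last_start else start + BIN_WIDTH - 1
    return f"{start}-{end % 100:02d}"


def group_by_period(stories):
    """Group (title, year) pairs into a dict keyed by period label."""
    strata = defaultdict(list)
    for title, year in stories:
        strata[period_label(year)].append(title)
    return dict(strata)
```

Checking that every stratum holds the same number of titles (80 in this study) is then a matter of comparing the lengths of the lists returned by `group_by_period`.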
When I finished constructing the corpus, I used the software suite WordSmith Tools to generate keyword lists for the corpus in its entirety and for each five-year period. When I generated the keyword list for a given five-year period, I used the rest of the pulp science fiction corpus as my reference, in order to track how these words were being used over time (Bondi, 2010). From these general keyword lists, I chose the keywords which represented careers or occupations that constitute or interact with STEM disciplines: scientist, engineer, mathematician, doctor, nurse, and professor. I then used WordSmith Tools to analyze measures of association for these occupational keywords. By focusing on the language used to describe occupations related to the sciences, I was able to get a picture of the characterization of these professions at the time the stories were published. I also did a collocation analysis in order to uncover the words that most frequently co-occurred with these keywords, limited to five words to the right and left of the keyword in question (the node). The character of the collocates reflects the nature of the node, and the distinctions offered by collocations are subtle, yet crucial to the creation of a linguistic profile (Hettel, 2013).
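To make the collocation window concrete, the following Python sketch counts every word that falls within five tokens to the left or right of a node word. It is an illustration only, not the WordSmith Tools procedure: the tokenizer is a deliberately crude assumption, the function names are hypothetical, and the example sentence is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def window_collocates(tokens, node, span=5):
    """Count tokens occurring within `span` positions of each hit of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left)
        counts.update(right)
    return counts


# Invented example sentence, not drawn from the corpus.
sample = "The old scientist spoke slowly, and the young scientist listened."
collocates = window_collocates(tokenize(sample), "scientist", span=5)
```

A raw count like this only ranks collocates by frequency; deciding which of them are meaningfully associated with the node requires an association measure.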
In order to determine which collocates were statistically significant, I evaluated each association by its t-score. Unlike mutual information (MI), which tends to favor rare word pairs, the t-score takes the frequency of co-occurrence into account, and so works well with smaller corpora such as mine. Using the STEM occupation keywords, collocates, clusters, and qualitative examination of specific examples of the node words in context, I then created “lexical profiles” of each of these science, technology, engineering, and related occupations, which I term “occupational archetypes.” The development of these archetypes is based largely on the work done by Hettel (2013) on the construction of lexical profiles from collocations, clusters, and context in the language of US nuclear plants and regulatory entities.
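The contrast between the two measures can be illustrated with the formulas commonly given in corpus linguistics textbooks. With observed co-occurrence frequency O and expected frequency E = f(node) × f(collocate) / N, the t-score is (O − E) / √O and MI is log2(O / E). The Python sketch below assumes this general form; it does not describe how WordSmith Tools computes either statistic (some implementations also scale E by the window size), and all counts are invented.

```python
import math


def expected_cooccurrence(f_node, f_collocate, corpus_size):
    """Co-occurrences expected if the two words were independent."""
    return f_node * f_collocate / corpus_size


def t_score(observed, f_node, f_collocate, corpus_size):
    """t-score: (O - E) / sqrt(O). Rewards pairs with plenty of evidence."""
    if observed <= 0:
        raise ValueError("observed co-occurrence count must be positive")
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


def mutual_information(observed, f_node, f_collocate, corpus_size):
    """MI: log2(O / E). Rewards exclusivity, even for very rare pairs."""
    if observed <= 0:
        raise ValueError("observed co-occurrence count must be positive")
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size)
    return math.log2(observed / expected)


# Invented counts in a hypothetical one-million-word corpus.
N = 1_000_000
frequent_pair = dict(observed=100, f_node=1000, f_collocate=2000, corpus_size=N)
rare_pair = dict(observed=2, f_node=2, f_collocate=2, corpus_size=N)

t_frequent, mi_frequent = t_score(**frequent_pair), mutual_information(**frequent_pair)
t_rare, mi_rare = t_score(**rare_pair), mutual_information(**rare_pair)
```

With these invented counts the frequent pair scores t = 9.8 and MI ≈ 5.64, while the pair that occurs only twice scores t ≈ 1.41 and MI ≈ 18.93. MI ranks the rare pair far higher, whereas the t-score favors the pair backed by more observations.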
Specifically, the archetype I constructed of “the scientist” revealed a middle-aged white male, defined by his adherence to “true” or “good” science, and often called upon by other characters to provide scientific or technical insight. Though he spends a good deal of his time talking, others struggle to grasp his meaning and find him difficult to deal with. This occupational archetype is a mirror image of the American stereotype of scientific professionals, one which leaves no room for diversity in race or gender. Furthermore, through this analysis, I discovered that out of the hundreds of scientists in my corpus, only three were women. A linguistic analysis of these women in particular (a chemist, a physicist, and a mathematician) revealed American cultural assumptions about the intersection of femininity and science: a female scientist could be either beautiful or accomplished. Even then, the chemist's beauty came with exploitation (and the physicist's ugliness with prestige), and scientific genius in these women required qualification (i.e. genius “in her own way”) while that of their male colleagues did not. The mathematician, the one woman in the corpus with beauty and brains, so to speak, appeared very late on the scene, and perhaps signals a shift in the cultural conception of who a scientist could be and what they could look like. Making these entrenched cultural stereotypes of women in science explicit through linguistic analysis is the first step in creating a STEM workforce strong through its diversity and acceptance.
Bibliography
Biber, D., Conrad, S. and Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press.
Biber, D. and Conrad, S. (2009). Register, genre, and style. Cambridge University Press.
Bondi, M. and Scott, M. (eds.) (2010). Keyness in texts. John Benjamins Publishing, vol. 41.
Gries, S. T. (2010). Useful statistics for corpus linguistics. A mosaic of corpus linguistics: selected approaches, pp. 269-91.
Hettel, J. M. (2013). Harnessing the power of context.
Kennedy, G. (2014). An introduction to corpus linguistics. Routledge.
Lakoff, G. and Johnson, M. (2008). Metaphors we live by. University of Chicago Press.
McEnery, T. and Wilson, A. (2001). Corpus linguistics: An introduction. Edinburgh University Press.
Moretti, F. (2013). Distant reading. Verso Books.
Oakes, M. P. (1998). Statistics for corpus linguistics. Edinburgh University Press.
Rey, J. M. (2001). Changing gender roles in popular culture: Dialogue in Star Trek episodes from 1966-1993. Variation in English: Multidimensional Studies. London: Longman, pp. 138-56.
Sinclair, J. (2004). Developing linguistic corpora: a guide to good practice. Corpus and text–basic principles.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell Publishers.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. John Benjamins Publishing, vol. 6.
Hosted at Jagiellonian University, Pedagogical University of Krakow
Kraków, Poland
July 11, 2016 - July 16, 2016
454 works by 1072 authors indexed
Conference website: https://dh2016.adho.org/
Series: ADHO (11)
Organizers: ADHO