Computational Analysis of Gender and the Body in European Fairy Tales

  1. Scott B. Weingart

    School of Library & Information Science - Indiana University, Bloomington

  2. Jeana Jorgensen

    Indiana University, Bloomington

This paper presents preliminary results on using computational analysis to understand the representations and constructions of gender and the body in fairy tales. While scholarship on contemporary fairy tales utilizes various cutting-edge theories, ranging from postmodern narrative to feminist theories of gender performance (Bacchilega 1997, Benson 2008, Smith 2007, and Tiffin 2009), little of the research on canonical fairy tales or oral folktales incorporates these recent theories. Additionally, folkloristic research on fairy tales, whether contemporary or traditional, would benefit from incorporating computational methods such as network analysis. These methods allow scholars to test their theories more quickly and empirically.

Our research draws on nearly three hundred fairy tales and oral folktales, which we deem canonical either because they come from well-known collectors such as the Brothers Grimm or because they exemplify well-known plots spanning time and space in Europe (such as “Snow White” and “Cinderella”). We combine textual and network analysis with discipline-specific expert oversight to support a large-scale, theoretically informed discussion of gender and the body that neither approach could sustain alone. A feminist critique of fairy tales is predicated upon the notion that fairy tales construct and represent bodies differently according to gender, yet no studies have shown whether this difference actually exists in canonical tales, or have addressed what this difference would mean for studies of cultural values and narrative strategies (Bottigheimer 1987, Haase 2004, and Stone 2008). Computational analysis of how bodies and body parts are depicted in the text provides empirical evidence against which this and other aspects of feminist theory can be tested.

Humanities scholars have already established a vast theoretical and methodological framework for interpreting texts, and they ought to be able to view their data in the context of those theories developed within their disciplines. This study combines traditional critical analysis with computational tools in an attempt to utilize the best of both worlds.

Our analysis uses a hand-coded database representing a geographically and temporally diverse sample of tales. Careful attention was paid to the tale tellers and collectors for further study of the context in which bodies are depicted.

Fairy tales as a genre span oral, communal performances and literary, single-author renditions. To represent this spectrum, our database tracks specific references to bodies in six tale collections. For each of the nearly three hundred tales we recorded 13 data points (Tale, Collection, Author, Teller, Collector, Year of Writing/Collecting, Year of Publication, Tale Type, Region, Original Language, Gender of Teller/Writer, Gender of Collector, Gender of Editor). For every mention of a body in each tale we coded another 14 data points, some evident in the texts (Noun, Adjective, Surrounding Text, Page Number, Gender, Young/Old, High/Low, Quoted Speech, Skin Tone) and some requiring interpretation (Positive/Negative value, Grotesque, Violence, Nudity, Move).
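
The two levels of coding described above can be sketched as a pair of record types. This is only an illustration of the data structure, not the database actually used in the study: the class names, field names, and types are assumptions, and only the list of variables comes from the text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Tale:
    """Tale-level record: the 13 variables recorded once per tale."""
    tale: str
    collection: str
    author: Optional[str]
    teller: Optional[str]
    collector: Optional[str]
    year_written_or_collected: Optional[int]
    year_published: Optional[int]
    tale_type: Optional[str]
    region: Optional[str]
    original_language: Optional[str]
    gender_of_teller_or_writer: Optional[str]
    gender_of_collector: Optional[str]
    gender_of_editor: Optional[str]


@dataclass(frozen=True)
class BodyMention:
    """Mention-level record: the 14 variables coded per body reference."""
    # Variables evident in the text
    noun: str
    adjective: Optional[str]
    surrounding_text: str
    page_number: Optional[int]
    gender: Optional[str]
    young_old: Optional[str]
    high_low: Optional[str]
    quoted_speech: Optional[bool]
    skin_tone: Optional[str]
    # Variables requiring interpretation by the coder
    positive_negative: Optional[str]
    grotesque: Optional[bool]
    violence: Optional[bool]
    nudity: Optional[bool]
    move: Optional[int]  # assumption: Moves are numbered 1 to 5


# Sanity check that the sketch matches the counts given in the text.
FIELD_COUNTS = {cls.__name__: len(fields(cls)) for cls in (Tale, BodyMention)}
```

A real implementation would also need some way to link each mention to its tale; the text does not say how this was done, so the sketch leaves it out.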

The database variables were chosen in light of pre-existing work on structure and theory, creating a layer of interpreted data that would not be found in full-text analysis alone. The “Tale Type” classification system gives tale plots numbers so that their transmission can be traced as tales migrate across linguistic and national boundaries. This is what allows us to generalize about the worldview contained within the tales, as the same plot occurs, with variations, among multiple ethnic groups. The concept of “Moves” breaks up tale plots into five distinct plot pieces based on folkloristic theory of how tales are structured.

We use co-occurrence and vector analysis to explore the database. Each field is compared against several others in order to find correlations. For example, “beautiful” may be applied only to young women, or old rich men may appear only in tales from certain tellers. Using dimensionality reduction, we can find which body parts tend to be discussed in tandem in various situations. We also explore how the representations of bodies change over the course of a tale’s plot using Bengt Holbek’s “Moves.” Holbek built on the work of Vladimir Propp, who identified the sequence of the most important plot points that could occur in a fairy tale (31 points, or “functions,” in total) (Holbek 1998). Holbek condensed Propp’s functions into five “Moves,” or clusters of thematic actions that move the tale’s plot forward.
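
As a rough illustration of the first two steps, the sketch below counts how often values of two fields co-occur and then compares groups as vectors of adjective counts using cosine similarity. The function names and the toy records are invented for the example and are not the study's data or code; dimensionality reduction is not shown.

```python
import math
from collections import Counter, defaultdict


def cooccurrence(records, field_a, field_b):
    """Count how often each (field_a value, field_b value) pair occurs."""
    counts = Counter()
    for rec in records:
        a, b = rec.get(field_a), rec.get(field_b)
        if a is not None and b is not None:
            counts[(a, b)] += 1
    return counts


def vectors_by(records, group_field, feature_field):
    """Build one count vector of feature values per group value."""
    vectors = defaultdict(Counter)
    for rec in records:
        group, feature = rec.get(group_field), rec.get(feature_field)
        if group is not None and feature is not None:
            vectors[group][feature] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy records, for illustration only.
toy_mentions = [
    {"adjective": "beautiful", "gender": "female", "age": "young"},
    {"adjective": "beautiful", "gender": "female", "age": "young"},
    {"adjective": "wicked", "gender": "female", "age": "old"},
    {"adjective": "wise", "gender": "male", "age": "old"},
    {"adjective": None, "gender": "male", "age": "young"},
]

pair_counts = cooccurrence(toy_mentions, "adjective", "age")
age_vectors = vectors_by(toy_mentions, "age", "adjective")
similarity = cosine(age_vectors["young"], age_vectors["old"])
```

In this toy sample the young and old vectors share no adjectives, so their similarity is zero; on real data the same comparison could be run for any pair of coded fields.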

Finally, networks are generated from the database and analyzed to test the hypothesis that fairy tales construct bodies differently according to gender. This analysis serves both as empirical evidence for testing a theory and as an exploratory tool, revealing possible correlations or links between body representations that are not immediately apparent in the texts.
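
One way such a network could be assembled is sketched below: adjectives and body types become nodes, and an edge is weighted by how often an adjective modifies a body type. The construction, the names, and the toy records are assumptions made for illustration; the text does not specify how the networks were built or which software was used.

```python
from collections import Counter


def build_network(mentions):
    """Return (node_sizes, edge_weights) for an adjective/body-type network."""
    node_sizes = Counter()
    edge_weights = Counter()
    for m in mentions:
        body = (m["age"], m["gender"])
        node_sizes[("body", body)] += 1
        adjective = m.get("adjective")
        if adjective:
            node_sizes[("adjective", adjective)] += 1
            edge_weights[(adjective, body)] += 1
    return node_sizes, edge_weights


def adjectives_for_gender(edge_weights, gender):
    """Sum edge weights per adjective over all body types of one gender."""
    totals = Counter()
    for (adjective, (_age, body_gender)), weight in edge_weights.items():
        if body_gender == gender:
            totals[adjective] += weight
    return totals


# Invented toy records, for illustration only.
toy_mentions = [
    {"adjective": "beautiful", "gender": "female", "age": "young"},
    {"adjective": "beautiful", "gender": "female", "age": "young"},
    {"adjective": "wicked", "gender": "female", "age": "old"},
    {"adjective": "wise", "gender": "male", "age": "old"},
    {"adjective": None, "gender": "male", "age": "young"},
]

node_sizes, edge_weights = build_network(toy_mentions)
female_adjectives = adjectives_for_gender(edge_weights, "female")
male_adjectives = adjectives_for_gender(edge_weights, "male")
shared_adjectives = set(female_adjectives) & set(male_adjectives)
```

Comparing the adjective sets attached to female and male body nodes is only the simplest exploratory question such a network can answer; a fuller analysis would need an explicit statistical test.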

Folklorists approach fairy tale interpretation in many ways: ethnographic approaches seek connections between the taletellers’ lives and the tales’ content; historical approaches search for and analyze the origins and diffusion of tales; structural analyses seek to understand the underlying narrative of the tales; psychological approaches search for latent meaning in the tales; and feminist and sociohistorical approaches interpret the meanings of the tales as they relate to, convey values from, and inculcate values of the social world. Feminist scholars have been particularly active in critiquing the normative beauty scripts and gender roles promoted in fairy tales. This study investigates how gender roles are constructed and situated in fairy tales, which is why we encoded categories to investigate links between gender, age, and social position, as well as where in the tale’s structure these social values are relevant. We also hope to obtain information about how female and male bodies are valued differently, hence the relevance of variables like “grotesque”.

Preliminary Results
Second-wave feminists such as Simone de Beauvoir developed the notion of the universal masculine perspective, the idea that in Western culture, the public, unmarked, assumed universal position is in fact specifically male. Our data supports this assertion in terms of female bodies being marked within fairy tales, but we also believe that the same principle applies to the age of bodies. Youthful bodies are assumed to be the unmarked universal category in fairy tales.

The descriptions of old bodies are strikingly polarized: old people are more likely than young to be described as evil or good, wicked or wise. These findings suggest that old bodies must be differentiated in fairy tales, because they are no longer in the supposedly universal category of youth. Old bodies are qualified with more descriptions in order to give audiences a sense of who these characters are, since they don’t fall into the category of the youthful protagonist, with whom listeners are supposed to easily identify. If the bodies in fairy tales had the same meanings across age and gender, we would have seen a proportional relationship between the number of references to each type of body and the number of adjectival descriptions attached to it. However, the data shows that young men are associated with adjectival descriptions less frequently than any other type of body. Old women, in contrast, are associated with adjectival descriptions more than any other grouping. Further, relative to how often each is mentioned, old bodies are qualified by a wider variety of adjectives than young bodies. Figure 1 shows a sample of which adjectives are associated with mentions of various bodies.
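
The proportionality argument can be made concrete with a small sketch. If adjectival descriptions were distributed in proportion to mentions, each body type's share of descriptions would equal its share of mentions, giving a ratio of 1. The function name is hypothetical and the counts are invented purely to show the calculation; they are not the study's results.

```python
def description_ratio(mention_counts, adjective_counts):
    """Ratio of each group's share of adjectival descriptions to its
    share of body mentions; 1.0 means exactly proportional."""
    total_mentions = sum(mention_counts.values())
    total_adjectives = sum(adjective_counts.values())
    ratios = {}
    for group, mentions in mention_counts.items():
        if mentions == 0 or total_adjectives == 0:
            continue  # the ratio is undefined for this group
        adjectives = adjective_counts.get(group, 0)
        ratios[group] = (adjectives * total_mentions) / (mentions * total_adjectives)
    return ratios


# Invented counts, for illustration only (not the paper's data).
toy_mention_counts = {
    "young men": 40, "young women": 30, "old men": 20, "old women": 10,
}
toy_adjective_counts = {
    "young men": 10, "young women": 15, "old men": 15, "old women": 10,
}

ratios = description_ratio(toy_mention_counts, toy_adjective_counts)
```

A ratio below 1 means a body type is described less often than its share of mentions would predict, and a ratio above 1 means it is described more often. The same calculation applied to counts of distinct adjectives would address the variety of description rather than its frequency.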

Figure 1. Lines are drawn between adjectives (red) and the body-types they modify (yellow). Node size represents word usage counts and edge thickness represents frequency of co-occurrence.


Our method of layering computational analysis atop previous theoretical research can be used as a template for further studies, especially those of other folk narrative genres like legends or ballads. Some of the most intriguing questions in folklore research pertain to how verbal expressive genres relate to the lived experiences of their performers—and a method for digitizing and interpreting these texts could yield valuable insights.

As digitization is interpretation (Tarte 2010), scholars must be especially careful and theoretically grounded when choosing variables and selecting exactly what data will populate the fields. The scholar must also decide which analyses are likely to be most fruitful given the data available. To check against biases that might creep into variable choice, however, these studies ought also to include computational analyses that are not tied to previous critical theories (such as word frequency or co-occurrence). The ultimate goal is to turn well-researched, theoretically sound scholarly observations into machine-actionable data that can be analyzed to test the scholar’s hypotheses and open the door to future studies.

