Analysis and Exploration of Supernatural Fanfictions from the Platform Archive of Our Own

poster / demo / art installation
  1. Nina Kleindienst

    Media Informatics Group / Lehrstuhl für Medieninformatik - Universität Regensburg (University of Regensburg)

  2. Thomas Schmidt

    Media Informatics Group / Lehrstuhl für Medieninformatik - Universität Regensburg (University of Regensburg)

  3. Christian Wolff

    Media Informatics Group / Lehrstuhl für Medieninformatik - Universität Regensburg (University of Regensburg)

Work text

Fanfictions are fan-written works that take characters and plot elements from existing, well-known media and build new stories around them (Dym et al. 2018). This genre of online literature has attracted considerable interest in literary studies and the humanities (cf. Hellekson and Busse, 2006; Thomas, 2011; Van Steenhuyse, 2011; Jamison, 2013) and, in recent years, also in digital humanities (DH) (Milli & Bamman, 2016; Fast et al. 2016; Yin et al. 2017; Frens et al. 2018; Rebora and Pianzola, 2018; Pianzola et al. 2020; Schmidt et al. 2021d).
We present first results of an analysis of around 15 years of fanfiction production for the fandom Supernatural on the platform Archive of Our Own (AO3). Supernatural is a popular American mystery-fantasy TV show that ran from 2005 to 2020, and its fan fiction community is regarded as one of the most productive, which has led to research in DH (Kleindienst and Schmidt, 2020). AO3 is one of the most popular fan fiction websites and hosts, to our knowledge, the largest number of Supernatural fan fictions of any platform.

We developed a script to scrape Supernatural fanfictions from AO3 and ran it in January 2021 to gather all fan fictions available at that time.

Due to legal constraints, the data is only available via request (
We scraped the entire HTML page of each fanfiction and transformed the content into structured JSON containing the text of the story and its metadata. The general corpus analysis was performed similarly to research on social media (e.g. Moßburger et al., 2020). Overall, the corpus consists of 170,436 unique fan fictions and over 1 billion tokens. On average, a fanfiction has a length of around 6,000 tokens, but this length varies between 100 and 2 million tokens (table 1).

Table 1. General corpus statistics.
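
The statistics summarised above can be approximated with a few lines of Python. The following is only a minimal sketch: the record layout (`id`, `text`) and the whitespace tokenisation are assumptions made for illustration, since the actual JSON schema and tokeniser are not described here.

```python
import json
from statistics import mean

# Hypothetical record layout; the real JSON schema of the corpus is not specified.
sample_json = """
[
  {"id": "1", "text": "Dean drove the Impala all night ."},
  {"id": "2", "text": "Castiel watched over him ."},
  {"id": "1", "text": "Dean drove the Impala all night ."}
]
"""

def corpus_statistics(records):
    """Deduplicate records by id and summarise their token counts."""
    unique = {rec["id"]: rec for rec in records}
    # Assumption: tokens are whitespace-separated; a real tokeniser may differ.
    lengths = [len(rec["text"].split()) for rec in unique.values()]
    return {
        "fanfictions": len(lengths),
        "tokens": sum(lengths),
        "mean_tokens": mean(lengths),
        "min_tokens": min(lengths),
        "max_tokens": max(lengths),
    }

stats = corpus_statistics(json.loads(sample_json))
print(stats)
```

Deduplicating by identifier before counting keeps repeated scrapes of the same work from inflating the totals.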

In the following section, we present a small subset of first corpus and metadata results. We focus on a diachronic analysis across the time spans of the show's seasons from 2005 to 2020. Each season corresponds roughly to one year. We regard a fanfiction as part of a season if its publication date falls during the airing of that season or before the airing of the next season.
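
One possible way to implement this assignment rule is a binary search over the season premiere dates. The sketch below is illustrative only: `SEASON_PREMIERES` holds placeholder dates, not the real air dates, and `assign_season` is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import date

# Placeholder premiere dates for illustration only; these are NOT the real air dates.
SEASON_PREMIERES = [
    date(2005, 9, 1),  # season 1
    date(2006, 9, 1),  # season 2
    date(2007, 9, 1),  # season 3
]

def assign_season(published):
    """Return the 1-based season whose premiere most recently precedes the date.

    A fanfiction belongs to a season from that season's premiere until the day
    before the next premiere. Dates before the first premiere map to None.
    """
    index = bisect_right(SEASON_PREMIERES, published)
    return index if index > 0 else None

print(assign_season(date(2006, 8, 31)))  # last day counted towards season 1
print(assign_season(date(2006, 9, 1)))   # first day counted towards season 2
```

In this sketch the last season is open-ended, so every date after the final premiere is assigned to it.
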
As figure 1 shows, production rises most sharply from season 4 onward and peaks in season 10. Indeed, season 4 sees the first appearance of the character “Castiel”, who is of great importance for the community.

Figure 1. Fanfiction production across seasons.
We also compared production during the airing time of a season with production in the time between seasons. As figure 2 shows, production drops markedly between seasons. This is in line with research by De Kosnik et al. (2015), who found that fanfiction production tends to happen immediately after release.

Figure 2. Fanfiction production across and in-between seasons.
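
The comparison between airing time and in-between time could be computed along the following lines. This is a sketch under stated assumptions: the (premiere, finale) pairs in `SEASON_WINDOWS` are placeholders rather than real air dates, and `classify_phase` is a hypothetical helper.

```python
from collections import Counter
from datetime import date

# Placeholder (premiere, finale) pairs for illustration; NOT the real air dates.
SEASON_WINDOWS = [
    (date(2005, 9, 1), date(2006, 5, 1)),  # season 1
    (date(2006, 9, 1), date(2007, 5, 1)),  # season 2
]

def classify_phase(published):
    """Return (season, "airing") or (season, "in-between"), or None if too early."""
    for number, (premiere, finale) in enumerate(SEASON_WINDOWS, start=1):
        if premiere <= published <= finale:
            return (number, "airing")
        # The premiere of the following season closes the in-between phase.
        has_next = number < len(SEASON_WINDOWS)
        next_premiere = SEASON_WINDOWS[number][0] if has_next else None
        if published > finale and (next_premiere is None or published < next_premiere):
            return (number, "in-between")
    return None

publication_dates = [
    date(2005, 10, 1),
    date(2005, 12, 1),
    date(2006, 7, 1),
    date(2006, 10, 1),
]
phase_counts = Counter(classify_phase(d) for d in publication_dates)
print(phase_counts)
```

Because airing and in-between phases differ in length, dividing each count by the number of days in its phase would give a fairer comparison than raw totals.
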
The average length of an individual fanfiction increases throughout the airing of the show, peaking at almost 8,000 tokens in the more recent seasons (figure 3).

Figure 3. Average number of tokens per fanfiction.
Considering the number of distinct authors, we found that up until season 6 this number stays rather small, limited to a core of fewer than 1,000 authors, but increases drastically, again, in season 4 and season 7 (figure 4).

Figure 4. Number of authors per season.
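
Per-season figures such as the average length and the number of distinct authors can be derived in a single pass over the records. The sketch below assumes each record already carries a `season`, an `author` and a `tokens` field; these field names and the sample values are invented for illustration.

```python
from collections import defaultdict

# Invented sample records; field names are assumptions, not the corpus schema.
records = [
    {"season": 3, "author": "alice", "tokens": 2000},
    {"season": 4, "author": "alice", "tokens": 4000},
    {"season": 4, "author": "bob", "tokens": 6000},
    {"season": 4, "author": "bob", "tokens": 8000},
]

def per_season_summary(records):
    """Aggregate work count, mean token count and distinct authors per season."""
    tokens = defaultdict(list)
    authors = defaultdict(set)
    for rec in records:
        tokens[rec["season"]].append(rec["tokens"])
        authors[rec["season"]].add(rec["author"])
    return {
        season: {
            "fanfictions": len(tokens[season]),
            "mean_tokens": sum(tokens[season]) / len(tokens[season]),
            "authors": len(authors[season]),
        }
        for season in sorted(tokens)
    }

summary = per_season_summary(records)
print(summary)
```

Collecting authors in a set counts each author once per season, regardless of how many works they published in it.
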
One important aspect of fanfictions is the type of relationship between the characters they depict. Indeed, as the overall distribution of this AO3 metadata shows, the majority (55.4%) of all fanfictions in this corpus deal with male-male homoerotic and romantic content (slash). This is a well-known phenomenon in fanfiction (cf. Hellekson & Busse, 2006) and an important part of Supernatural fanfictions.

Table 2. Distribution of relationship type tags for the entire corpus.
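
A distribution like this can be obtained by counting how many works carry each relationship type tag. The following sketch assumes that a work may carry more than one such tag and that the tags are stored in a `categories` list; both the field name and the sample works are illustrative.

```python
from collections import Counter

# Invented sample works; the "categories" field name is an assumption.
works = [
    {"id": "1", "categories": ["M/M"]},
    {"id": "2", "categories": ["M/M", "F/M"]},
    {"id": "3", "categories": ["Gen"]},
    {"id": "4", "categories": ["F/M"]},
]

def tag_shares(works, field="categories"):
    """Return, per tag, the share of works that carry it at least once."""
    counts = Counter(tag for work in works for tag in set(work[field]))
    total = len(works)
    return {tag: count / total for tag, count in counts.most_common()}

shares = tag_shares(works)
print(shares)
```

Since a work can carry several tags under this assumption, the shares need not sum to 1.
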
Again, the dominance of these slash fanfictions (M/M) becomes striking over time, beginning in season 4, as figure 5 shows.

Figure 5. Proportion of relationship type tags across seasons.
Fanfiction authors can explicitly add the specific character relationship as metadata, which is also curated by AO3. Looking at the proportion of the three most popular relationships (figure 6), we found that (1) all of them are of an M/M nature and (2) the rise in popularity of the show and its fanfictions goes hand in hand with the introduction of the character Castiel and fans' imaginings of his relationship with the main character Dean Winchester. This is especially striking since Castiel is a mere side character in season 4 who became part of the main cast due to his popularity, which can also be seen in our data.

Figure 6. Proportion of the three most popular relationships across seasons.
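
The proportion of the most popular relationships per season could be computed roughly as follows. This is a sketch, not the procedure used for figure 6: the `relationships` field, the placeholder labels such as "A/B" and the helper `top_relationship_shares` are all assumptions.

```python
from collections import Counter, defaultdict

# Invented sample works with placeholder relationship labels.
works = [
    {"season": 3, "relationships": ["A/B"]},
    {"season": 4, "relationships": ["A/B", "A/C"]},
    {"season": 4, "relationships": ["A/C"]},
    {"season": 4, "relationships": ["D/E"]},
]

def top_relationship_shares(works, k=3):
    """Per season, the share of works tagged with each of the k top relationships."""
    overall = Counter(rel for work in works for rel in set(work["relationships"]))
    top = [rel for rel, _ in overall.most_common(k)]
    season_totals = Counter(work["season"] for work in works)
    season_hits = defaultdict(Counter)
    for work in works:
        for rel in set(work["relationships"]):
            if rel in top:
                season_hits[work["season"]][rel] += 1
    return {
        season: {rel: season_hits[season][rel] / season_totals[season] for rel in top}
        for season in sorted(season_totals)
    }

relationship_shares = top_relationship_shares(works, k=2)
print(relationship_shares)
```

The top relationships are ranked over the whole corpus first, so the same set of relationships is tracked in every season.
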
Please note that we have presented only a subset of the results this corpus has to offer. Furthermore, we see great potential for more advanced methods that have gained popularity in DH in recent years, such as sentiment analysis (Schmidt and Burghardt, 2018; Schmidt et al., 2021a; Schmidt et al., 2021b), or even multimodal approaches that include the video channel of the TV show in further studies (similar to Schmidt et al., 2019; Schmidt et al., 2020a; Schmidt et al., 2020b; Schmidt et al., 2021c; Schmidt and Wolff, 2021).


De Kosnik, A., El Ghaoui, L., Cuntz-Leng, V., Godbehere, A., Horbinski, A., Hutz, A., Pastel, R. and Pham, V. (2015). Watching, creating, and archiving: Observations on the quantity and temporality of fannish productivity in online fan fiction archives.
21(1). SAGE Publications Ltd: 145–64 doi:

Dym, B., Aragon, C., Bullard, J., Davis, R. and Fiesler, C. (2018). Online Fandom: Boldly Going Where Few CSCW Researchers Have Gone Before.
Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing. Jersey City NJ USA: ACM, pp. 121–24 doi:
10.1145/3272973.3274542. (accessed 6 October 2020).

Fast, E., Vachovsky, T. and Bernstein, M. S. (2016). Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community.

Hellekson, K. and Busse, K. (2006).
Fan Fiction and Fan Communities in the Age of the Internet: New Essays. McFarland.

Jamison, A. (2013).
Fic: Why Fanfiction Is Taking Over the World. Illustrated edition. Dallas, Texas: Smart Pop.

Kleindienst, N. and Schmidt, T. (2020). Investigating the Transformation of Original Work by the Online Fan Fiction Community: A Case Study for Supernatural. Basel, Switzerland (accessed 21 April 2022).

Milli, S. and Bamman, D. (2016). Beyond Canonical Texts: A Computational Analysis of Fanfiction.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, pp. 2048–53 doi:
10.18653/v1/D16-1218. (accessed 6 October 2020).

Moßburger, L., Wende, F., Brinkmann, K. and Schmidt, T. (2020). Exploring Online Depression Forums via Text Mining: A Comparison of Reddit and a Curated Online Forum.
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task. Barcelona, Spain (Online): Association for Computational Linguistics, pp. 70–81 (accessed 21 April 2022).

Pianzola, F., Rebora, S. and Lauer, G. (2020). Wattpad as a resource for literary studies. Quantitative and qualitative examples of the importance of digital social reading and readers’ comments in the margins. (Ed.) Orrego-Carmona, D.
15(1): e0226708 doi:

Rebora, S. and Pianzola, F. (2018). A New Research Programme for Reading Research: Analysing Comments in the Margins on Wattpad.
DigitCult | Scientific Journal on Digital Cultures(3.2): 19–36 doi:

Schmidt, T. and Burghardt, M. (2018). An Evaluation of Lexicon-based Sentiment Analysis Techniques for the Plays of Gotthold Ephraim Lessing.
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Santa Fe, New Mexico: Association for Computational Linguistics, pp. 139–49 (accessed 6 April 2020).

Schmidt, T., Burghardt, M. and Wolff, C. (2019). Toward Multimodal Sentiment Analysis of Historic Plays: A Case Study with Text and Audio for Lessing’s Emilia Galotti. In Navarretta, C., Agirrezabal, M. and Maegaard, B. (eds),
Proceedings of the Digital Humanities in the Nordic Countries 4th Conference (DHN 2019). Copenhagen, Denmark, pp. 405–14 (accessed 21 April 2022).

Schmidt, T., Dangel, J. and Wolff, C. (2021a). SentText: A Tool for Lexicon-based Sentiment Analysis in Digital Humanities. vol. 74. Glückstadt: Werner Hülsbusch, pp. 156–72 (accessed 21 April 2022).

Schmidt, T., Dennerlein, K. and Wolff, C. (2021b). Emotion Classification in German Plays with Transformer-based Language Models Pretrained on Historical and Contemporary Language.
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Punta Cana, Dominican Republic (online): Association for Computational Linguistics, pp. 67–79 doi:
10.18653/v1/2021.latechclfl-1.8. (accessed 21 April 2022).

Schmidt, T., El-Keilany, A., Eger, J. and Kurek, S. (2021c). Exploring Computer Vision for Film Analysis: A Case Study for Five Canonical Movies.
2nd International Conference of the European Association for Digital Humanities (EADH 2021). Krasnoyarsk, Russia (accessed 21 April 2022).

Schmidt, T., Engl, I., Halbhuber, D. and Wolff, C. (2020a). Comparing Live Sentiment Annotation of Movies via Arduino and a Slider with Textual Annotation of Subtitles. In Reinsone, S., Skadiņa, I., Daugavietis, J. and Baklāne, A. (eds),
Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020), vol. 2865. Riga, Latvia: CEUR Workshop Proceedings, pp. 212–23 (accessed 21 April 2022).

Schmidt, T., Grünler, J., Schönwerth, N. and Wolff, C. (2021d). Towards the Analysis of Fan Fictions in German Language: Exploration of a Corpus from the Platform Archive of Our Own. Krasnoyarsk, Russia (accessed 21 April 2022).

Schmidt, T., Mosiienko, A., Faber, R., Herzog, J. and Wolff, C. (2020b). Utilizing HTML-analysis and computer vision on a corpus of website screenshots to investigate design developments on the web.
Proceedings of the Association for Information Science and Technology,
57(1): e392 doi:

Schmidt, T. and Wolff, C. (2021). Exploring Multimodal Sentiment Analysis in Plays: A Case Study for a Theater Recording of Emilia Galotti.
Proceedings of the Conference on Computational Humanities Research 2021 (CHR 2021). Amsterdam, The Netherlands, pp. 392–404.

Thomas, B. (2011). What Is Fanfiction and Why Are People Saying Such Nice Things about It?.
Storyworlds: A Journal of Narrative Studies,
3 doi:

Van Steenhuyse, V. (2011). The Writing and Reading of Fan Fiction and Transformation Theory.
CLCWeb: Comparative Literature and Culture,
13(4) doi:
10.7771/1481-4374.1691. (accessed 7 October 2020).

Yin, K., Aragon, C., Evans, S. and Davis, K. (2017). Where No One Has Gone Before: A Meta-Dataset of the World’s Largest Fanfiction Repository.
Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. Denver Colorado USA: ACM, pp. 6106–10 doi:
10.1145/3025453.3025720. (accessed 6 October 2020).

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO