The digital breadcrumb trail of Brothers Grimm

poster / demo / art installation
  1. Greta Franzini

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

  2. Emily Franzini

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

  3. Gabriela Rotari

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

  4. Franziska Pannach

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

  5. Mahdi Solhdoust

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

  6. Marco Büchler

    Göttingen Centre for Digital Humanities - Georg-August-Universität Göttingen (University of Göttingen)

Work text

Kinder- und Hausmärchen: Intratextuality and Intertextuality

Described as 'a great monument to European literature' (David and David, 1964, p. 180), Jacob and Wilhelm Grimm’s masterpiece Kinder- und Hausmärchen (hereafter KHM) has captured the imagination of adults and children for 200 years. International cinema, literature and folklore have borrowed and adapted the Brothers’ fairy tales in multifarious ways, inspiring themes and characters in numerous cultures and languages. Although the Brothers are commonly and erroneously considered the fathers of the genre, the fairy tales were not original to them. In fact, Jacob and Wilhelm collected and adapted their stories from earlier works, some of them dating as far back as the seventh century BC, and made numerous changes to their own collection (David and David, 1964, p. 183), producing seven distinct editions between 1812 and 1857. In these four decades of writing and rewriting, the fairy tales changed in number, style and content in accordance with historical, social and literary influences. And yet, how did forty years of revisions not confuse the tradition and transmission amongst followers and readers? Indeed, some fairy tales were changed almost beyond recognition. What makes them so timeless and memorable? What is it that immortalises these tales? An answer to these questions can be found in the motifs the Brothers borrowed from earlier traditions and disseminated by way of their famous collection.

Motifs, defined by Prince's Dictionary of Narratology (2003) as '[...] minimal thematic unit[s]', pervade the Grimm collection and are stable elements interlacing the seven editions of the KHM. The puss in boots (Der gestiefelte Kater) and the breadcrumb trail (originating in the Hänsel und Gretel tale) are both examples of motifs, and they recur not only throughout the Grimm editions, but also over time and space. The occurrence and repetition of motifs within the Grimm collection is a form of intratextuality, a term used to describe the internal relations within a text or an author's body of work and, in our case, within the KHM editions. But a motif may also appear in the works of other authors across traditions and languages, thus creating intertextual relations, that is, relations that the KHM may have with other texts.

Related work and computational opportunities
The breadcrumb trail generated by these motifs in literary history, and internationally spread through the Brothers Grimm, has been extensively studied by folklorists, historians and literary critics. Akin to memes, motifs are a form of information transfer and reuse, which opens up numerous opportunities for computational research in cultural evolution and transmission. Interestingly, however, the study of motifs has not yet fully explored the affordances of digital methods. Many authoritative volumes and ontologies have been published in print, such as the well-known Enzyklopädie des Märchens, or the Estonian Folktales and the Catalogue of Portuguese Folktales, but only a few digital projects or digital editions of these print sources exist. One such digital initiative is the Aarne-Thompson Motif-Index, a crucial contribution to the field, often used as a reference system for the production of folktale catalogues.

The Aarne-Thompson Motif-Index can be accessed at: (accessed 18 October 2015).

The situation, however, is different for fairy tales, inasmuch as digital copies of many folktale collections are freely available from Google Books or the Internet Archive, or from online collections, such as the Nederlandse VolksverhalenBank initiative or the Satorbase project, fostering intertextual research never before possible.

For example, the 1550-1553 Venetian collection Le piacevoli notti by Giovanni Francesco Straparola, at: (accessed 21 October 2015).

Nederlandse VolksverhalenBank: available at: (accessed 1 January 2016).

Satorbase: available at: (accessed 1 January 2016).

The increasing availability of digital and digitised assets allows us to access information more easily and potentially to uncover previously unknown or uncharted territory.
Indeed, we can now leverage hyperlinks and APIs in order to automatically retrieve specific and previously inaccessible information across the web, and to connect existing resources for comparative studies. Moreover, no effort has yet addressed the cross-cultural relations of fairy tales, which opens up opportunities for interdisciplinary, multilingual and big data research.

Our project
The new project described in this paper is one such opportunity, whereby an international and interdisciplinary team of computer scientists and humanists is semi-automatically crawling digitised texts and the web to produce a multilingual motif index that uses the Grimm collection as its base reference.

The project started in October 2015 and runs until December 2018. The team does not include but consults folklorists. We start with the Grimm collection as we already have clean data to work from.
More specifically, we combine knowledge acquired from existing print and digital resources with the deployment of the Google Search and Google Books APIs in order to automatically retrieve as many motifs across the web in as many languages as possible, and hence to explore the intratextual and intertextual relations that characterise the motifs' hosting texts. The end goal is twofold: on the one hand, we provide a comprehensive reference resource for scholars in the field and interested citizens alike and concurrently revise the Aarne-Thompson Index; on the other, by testing state-of-the-art text reuse and retrieval algorithms on a sample of these diverse and large datasets, we are able to refine our methods in order to accommodate further web-scale queries and thus sharpen our understanding of why and how motifs changed.

Google Search API: available at: (accessed 26 October 2015).

Google Books API: available at: (accessed 26 October 2015).
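As a rough illustration of how a motif query against the Google Books API might be prepared, the sketch below builds search URLs for the public volumes endpoint using its `q`, `langRestrict` and `maxResults` parameters. The function name, the choice of exact-phrase queries and the motif phrases are assumptions made for this example; the abstract does not specify the project's actual queries, and no request is sent here.

```python
from urllib.parse import urlencode

# Public volumes-search endpoint of the Google Books API.
GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def build_motif_query(phrase: str, language: str, max_results: int = 20) -> str:
    """Return a Google Books search URL for an exact motif phrase.

    `phrase` and `language` are illustrative inputs (assumptions); the
    project's real query terms are not given in the abstract.
    """
    if not 1 <= max_results <= 40:  # the API caps maxResults at 40
        raise ValueError("max_results must be between 1 and 40")
    params = {
        "q": f'"{phrase}"',        # double quotes request an exact-phrase match
        "langRestrict": language,  # two-letter ISO 639-1 language code
        "maxResults": max_results,
    }
    return f"{GOOGLE_BOOKS_ENDPOINT}?{urlencode(params)}"


# Hypothetical motif phrases, one per language, for a single tale.
MOTIF_PHRASES = {
    "de": "sieben Zwerge",
    "en": "seven dwarfs",
    "ru": "семь богатырей",
}

urls = {lang: build_motif_query(phrase, lang) for lang, phrase in MOTIF_PHRASES.items()}
for lang, url in urls.items():
    print(lang, url)
```

Building the URLs separately from sending them keeps the query logic easy to inspect and test before any web-scale crawl is attempted.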

The case studies we are working with to address our research questions are three Brothers Grimm tales: Snow White, Puss in Boots and The Fisherman and his Wife. These were chosen on the basis of their differing degrees of popularity in order to better understand how transmission affects popularisation.

Our research starts with digital and clean copies of the Grimm texts, downloaded and catalogued from TextGrid and Wikisource. Next, our international team of researchers and student assistants collects digitally available translations and/or editions of the three tales in multiple languages and manually enters them into a database, where information about the web source, the tales, the language, the work and the author is stored.

TextGrid: available at: (accessed 26 October 2015).

Wikisource: available at: (accessed 26 October 2015).

eTRAP is currently a team of twelve people of seven nationalities speaking eleven different languages.
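The database itself is not described further, but the fields listed above suggest a record along the following lines. This is a minimal sketch using SQLite; the table name, column names and sample rows are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
conn.execute(
    """
    CREATE TABLE tale_version (
        id         INTEGER PRIMARY KEY,
        tale       TEXT NOT NULL,   -- reference tale in the Grimm collection
        title      TEXT NOT NULL,   -- title of this version
        language   TEXT NOT NULL,   -- language code of the text
        author     TEXT NOT NULL,
        work       TEXT,            -- collection or work containing the version
        web_source TEXT             -- where the digital text was found
    )
    """
)

# Illustrative rows; unknown values are stored as NULL rather than guessed.
rows = [
    ("Snow White", "Snow White", "de", "Jacob and Wilhelm Grimm",
     "Kinder- und Hausmärchen", "Wikisource"),
    ("Snow White", "Сказка о Мертвой Царевне и о Семи Богатырях", "ru",
     "Pushkin", None, None),
]
conn.executemany(
    "INSERT INTO tale_version (tale, title, language, author, work, web_source) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)
conn.commit()

# List the languages in which versions of one tale have been collected.
languages = [
    row[0]
    for row in conn.execute(
        "SELECT language FROM tale_version WHERE tale = ? ORDER BY language",
        ("Snow White",),
    )
]
print(languages)
```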

An example may help to clarify the point. Grimm's Snow White corresponds to Pushkin's Сказка о Мертвой Царевне и о Семи Богатырях (The Tale of the Dead Princess and the Seven Knights in English). The two tales differ in many points, including the title. In Pushkin the princess is protected by seven knights (семь богатырей), whereas in the Grimm tale she is protected by seven dwarves. Despite the differences, the motifs of the beautiful princess and of her seven protectors link the two stories. To hyperlink and map these versions and their differences, we use a combination of Thompson identifiers for tales, VIAF identifiers for authors and works, and customised identifiers where existing ones do not apply. This semi-automatic approach allows us to populate our database with both content and metadata, and establish relations between the different versions.
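The Snow White and Pushkin example can be sketched as two records that share a tale identifier and are compared by their motifs. The class names, motif labels and identifier values below are hypothetical; in particular, the identifier strings are placeholders and not real Thompson or VIAF identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    scheme: str  # "thompson", "viaf" or "custom"
    value: str   # placeholder strings below, NOT real identifiers


@dataclass(frozen=True)
class TaleVersion:
    title: str
    language: str
    tale_id: Identifier    # identifies the tale this version belongs to
    author_id: Identifier  # identifies the author of this version
    motifs: frozenset      # motif labels found in this version


def compare(a: TaleVersion, b: TaleVersion) -> dict:
    """Split the motifs of two versions into shared and version-specific sets."""
    return {
        "same_tale": a.tale_id == b.tale_id,
        "shared": a.motifs & b.motifs,
        "only_a": a.motifs - b.motifs,
        "only_b": b.motifs - a.motifs,
    }


SNOW_WHITE = Identifier("thompson", "PLACEHOLDER-TALE-ID")

grimm = TaleVersion(
    title="Snow White",
    language="de",
    tale_id=SNOW_WHITE,
    author_id=Identifier("viaf", "PLACEHOLDER-GRIMM"),
    motifs=frozenset({"beautiful princess", "seven protectors", "protectors are dwarves"}),
)
pushkin = TaleVersion(
    title="Сказка о Мертвой Царевне и о Семи Богатырях",
    language="ru",
    tale_id=SNOW_WHITE,
    author_id=Identifier("viaf", "PLACEHOLDER-PUSHKIN"),
    motifs=frozenset({"beautiful princess", "seven protectors", "protectors are knights"}),
)

relation = compare(grimm, pushkin)
print(sorted(relation["shared"]))
```

Keeping the identifier scheme alongside its value makes it possible to mix external and customised identifiers in one field without ambiguity.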

Once this manually compiled dataset is complete, we deploy the TRACER suite of text reuse algorithms (Büchler, 2013) to trace additional motifs in other digital libraries or corpora. At the same time, we use the Google Search and Google Books APIs to search for motifs at a much larger scale, effectively crawling the web.
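To give a sense of what text reuse detection involves, the sketch below scores two passages by the overlap of their word trigrams (Jaccard similarity over shingles). This is a generic technique chosen for illustration and is not TRACER's actual pipeline, which the abstract does not describe; the sample passages are made up for the example.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a lower-cased text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_score(text_a: str, text_b: str, n: int = 3) -> float:
    """Share of word n-grams that two passages have in common."""
    return jaccard(shingles(text_a, n), shingles(text_b, n))


# Made-up sample passages (assumptions, not quotations from the corpus).
passage_a = "mirror mirror on the wall who is the fairest of them all"
passage_b = "mirror mirror on the wall who in this land is the fairest of all"
passage_c = "the fisherman went back to the sea and called the flounder"

score_related = reuse_score(passage_a, passage_b)
score_unrelated = reuse_score(passage_a, passage_c)
print(score_related, score_unrelated)
```

A higher score for the first pair than for the second is the kind of signal that would flag a candidate reuse for manual inspection.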

Like the KHM, we believe this project appeals to a wide and diverse audience not only because of its subject matter, but also because of its international and interdisciplinary character. Our international group operates at the intersection of Computer Science and the Humanities in the arena that is Digital Humanities. This project is unique insofar as each and every member of the team can contribute a piece of his or her own culture, adding a personal and familiar touch to this joint endeavour. By exploring these different cultures, we aim to establish fruitful collaborations and, in so doing, broaden the boundaries of the Digital Humanities.
Furthermore, we believe that this project fully engages humanists in the digital process of tracing texts through space and time. Following the motif trail back in time allows humanists to investigate lines of transmission of folktales and to potentially uncover additional trails through which other documents or stories travelled. At the same time, it enables the computer scientists in the team to identify any shortcomings in our algorithms and to better understand what to automatically feature when tracing this type of information in a digital ecosystem.

Büchler, M. (2013). Informationstechnische Aspekte des Historical Text Re-use. PhD thesis, University of Leipzig.

David, A. and David, M. E. (1964). A Literary Approach to the Brothers Grimm. Journal of the Folklore Institute, 1(3): 180-96. (accessed 26 July 2015).

Prince, G. (2003). Dictionary of Narratology. Revised Edition. University of Nebraska Press, Lincoln and London.


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

454 works by 1072 authors indexed

Conference website:

Series: ADHO (11)

Organizers: ADHO