“Es war einmal …” – First Sentences in Literature: A German-Language Reference Corpus

poster / demo / art installation
  1. Anna Busch

    Theodor-Fontane-Archiv, Universität Potsdam

  2. Torsten Roeder

    Bergische Universität Wuppertal

Work text

In literary and linguistic studies, the first sentence of a narrative text is a frequently studied object (see, among others, Alt 2020, Haubrichs 1995, Hirdt 1974, Miller 1965, Neuhaus 2019, Queng 2019, Raulff 2019, Retsch 2000, Selbmann 2019). This is hardly surprising: since Wolfgang Iser's study The Act of Reading, the first sentence has been regarded as the reader's entrance into the text and as the key point of interaction between text and reader (1976: 38). In the richness of its forms, the first sentence reveals “the treasures of literature in nuce” (Alt 2020: 18), and, following Alain Robbe-Grillet, one could argue that literary history should be written from the study of its opening sentences (1992: 38).
A systematic, digitally supported study of “first sentences” has yet to be carried out. Corpora of first sentences in German have occasionally been compiled by hand (Beck 1992, Beck 1993, Wolkersdorfer 1994), and attempts have been made to draw up a typology of the first sentence in literature on the basis of selected individual analyses (most recently Alt 2020). A systematic categorisation based on a larger, semi-automatically compiled research corpus, as presented here, therefore seems helpful. Similar studies inquire into the quintessence of the poetic in literature through its countability (cf., for example, Moretti 2009, also Fischer/Strötgen 2015, Fischer/Jäschke 2018a/b); one quantifying study dealing specifically with German-language narrative beginnings (not first sentences) is Herrmann 2018.
The aim of the corpus “First Sentences in German-Language Literature” is to address the “lack of an overall view” (Alt 2020: 246) in all previous studies on first sentences. To this end, a data corpus is being created and published, and an initial evaluation combining quantitative and text-analytical approaches will be carried out on it.

Several full-text, open access corpora (Deutsches Textarchiv, Zeno, etc.), from which texts were extracted according to genre, serve as source material. Although the existing full-text offerings provide varying degrees of structural information about each document, the automatic delimitation of self-contained text units is often non-trivial and cannot be done reliably without individual examination (e.g. in the case of anthologies, texts with several chapters, or texts in several volumes). Yet this delimitation is the prerequisite for extracting the first sentences. In addition, the beginning of the “poetic text” cannot always be located automatically, e.g. because of prefaces, dedications or introductions.
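
The localisation problem described above can be illustrated with a minimal Python sketch. It is not the project's actual procedure: the list of front-matter headings, the input format (pairs of heading and body text) and the function name `candidate_start` are assumptions made purely for illustration. The heuristic proposes a candidate start of the literary text and flags every case in which something was skipped, so that it can be examined individually.

```python
import re

# Assumption: front matter is recognisable by a heading that starts with one
# of these German words. Real sources will need a longer, curated list.
FRONT_MATTER = re.compile(
    r"^(vorwort|vorrede|widmung|zueignung|einleitung)\b", re.IGNORECASE
)

def candidate_start(sections):
    """Propose where the literary text begins.

    `sections` is a list of (heading, body) pairs in document order
    (a hypothetical input format). Returns (index, needs_review):
    the index of the first section that is neither front matter nor
    empty, and a flag that is True whenever sections were skipped or
    no start was found, i.e. whenever a human should check the result.
    """
    skipped = False
    for index, (heading, body) in enumerate(sections):
        if FRONT_MATTER.match(heading.strip()) or not body.strip():
            skipped = True
            continue
        return index, skipped
    return None, True
```

A heuristic of this kind only narrows down the cases that need manual inspection; it does not remove the need for it.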

Furthermore, the delimitation of “first sentences” is a semantic problem. Sentences can be understood as grammatical-analytical units separated from each other by certain punctuation marks, a view that suits machine processing. However, the marks used to delimit a sentence vary and have changed considerably over time. Whether a given punctuation mark really separates two sentences also depends on context, which is why sentences must sometimes be understood as units of meaning in which punctuation marks have a structuring but not an interrupting function (cf. Fig. 2a/b). Should we therefore rather speak of a flowing “beginning” or “start”? Areas of vagueness thus enter into the determination of “first sentences”, which in turn can affect the consistency and comparability of the corpus.
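
How much the result depends on the chosen delimiters can be shown with a small Python sketch. It is a deliberately naive, punctuation-based splitter and not the method used for the corpus; the function names, the set of delimiters and the example text (an invented fairy-tale opening) are assumptions for illustration only.

```python
import re

# Optional closing quotation mark that may follow the delimiter.
CLOSING_QUOTES = r'["“”»«]?'

def first_unit(text, marks=".!?"):
    """Return the text up to and including the first delimiter in `marks`
    that is followed by whitespace or the end of the text."""
    text = " ".join(text.split())  # normalise line breaks and spacing
    pattern = re.compile("[" + re.escape(marks) + "]" + CLOSING_QUOTES + r"(?=\s|$)")
    match = pattern.search(text)
    return text[: match.end()] if match else text

# Invented example, not taken from the corpus.
opening = "Es war einmal ein König; der hatte drei Töchter. Die jüngste war die schönste."

# Only '.', '!' and '?' end a sentence:
print(first_unit(opening))            # Es war einmal ein König; der hatte drei Töchter.
# The semicolon is also treated as a delimiter:
print(first_unit(opening, ".!?;:"))   # Es war einmal ein König;
```

Both outputs are defensible “first sentences”, which is exactly the vagueness at issue. The sketch also ignores abbreviations such as “z. B.” and historical punctuation conventions, both of which would need separate handling.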

The corpus of novels, novellas and fairy tales compiled so far is fully encoded in TEI, including metadata, source information and positional information (available at
). Depending on the genre, the number of first sentences ranges between 100 and 1,000 entries. With the help of manually and automatically created annotations, the corpus can be analysed and visualised according to various parameters, such as date of publication, text genre, gender of the author, references to persons, places or time in the text (cf. Fig. 1c), or length of the entire text. In addition, the selection criteria applied to each data source are documented, together with how they should be taken into account in the evaluation with regard to the balance of the corpus (cf. Hug/Boenig 2021). To disseminate the corpus, the Twitter project
@satzomat was launched in 2021; it posts two first sentences daily (cf. Figures 1–3).
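
As a rough illustration of how such a TEI-encoded corpus could be queried by metadata, the following Python sketch parses a small TEI fragment and aggregates first sentences by genre. The markup shown here (a `div` of type `entry` with a `subtype` for the genre, a `date` with a `when` attribute, and an `s` element for the sentence) is an assumption for this example and not the project's actual schema; the three sentences are invented placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: element and attribute choices are assumptions,
# and the sentences are placeholders, not corpus data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="entry" subtype="fairytale">
      <date when="1812"/><s>Es war einmal ein Beispiel.</s>
    </div>
    <div type="entry" subtype="novel">
      <date when="1850"/><s>Dies ist ein erfundener erster Satz.</s>
    </div>
    <div type="entry" subtype="novel">
      <date when="1900"/><s>Noch ein Platzhalter.</s>
    </div>
  </body></text>
</TEI>"""

def load_entries(xml_text):
    """Read genre, year and sentence text from each entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for div in root.iterfind(".//tei:div[@type='entry']", TEI_NS):
        entries.append({
            "genre": div.get("subtype"),
            "year": int(div.find("tei:date", TEI_NS).get("when")),
            "sentence": "".join(div.find("tei:s", TEI_NS).itertext()).strip(),
        })
    return entries

def mean_length_by_genre(entries):
    """Mean sentence length per genre, counted in whitespace-separated tokens."""
    totals, counts = Counter(), Counter()
    for entry in entries:
        totals[entry["genre"]] += len(entry["sentence"].split())
        counts[entry["genre"]] += 1
    return {genre: totals[genre] / counts[genre] for genre in counts}

entries = load_entries(SAMPLE)
print(Counter(entry["genre"] for entry in entries))
print(mean_length_by_genre(entries))
```

The same pattern extends to the other parameters named above, such as date of publication or gender of the author, provided they are encoded as attributes or elements that can be addressed in this way.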

The aim is to create a “typology of incipits” using computer-philological evaluation methods, to ask to what extent genres have determined certain types of first sentences in the course of history (e.g. landscape image, frame story), and to examine whether further correlations can be identified with the help of the metadata and annotations (see the project page
for more information).


Fig. 1a/b/c: Novella beginnings

Fig. 2a/b/c: Novel beginnings

Fig. 3a/b/c: Fairy tale beginnings


Alt, Peter-André (2020): ‘Jemand musste Josef K. verleumdet haben …’ Erste Sätze der Weltliteratur und was sie uns verraten. München: Beck.

Beck, Harald (1992): Roman-Anfänge. Rund 500 erste Sätze. Zürich: Haffmans.

Beck, Harald (1993): Romanenden. Rund 500 letzte Sätze. Zürich: Haffmans.

Fischer, Frank / Strötgen, Jannik (2015): “Wann findet die deutsche Literatur statt? – Zur Untersuchung von Zeitausdrücken in großen Korpora.” Presented at the DHd2015 Von Daten zu Erkenntnissen: Digitale Geisteswissenschaften als Mittler zwischen Information und Interpretation. 2. Tagung des Verbands “Digital Humanities im deutschsprachigen Raum” (DHd2015), Graz: Zenodo. [last access: 9 December 2021]

Fischer, Frank / Jäschke, Robert (2018a): “Liebe und Tod in der Deutschen Nationalbibliothek. Der DNB-Katalog als Forschungsobjekt der digitalen Literaturwissenschaft.” Presented at the DHd 2018 Kritik der digitalen Vernunft. 5. Tagung des Verbands “Digital Humanities im deutschsprachigen Raum” (DHd 2018), Köln: Zenodo. [last access: 9 December 2021]

Fischer, Frank / Jäschke, Robert (2018b): “Ein Quantum Literatur. Empirische Daten zu einer Theorie des literarischen Textumfangs.” DFG-Symposium “Digitale Literaturwissenschaft”. Villa Vigoni, 9.–13. Oktober 2017. [unpublished]

Haubrichs, Wolfgang (1995): “Kleine Bibliographie zu ‘Anfang’ und ‘Ende’ in narrativen Texten (seit 1965)”, in: Zeitschrift für Literaturwissenschaft und Linguistik 25, 99: 36-50.

Herrmann, Berenike (2018): “Anschaulichkeit messen. Eine quantitative Metaphernanalyse an deutschsprachigen Erzählanfängen zwischen 1880 und 1926”, in: Köppe, Tilmann / Singer, Rüdiger (eds.): Show, don’t tell: Konzepte und Strategien anschaulichen Erzählens. Bielefeld: Aisthesis, 167-212.

Hirdt, Willi (1974): “Incipit. Zu einer Poetik des Romananfangs”, in: Romanische Forschungen LXXXVI: 419-436.

Hug, Marius / Boenig, Matthias (2021): Die Geschichte der Digitalen Bibliothek, oder: Aller guten Kurationen sind drei+. [last access: 9 December 2021]

Iser, Wolfgang (1976): Der Akt des Lesens. Theorie ästhetischer Wirkung. München: Fink.

Miller, Norbert (1965): Romananfänge. Versuch zu einer Poetik des Romans. Berlin: Verl. Literarisches Colloquium.

Moretti, Franco (2009): “Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740-1850)”, in: Critical Inquiry 36, 1: 134-158.

Neuhaus, Stefan (2019): “‘Aber wehe, wehe, wehe! Wenn ich auf das Ende sehe!!’ Wie in Romanen und Erzählungen durch Anfang und Ende ein Rahmen erzeugt wird”, in: Neuhaus, Stefan / Weber, Petra (eds.): Anfangen und Aufhören. Paderborn: Wilhelm Fink, 141-157.

Queng, Jesse (2019): “Syntaktische Strukturen als poetologisches Mittel des Anfangens in der Prosa: Der erste Satz von Heinrich Bölls Irischem Tagebuch”, in: Neuhaus, Stefan / Weber, Petra (eds.): Anfangen und Aufhören. Paderborn: Wilhelm Fink, 89-101.

Raulff, Ulrich (2019): “Letzte Sätze”, in: Zeitschrift für Ideengeschichte 13: 129-142.

Retsch, Annette (2000): Paratext und Textanfang. Würzburg: Königshausen & Neumann.

Richardson, Brian (2008): Narrative Beginnings: Theories and Practices. University of Nebraska Press.

Robbe-Grillet, Alain (1992): “Warum und für wen schreibe ich”, in: Bühler, Karl Alfred (ed.): Robbe-Grillet zwischen Moderne und Postmoderne - “nouveau roman”, “nouveau cinéma” und “nouvelle autobiographie”. Tübingen: Narr.

Selbmann, Rolf (2019): “Lauter erste Sätze”, in: Neuhaus, Stefan / Weber, Petra (eds.): Anfangen und Aufhören. Paderborn: Wilhelm Fink, 67-87.

Wolkersdorfer, Andreas (1994): Der erste Satz. Österreichische Romananfänge 1960-1980. Wien: WUV Univ.-Verl.


Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO