Poetry in Prose: automatic identification of verses in Brazilian literature

poster / demo / art installation
  1. Ricardo Carvalho

    State University of Feira de Santana

  2. Angelo Loula

    State University of Feira de Santana

  3. João Queiroz

    Universidade Federal de Juiz de Fora

Work text

In 1946, the Brazilian poet Guilherme de Almeida published a study of the structured verse patterns he discovered in the prose of ‘Os Sertões’ (‘Rebellion in the Backlands’) by Euclides da Cunha (1902). According to Almeida, the Euclidean prose contains versification structures of various rhythmic patterns, apparently most often at the end of paragraphs. In 1996, Augusto de Campos published another study of the same literary work that validated Almeida’s discovery and revealed several other versified patterns in Euclidean prose. Dodecasyllables and alexandrines are among the most used metric patterns, in varied combinations and positions. The diversity of patterns found, disregarding “the strict metrification” and admitting “more rhythmic freedom” (Campos, 1996), creates surprising zones of tension, “areas spread with poetry in significant portions of poetry in his prose” (Campos, 1996).
Scansion, the process of separating and joining the syllables of a verse, is usually applied to texts categorically defined as poetry, where it allows the metric characteristics of an author’s writing to be mapped. Performed by a person, the kind of analysis carried out by Almeida and Campos requires, depending on the size of the piece, hours, days or even months of work.
Our work proposes the use of computational techniques to automate the scansion and analysis of Euclides da Cunha’s prose and reveal its verse structures, thus reducing the time required for the task, providing a new tool for prose analysis and opening a new research agenda. These verse structures, distributed throughout the text, are found using computational methods based on the scansion rules of the Portuguese language. As the location of these structures is not known in advance, every sentence is treated as a potential verse candidate; segments of sentences can also be considered.
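The idea of treating segments of a sentence as verse candidates can be sketched as follows. This is a minimal illustration, assuming that a segment is any contiguous run of words; the abstract does not say how segments are actually chosen, and the function name and the `min_words` parameter are hypothetical.

```python
from typing import Iterator, List, Tuple


def candidate_segments(
    words: List[str], min_words: int = 2
) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (start, end, words[start:end]) for every contiguous span
    of at least `min_words` words. The whole sentence is one of the spans."""
    n = len(words)
    for start in range(n):
        for end in range(start + min_words, n + 1):
            yield start, end, words[start:end]


# Sentence quoted in the abstract; whitespace tokenization is a simplification.
sentence = "O sertanejo é, antes de tudo, um forte".split()
segments = list(candidate_segments(sentence))
```

The number of spans grows quadratically with sentence length, so a real system would probably prune candidates, for example by discarding spans whose syllable count cannot match any metric pattern of interest.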
To identify metric verses in the text, our system performs four major steps: extraction of sentences, separation of syllables, scansion, and overlay of verses on the original text. From a digital copy of the book, sentences are extracted according to the punctuation marks present in the literary piece. Every word in each sentence then undergoes syllable separation following grammatical rules, using the software developed by Neto et al. (2015), which defines the initial syllable boundaries. In Portuguese, the rhythm or musicality of a verse follows the alternation of strong (tonic) and weak (atonic) syllables, so along with the syllable boundaries, the position of the tonic syllable is also identified for every word, determining the rhythmic features.
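The sentence extraction step might look like the sketch below. It assumes that sentences end at `.`, `!` or `?` followed by whitespace; the abstract only says that punctuation marks are used, so the exact set of marks and the name `extract_sentences` are assumptions. The second sentence in the sample text is invented for illustration.

```python
import re
from typing import List

# Assumed sentence terminators; split on the whitespace that follows them.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_sentences(text: str) -> List[str]:
    """Split running text into sentences, keeping the closing punctuation."""
    parts = SENTENCE_END.split(text.strip())
    return [p for p in parts if p]


sample = (
    "O sertanejo é, antes de tudo, um forte. "
    "Esta é uma segunda frase, apenas ilustrativa!"
)
sentences = extract_sentences(sample)
```

A rule this simple will also split after abbreviations such as "Sr.", so a production version would need an exception list or a proper sentence tokenizer.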
To identify verses, the final scansion process is performed, considering intravocabular (syneresis and diaeresis) and intervocabular (elision and crasis) phonological changes. These changes may alter the initial syllable count, for example through the omission of one or more sounds, which merges two syllables into a single one. As a final output, the identified verses are overlaid on the original document in place of the original sentence, along with verse classification, metric count, syllable separations and tonic syllable positions, allowing the user to analyze them in context.
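The effect of intervocabular merging on the syllable count can be illustrated with a small sketch. It is not the authors' implementation: the syllables and tonic positions are entered by hand rather than produced by the tool of Neto et al. (2015), only vowel-to-vowel junctures between words are considered (no syneresis, diaeresis or mute "h"), and the metric count follows the usual Portuguese convention of counting up to the last tonic syllable, which the abstract does not state explicitly. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

VOWELS = set("aeiouáàâãéêíóôõú")

# A word is (syllables, index of its tonic syllable, or None if atonic).
Word = Tuple[List[str], Optional[int]]


def elision_sites(words: Sequence[Word]) -> List[int]:
    """Indices i where word i ends in a vowel and word i+1 starts with one."""
    sites = []
    for i in range(len(words) - 1):
        last_syllable = words[i][0][-1]
        first_syllable = words[i + 1][0][0]
        if last_syllable[-1].lower() in VOWELS and first_syllable[0].lower() in VOWELS:
            sites.append(i)
    return sites


def scan(words: Sequence[Word], elide: Set[int]) -> Tuple[List[str], int]:
    """Merge syllables at the chosen junctures.

    Returns the poetic syllables (tonic ones marked with '+') and the
    metric count, i.e. the position of the last tonic syllable.
    """
    units: List[Tuple[str, bool]] = []  # (text, is_tonic)
    for i, (syllables, tonic) in enumerate(words):
        for j, syllable in enumerate(syllables):
            is_tonic = j == tonic
            if j == 0 and (i - 1) in elide and units:
                # Join with the last syllable of the previous word.
                prev_text, prev_tonic = units[-1]
                units[-1] = (prev_text + " " + syllable, prev_tonic or is_tonic)
            else:
                units.append((syllable, is_tonic))
    marked = [text + ("+" if tonic else "") for text, tonic in units]
    tonic_positions = [k for k, (_, tonic) in enumerate(units) if tonic]
    count = tonic_positions[-1] + 1 if tonic_positions else 0
    return marked, count


# "O sertanejo é, antes de tudo, um forte", annotated by hand.
WORDS: List[Word] = [
    (["O"], None), (["ser", "ta", "ne", "jo"], 2), (["é"], 0),
    (["an", "tes"], 0), (["de"], None), (["tu", "do"], 0),
    (["um"], None), (["for", "te"], 0),
]

sites = elision_sites(WORDS)       # junctures where a merge is possible
marked, count = scan(WORDS, {1})   # merge only "sertanejo" + "é"
```

Merging only at the juncture between "sertanejo" and "é" gives twelve poetic syllables, in line with the dodecasyllable reading reported for this sentence, while applying no merge gives thirteen. Since each juncture is optional, a fuller system would have to compare several readings of the same segment.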
Initial experiments with the proposed system were performed on the book ‘Os Sertões’ by Euclides da Cunha, aiming to reproduce in a computer lab the work performed by Guilherme de Almeida in the 1940s and Augusto de Campos in the 1990s. As an example of the results, on page 67 the system initially identified twenty-four candidate verses. Among these is "O sertanejo é, antes de tudo, um forte", the segment of text that opens the third chapter, identified by our system as a dodecasyllable "O / ser/ta/ne+/jo é+,/ an+/tes/ de/ tu+/do/ um/ for+/te", where ‘+’


Almeida, G. (1946). A poesia d’Os Sertões. Diário de São Paulo. August 18.

Neto, N., Rocha, W. and Sousa, G. (2015). An open-source rule-based syllabification tool for Brazilian Portuguese. Journal of the Brazilian Computer Society, 21(1): 1-10.

Campos, A. (1996). TRANSERTÕES. Folha de São Paulo. November 3.


Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

Conference website: https://dh2016.adho.org/

Series: ADHO (11)

Organizers: ADHO