Visualizing Collocations in Religious Online Forums

poster / demo / art installation
  1. Thomas Schmidt

    Universität Regensburg (University of Regensburg)

  2. Florian Kaindl

    Universität Regensburg (University of Regensburg)

  3. Christian Wolff

    Universität Regensburg (University of Regensburg)

Work text

One of the most influential concepts in Digital Humanities (DH) in recent years is Moretti’s (2000) idea of Distant Reading, more precisely the application of computational methods to analyze and visualize large amounts of text in order to gain new insights. Distant Reading has led to various successful projects, especially in literary studies and linguistics (cf. Jänicke et al., 2015), but also in religious studies, e.g. to analyze famous religious texts (McDonald, 2014; Slingerland et al., 2017; Verma, 2017). We build primarily upon the work of Pfahler et al. (2018), who applied topic modeling to Muslim online forums to investigate what this community predominantly talks about. They identified eating, family and politics as the main topic clusters, i.e. the topics discussed most.

We want to further explore the application and potential benefit of Distant Reading methods for the use case of religious online forums. Our research goal is to examine the content, language, topics and sentiments in online forums of different religious subgroups, to identify differences and similarities, and to learn more about the way of life and the beliefs of these communities.

While we explore multiple methods, such as named entity recognition, topic modeling and sentiment analysis, in the following contribution we report our results for collocation analysis. Via collocations, we analyze differences in the way several religious key concepts are discussed in the online forums of different religious subgroups.

2. Methods

We chose Reddit for data collection since it is comparatively easy to scrape and one of the largest platforms on the internet. Furthermore, various religious subgroups are represented on it, which makes comparing their content easier. We acquired all submissions (threads) for the time span of July 1, 2018 to July 1, 2019 for the three subreddits /r/Christianity, /r/Islam and /r/Occult.
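The scoping step above (three subreddits, one year) can be sketched as a filter over already downloaded records, followed by simple per-subreddit counts of the kind reported in the corpus statistics. This is a minimal sketch, not the authors' pipeline: it assumes each record is a dict with Reddit-style `subreddit`, `created_utc` (Unix seconds) and `body` keys, treats July 1, 2019 as an exclusive end date, and counts tokens by naive whitespace splitting. The names `filter_posts` and `corpus_stats` are hypothetical.

```python
from datetime import datetime, timezone

# Study window from the text; treating the end date as exclusive is an assumption.
START = datetime(2018, 7, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2019, 7, 1, tzinfo=timezone.utc).timestamp()
SUBREDDITS = {"Christianity", "Islam", "Occult"}


def filter_posts(records):
    """Group post texts by subreddit, keeping only records inside the window.

    Hypothetical record shape: {"subreddit": str, "created_utc": float, "body": str}.
    """
    corpus = {name: [] for name in sorted(SUBREDDITS)}
    for rec in records:
        if rec["subreddit"] in SUBREDDITS and START <= rec["created_utc"] < END:
            corpus[rec["subreddit"]].append(rec["body"])
    return corpus


def corpus_stats(corpus):
    """Return (number of posts, number of whitespace tokens) per subreddit."""
    return {
        name: (len(texts), sum(len(text.split()) for text in texts))
        for name, texts in corpus.items()
    }
```

A real tokenizer (handling punctuation, URLs and markdown) would change the token counts; whitespace splitting only keeps the sketch dependency-free.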
We chose /r/Christianity and /r/Islam because they represent the two largest monotheistic religions, and included /r/Occult to also examine a rather esoteric religious direction. In total, we acquired over 700,000 comments and around 50 million tokens (figure 1).

Figure 1: Corpus statistics

We set the maximum length of a collocation to five and measure the strength of collocations via Pointwise Mutual Information (PMI), which scores a collocation by comparing how often its words actually co-occur in the corpus with how often they would be expected to co-occur if they were independent (Church & Hanks, 1989): PMI(x, y) = log2 [P(x, y) / (P(x) P(y))]. To visualize collocations, we place the key concept in the middle and its collocates around it: the higher the PMI value, the closer a collocate is placed to the concept. We also label each edge with its exact PMI score.

3. Results

In the following, we showcase our approach for the spiritual key terms “love”, “religion” and “life” and highlight some of the insights we gained.

Figure 2: Collocations for “love” in /r/Islam
Figure 3: Collocations for “love” in /r/Christianity
Figure 4: Collocations for “love” in /r/Occult

In the Christian subreddit, “love” is most strongly connected with idioms and quotes from the Bible (“unconditionally”, “enemies”, “agape”; figure 3). In contrast, the Muslim forum shows strong associations with positive terms, with words for God and the Prophet, and with “family” (figure 2). This is in line with Pfahler et al. (2018), who found a strong focus on family-related topics in Muslim forums. In /r/Occult, the associations fit the notion of magic, reflecting the rather esoteric content of this forum (figure 4).

Figure 5: Collocations for “religion” in /r/Islam
Figure 6: Collocations for “religion” in /r/Christianity
Figure 7: Collocations for “religion” in /r/Occult

In /r/Islam, many collocates of “religion” point to discussions about religious directions, e.g. “organized”, “abrahamic”, “culture” and “major” (figure 5).
The collocation with “race” might reflect the racism Muslims face in Western countries. Similarly, /r/Christianity shows collocations describing discussions about other religions (“organized”, “islam”, “false”), some of which point to rather heated exchanges (“utter”, “nonsense”; figure 6). /r/Occult shows collocations specifying particular religions and other world views (“Egypt”, “ancient”, “philosophy”, “science”; figure 7).

Figure 8: Collocations for “life” in /r/Islam
Figure 9: Collocations for “life” in /r/Christianity
Figure 10: Collocations for “life” in /r/Occult

In /r/Christianity, “life” is associated with words pointing to the afterlife (“everlasting”, “eternal”, “immortal”; figure 9), while in /r/Islam it is tied more closely to terms describing a direction in life (“purpose”, “meaning”; figure 8). Both subreddits show connections with mostly positive words; the exception is death-related concepts, whose collocations are stronger in /r/Islam (“rest”, “death”, “short”). The collocations for /r/Occult are quite varied (figure 10).

Overall, we gathered some first insights, such as the marked difference between /r/Occult and the other two forums, the connections to family and politics for some key concepts in the Muslim forum, and the focus on discussions about religious directions for the concept of religion in all forums.

We plan to investigate other methods of computational text analysis, but also to apply a more in-depth qualitative content analysis to parts of our corpus in order to confirm and evaluate some of the assumptions we derived from the collocation visualizations.
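The PMI scoring and radial layout described in the Methods section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it reads the “maximum length of five” as a symmetric window of five tokens on either side of the key concept (an assumption), estimates all probabilities as raw counts divided by the corpus size N, and places the highest-scoring collocate closest to the centre. The names `pmi_collocates` and `radial_layout` are hypothetical.

```python
import math
from collections import Counter


def pmi_collocates(tokens, target, window=5):
    """Rank words co-occurring with `target` by PMI = log2(P(x, y) / (P(x) * P(y))).

    Assumption: co-occurrence means appearing within `window` tokens on either
    side of `target`; every probability is a raw count divided by len(tokens).
    """
    n = len(tokens)
    if n == 0:
        return []
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                pair_counts[tokens[j]] += 1
    p_target = word_counts[target] / n
    scores = {
        word: math.log2((joint / n) / (p_target * word_counts[word] / n))
        for word, joint in pair_counts.items()
    }
    # Strongest collocations (highest PMI) first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def radial_layout(scored, r_min=1.0, r_max=3.0):
    """Map each collocate to (x, y) around the key concept at (0, 0).

    The highest PMI gets radius r_min and the lowest gets r_max; collocates
    are spread evenly by angle. Each score can double as its edge label.
    """
    if not scored:
        return {}
    best = max(score for _, score in scored)
    worst = min(score for _, score in scored)
    span = (best - worst) or 1.0  # avoid division by zero when all scores tie
    positions = {}
    for k, (word, score) in enumerate(scored):
        radius = r_min + (best - score) / span * (r_max - r_min)
        angle = 2 * math.pi * k / len(scored)
        positions[word] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```

Applied per subreddit to tokenized text, the ranked list supplies both the collocates and the edge labels for one graph. A production pipeline would likely add a minimum co-occurrence threshold, since PMI is known to over-reward rare word pairs.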


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at Data for this conference were initially prepared and cleaned by May Ning.

Conference website:


Series: ADHO (15)

Organizers: ADHO