Synoptic Gospels Networked by Recurrent Markov Clustering

poster / demo / art installation
Authorship
  1. Maki Miyake

    Department of Human System Science - Tokyo Institute of Technology

Work text

In this research, we represent the lexical co-occurrence information of the Synoptic Gospels as a graph whose vertices correspond to words or concepts and whose edges correspond to semantic paths. The aim of this study is to exploit the semantic network built for the Synoptic Gospels. This network is constructed automatically by a graph clustering method, a powerful technique for classifying and reconnecting the words in documents. To generate the network, we are attempting to apply to the Gospels our original clustering algorithm (Jung, 2005), whose key idea derives from the Markov Cluster Algorithm (MCL) developed by van Dongen (2000).
Since the semantic network of the Synoptic Gospels is now articulated with lexical clusters, each of which can be taken as a “concept”, it may allow us to obtain an overview of the biblical world as well as to discuss the (genealogical) relationship among the Gospels, whose features will be projected onto parts of the graph and traced at the same time by the flow of conceptual association.
Clustering Algorithm
To automatically draw a concise semantic panorama of the world as a graph, we have recently proposed a new graph clustering algorithm, called the Recurrent Markov Clustering (RMCL) method, which is a derivative of MCL.
The original MCL is based on random walks on a graph, and its model simulates flow by using two simple algebraic operations, expansion and inflation, on the stochastic transition matrix. This algorithm has been applied in several research fields such as biology, linguistics and psychology. Examples include Tribe-MCL for clustering proteins by Enright et al. (2002), the Synonymy Network of Gfeller et al. (2005), created with the addition of noise data, and Lexical Acquisition by Dorow et al. (2005), where some MCL clusters are merged to reconnect concept areas.
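The expansion and inflation operations can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the added self-loops, the parameter values (expansion 2, inflation 2.0) and the convergence tolerance are all assumptions chosen for illustration, and the six-node graph is a toy example.

```python
import numpy as np

def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    return m / m.sum(axis=0, keepdims=True)

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Alternate expansion and inflation until the matrix stops changing.

    Returns the last matrix and the list of all intermediate stages.
    Adding self-loops is a common MCL convention and an assumption here.
    """
    m = normalize_columns(np.asarray(adjacency, dtype=float) + np.eye(len(adjacency)))
    stages = [m]
    for _ in range(max_iter):
        expanded = np.linalg.matrix_power(m, expansion)      # expansion: longer random walks
        inflated = normalize_columns(expanded ** inflation)  # inflation: strengthen strong flow
        stages.append(inflated)
        converged = np.allclose(inflated, m, atol=tol)       # nearly idempotent matrix
        m = inflated
        if converged:
            break
    return m, stages

def hard_clusters(m, eps=1e-6):
    """Read clusters off the converged matrix.

    Each row that still carries flow is an attractor; the columns it reaches
    are its members. Overlapping member sets are merged into one cluster.
    """
    clusters = []
    for row in m:
        members = set(np.flatnonzero(row > eps).tolist())
        if not members:
            continue
        for other in [c for c in clusters if c & members]:
            members |= other
            clusters.remove(other)
        clusters.append(members)
    return sorted(sorted(c) for c in clusters)

# Toy graph: two triangles joined by the single edge 2-3.
adjacency = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
final_matrix, stages = mcl(adjacency)
print(hard_clusters(final_matrix))
```

Keeping every intermediate stage in `stages` is a deliberate choice in this sketch, because a recurrent scheme needs access to the clustering steps that precede convergence.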
The new method, RMCL, uses the output of MCL as its input again. This improvement allows us to extract from the MCL results crucial information about the clustered entities and their hierarchical relationships. RMCL is intended to give suitable control over the sizes of concept areas by changing the granularity of the graph and the generality of the concepts. In the recurrent process, we go back from the resulting converged state toward any intermediate clustering step. This reversal procedure is the core part of RMCL: it generates a virtual adjacency matrix of the hard cluster nodes obtained through the MCL process. This downsized matrix represents a simpler graph of the concepts built up from similar words. We introduce here one such recursive method, called the Stepping-stone type algorithm, which consists of combining particular intermediate clustering stages of MCL with the final converged clustering stage.
The RMCL process proceeds as follows:
Step 1: Run the MCL process on an adjacency matrix in which each node represents a word.
Step 2: Apply the reversal procedure to build a virtual adjacency matrix in which each node represents an MCL cluster.
Step 3: Run the MCL process again on the virtual adjacency matrix to compute a hard clustering, which closes the cycle of an RMCL process.
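Step 2 can be sketched as follows. The text does not spell out how the reversal procedure weights the edges between cluster nodes, so this sketch rests on an explicit assumption: the weight between two clusters is the summed flow between their member words in one chosen intermediate MCL stage. The function name `virtual_adjacency` and the toy matrix are hypothetical.

```python
import numpy as np

def virtual_adjacency(stage_matrix, clusters):
    """Collapse a word-level matrix into a cluster-level adjacency matrix.

    ASSUMPTION: cluster-to-cluster weight = summed flow between member words
    in an intermediate MCL stage. This aggregation rule is illustrative and
    is not confirmed by the source.
    """
    n_words = stage_matrix.shape[0]
    membership = np.zeros((n_words, len(clusters)))
    for c, members in enumerate(clusters):
        membership[members, c] = 1.0           # word i belongs to cluster c
    virtual = membership.T @ stage_matrix @ membership
    virtual = (virtual + virtual.T) / 2.0      # make the graph undirected
    np.fill_diagonal(virtual, 0.0)             # drop flow inside a cluster
    return virtual

# Toy intermediate stage for four words (columns sum to 1) and the
# hard clusters {0, 1} and {2, 3} from the converged stage.
stage = np.array([
    [0.5, 0.4, 0.1, 0.0],
    [0.4, 0.5, 0.0, 0.0],
    [0.1, 0.1, 0.6, 0.5],
    [0.0, 0.0, 0.3, 0.5],
])
print(virtual_adjacency(stage, [[0, 1], [2, 3]]))
```

Steps 1 and 3 are then ordinary MCL runs: Step 1 on the word-level matrix, Step 3 on the smaller matrix returned here.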
Methodology
The following steps describe how to generate a network graph for the Synoptic Gospels by applying RMCL to the lexical co-occurrence data obtained from them:
1) Word Pairs Data obtained by a windowing method
Before running the MCL process, it is necessary to make a list of word pairs that co-occur within a certain range of text. We used the windowing method to do this, which led to a simplified representation of similarity level suitable for clustering. In this technique, a window of a certain size slides over the word sequences in a text to exhaustively extract fixed-size n-grams (for example, trigrams) of words (Vechtomova, 2003). The pairs were then formed by combining all the extracted words.
In the case of the Synoptic Gospels, the windowing method is applied to the Greek text of the 26th edition of the Nestle-Aland New Testament (1979), where a verse is set as the document boundary. Various data sets can be collected by changing the window size from 2 (words) through 10. However, we extract only the word pairs that are common to Mark, Luke and Matthew, so as to focus on the common aspect of the Bible, and we report only the data for window size 2, which yields 4930 word pairs composed of 769 word occurrences.
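The windowing step and the restriction to pairs shared by all three Gospels can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure used to produce the figures above: the function name `window_pairs` is hypothetical, pairs are treated as unordered, and the short English "verses" are invented placeholders rather than quotations from the Greek text.

```python
from itertools import combinations

def window_pairs(verses, window_size=2):
    """Collect unordered word pairs that co-occur inside a sliding window.

    Each verse is treated as a separate document, so a window never
    crosses a verse boundary.
    """
    pairs = set()
    for verse in verses:
        words = verse.split()
        # A verse shorter than the window is handled as a single window.
        for start in range(max(len(words) - window_size + 1, 1)):
            window = words[start:start + window_size]
            for a, b in combinations(window, 2):
                if a != b:
                    pairs.add(frozenset((a, b)))
    return pairs

# Invented placeholder verses, one list per Gospel.
mark = ["new wine old wineskins", "he rose"]
matthew = ["new wine into fresh wineskins"]
luke = ["no one puts new wine", "old wine good"]

# Keep only the pairs attested in all three texts.
common = window_pairs(mark) & window_pairs(matthew) & window_pairs(luke)
print(common)
```

Representing each pair as a `frozenset` makes the pair order irrelevant, which keeps the intersection across the three texts a plain set operation.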
2) Dictionary-based Stemming
Since we are interested in lexical-semantic information rather than lexical-grammatical information, we reduced each word to its stem form. The stemming process was performed manually with the BibleWorks Greek New Testament Morphology database (BNM). Additionally, 73 noise words, such as articles, prepositions, pronouns and conjunctions, were eliminated.
Finally, we obtained a list of 1053 pairs with 468 word occurrences, and applied the RMCL process to a 468 × 468 adjacency matrix.
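A sketch of how stemmed, noise-filtered pairs could be turned into an adjacency matrix follows. Several points are assumptions made for illustration: the function name `build_adjacency`, the lookup table `lemma_of` standing in for the manual dictionary-based stemming, the binary (0/1) edge weights, and the tiny English word list.

```python
import numpy as np

def build_adjacency(pairs, lemma_of, noise_words):
    """Map words to stems, drop noise words, and build a symmetric 0/1 matrix.

    `lemma_of` is a hypothetical lookup table; words missing from it are
    kept unchanged. Binary weights are an assumption of this sketch.
    """
    lemma_pairs = set()
    for a, b in pairs:
        la, lb = lemma_of.get(a, a), lemma_of.get(b, b)
        if la == lb or la in noise_words or lb in noise_words:
            continue                      # skip self-pairs and noise words
        lemma_pairs.add(frozenset((la, lb)))
    vocab = sorted({w for pair in lemma_pairs for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    adjacency = np.zeros((len(vocab), len(vocab)), dtype=int)
    for pair in lemma_pairs:
        a, b = tuple(pair)
        adjacency[index[a], index[b]] = 1
        adjacency[index[b], index[a]] = 1
    return vocab, adjacency

# Invented example data.
pairs = [("wines", "new"), ("wine", "new"), ("the", "wine"), ("old", "wine")]
vocab, adjacency = build_adjacency(pairs, lemma_of={"wines": "wine"}, noise_words={"the"})
print(vocab)
print(adjacency)
```

Stemming before deduplication is what lets inflected variants collapse into one pair, which is consistent with the drop in pair count reported above.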
3) RMCL Steps
At Step 1, starting from the co-occurrence adjacency matrix, the MCL process generated a nearly idempotent matrix at the 13th cluster stage, with 119 hard clusters. At Step 2, the reversal procedure computed a virtual adjacency matrix for each stepping cluster stage, which gave 12 different matrices. At Step 3, we applied the MCL process once again to each virtual adjacency matrix obtained at the previous step. Reusing the adjacency matrix generated from the 2nd cluster stage made the repeated MCL process run until the 8th loop, where the number of RMCL hard clusters turned out to be 65.
Results and Discussion
At Step 3, each intermediate cluster stage generates a different adjacency matrix. To select the one most appropriate for interpretation, the variance of the RMCL cluster sizes can be used as a criterion. A high variance means that the clusters are properly diversified to represent the multiple features of the text.
According to this criterion, we select the RMCL hard clusters of the 2nd cluster stage. Taking the result of the MCL process into account, each hard cluster can this time be taken as a semantic or concept category.
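The selection criterion can be expressed as a short Python sketch. The function name `pick_stage_by_size_variance` and the example clusterings are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which was used.

```python
from statistics import pvariance

def pick_stage_by_size_variance(clusterings):
    """Return the stage whose cluster sizes have the highest variance.

    `clusterings` maps a stage number to its list of clusters.
    Population variance is an assumption of this sketch.
    """
    variances = {
        stage: pvariance([len(cluster) for cluster in clusters])
        for stage, clusters in clusterings.items()
    }
    best_stage = max(variances, key=variances.get)
    return best_stage, variances

# Invented clusterings for two intermediate stages.
clusterings = {
    2: [["a", "b", "c", "d", "e"], ["f"], ["g"], ["h"]],   # sizes 5, 1, 1, 1
    5: [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]],   # sizes 2, 2, 2, 2
}
best_stage, variances = pick_stage_by_size_variance(clusterings)
print(best_stage, variances)
```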
If we use the symbol “{ }” to denote a component of mutually related words, some interesting components are extracted, such as {“be pleased”, “beloved”} or {“sleep”, “die”}. An antonym category is also found, as in {“forgiveness”, “sin”} and {“dead (noun)”, “live”}, the latter component representing the concept of “resurrection”. The component {split, curtain, temple} suggests the presence of an idiom category, and in fact this expression occurs in Mk 15:38. We have also found a topic category among the components. For example, {new, fresh, old, wine, wine-skin} precisely means “new testament”.
Among the RMCL clusters, the largest component cluster included almost 30% of the concept nodes. As the contents of this cluster can be described in terms of Jesus’ redemption and passion, we can conclude that it represents, as a whole, the synoptic gospel genre itself.
Conclusion and Ongoing tasks
In conclusion, the application of RMCL to the Synoptic Gospels allowed us to create a compact semantic network in biblical lexicography, where the subject categories typical of the Bible can be identified as concept clusters linked to one another. Furthermore, it may be possible to use RMCL as a sort of ontology generator for biblical studies, because some of the main RMCL clusters contain a set of key words capable of producing taxonomic schemes or metadata.
Obviously, the results of MCL clustering are more or less influenced by the initial selection of the co-occurring pairs, that is, by the chosen width of the windowing frame. We are now working on estimating appropriate parameter values, together with morphological manipulations that respect various viewpoints such as redaction criticism and collocation (syntax). Our final goal is to complete an ontology of the Synoptic Gospels that can represent the fundamental categories of the Bible together with information on their mutual relationships.
Further information and more detailed results can be found at: http://home.a04.itscom.net/hilolani/sgn.htm.
References
Enright, A. J., Van Dongen, S., and Ouzounis, C. A. (2002). An efficient algorithm for large-scale detection of protein families, Nucleic Acids Research, 30(7), 1575-84.
BNM: BibleWorks LXX/OG Morphology and Lemma Database (2004). BibleWorks Greek New Testament.
Dorow, B. et al. (2005). Using Curvature and Markov Clustering in Graphs for Lexical Acquisition and Word Sense Discrimination, MEANING-2005.
Gfeller, D., Chappelier, J.-C., De Los Rios, P. (2005). Synonym Dictionary Improvement through Markov Clustering and Clustering Stability, International Symposium on Applied Stochastic Models and Data Analysis, 106-113.
Jung, J., Miyake, M., Hatanaka, N., Akama, H. (2005). For the Development of Composition Support System based on Semantic Network by Repeated Clustering, IPSJ SIG-CE, No. 123, pp. 99-105.
Nestle-Aland (1987). Novum Testamentum Graece, 26th edition, German Bible Society, Stuttgart.
Van Dongen, S. (2000). Graph Clustering by Flow Simulation, PhD thesis, University of Utrecht.
Vechtomova, O., Robertson, S., Jones, S. (2003). Query expansion with long-span collocates, Information Retrieval, vol. 6, pp. 251-273.


Conference Info

Complete

ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006


The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC
