Capturing the Social Networks of the Gospels through a Graph Clustering

poster / demo / art installation
Authorship
  1. Maki Miyake

    University of Osaka

Work text

The creation of social network representations is a
promising application of large-scale linguistic resources
as a means of capturing the patterns of social
relationships that exist among individuals. In recent
years, several notable studies have produced
social networks of the Bible and a variety of graph
representations of it for gaining insights into the interactions
between the characters (Harrison, 2007; ESV Blog,
2007). Graph representation is an effective way of detecting
and investigating the intricate patterns of connectivity
within large-scale corpora. A number of studies
have applied graph theory and network analysis methods
to mapping out the complex networks of word associations
within linguistic resources (Dorow, Widdows,
Ling, Eckmann, Sergi, and Moses, 2005; Steyvers and
Tenenbaum, 2005; Gfeller, Chappelier, and De Los Rios,
2005). For the biblical texts, I have previously
applied a graph clustering technique that uses a
clustering-coefficient threshold to create
sub-networks for the Gospels of the New Testament
(Miyake, 2008). This paper reports on the application of
a soft graph clustering method to four social networks
of the Gospels, constructed from the co-occurrences
of people and places. Specifically, I propose a soft
graph clustering technique as a method of detecting the
community structures within the social networks of the
Gospels. The principal objective of this study is to
investigate the interactions between the characters in the
stories.
The corpus used in the present study is the Greek text
of the Gospels (Nestle-Aland, 1979), and the data
mainly consist of the names of people and places. A set of nouns such
as son, father, and prophet is included as well, since these are regarded
as important in representing the characteristics
and the roles of people in the stories. To create the four
social networks from the books of Mark, Matthew, Luke,
and John, co-occurrence data are computed for pairs of
words that appear in the same verse (sentence). The
words retain their inflected morphological forms, which makes it possible
to analyze relationships between words such as “who
says to whom”.
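The verse-level co-occurrence counting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name cooccurrence_edges, the toy verses, and the English placeholder tokens are all hypothetical, whereas the study works with inflected Greek word forms.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(verses, vocabulary):
    """Count how often each pair of target words appears in the same verse."""
    counts = Counter()
    for tokens in verses:
        # Keep only the target words (names, places, selected nouns), once per verse.
        present = sorted(set(tokens) & vocabulary)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1  # pairs are stored in alphabetical order
    return counts

# Toy input, invented for illustration only (not taken from the Gospel data).
toy_verses = [
    ["Jesus", "Peter", "boat"],
    ["Peter", "Andrew", "sea"],
    ["Jesus", "Peter", "Andrew"],
]
toy_vocabulary = {"Jesus", "Peter", "Andrew"}

edge_counts = cooccurrence_edges(toy_verses, toy_vocabulary)
for pair, count in sorted(edge_counts.items()):
    print(pair, count)
```

Each key of edge_counts is an undirected edge between two nodes, and the count could serve as an edge weight; whether the original networks are weighted is not stated in the text.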
Table 1 presents the number of co-occurring words
represented as nodes in each Gospel network, together with
basic statistical measures, namely the average degree,
the average shortest path length, and the average clustering
coefficient, which are clues for examining the structural
characteristics of the networks. Degree refers to the number of words
that are connected to a given word, and its average value
indicates how densely the nodes within a network are connected.
The average degree values for each Gospel indicate
that these social networks have patterns of sparse connectivity.
The clustering coefficient is known as an index
of the probability that an acquaintance
of an acquaintance is also an acquaintance of yours; in
other words, it is an index of the inter-connective strength
between neighboring nodes in a graph. Following Watts
and Strogatz (1998), the clustering coefficient of a
node n can be defined as C(n) = (number of links among n’s
neighbors) / (|N(n)| (|N(n)| - 1) / 2), where N(n) denotes the set
of n’s neighbors and |N(n)| its size. The coefficient takes values between
0 and 1. In social network studies, the clustering
coefficient can be used as a measure of role ambiguity
or to identify community hubs that have numerous links. The four
average clustering coefficients over all nodes are
quite high, at 0.5 or more, indicating strong connectedness
between neighboring nodes, which is a characteristic of
small-world networks.
In order to discern the relationships among the words
within the social network, this study applies a soft clustering
method combining the hard clustering of Markov
Clustering (MCL) and the index of clustering coefficients.
MCL is a bottom-up classification method that
allows us to detect the patterns and clusters within large
and sparsely connected data structures (van Dongen, 2000). Within the MCL
process, a graph is partitioned into hard clusters. However,
one particular problem with applying MCL to linguistic
resources is that the hard clustering approach is not
appropriate for words that have multiple meanings or, more
specifically in the context of social networks, for individuals
who are involved in multiple communities. In
order to overcome this problem, the hard clustering of MCL is applied to the network after removing the bottleneck
nodes, which are identified as hubs by taking the clustering
coefficient as a threshold. In this study, only
nodes with a clustering coefficient of more than 0.2
were selected. After the MCL process, the resultant crisp
clusters are expanded with neighboring nodes to produce
overlapping clusters that include the hub nodes.
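The three steps just described (remove low-coefficient nodes, run MCL on the remainder, expand the crisp clusters with the removed neighbors) can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: the function names, the inflation value of 2.0, the fixed iteration count, the treatment of nodes with fewer than two neighbors, and the toy graph are all hypothetical. The MCL routine is a deliberately minimal version of the expansion and inflation scheme, written with plain lists for readability.

```python
from itertools import combinations

def build_adjacency(edges):
    """Turn an undirected edge list into a dict mapping node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def clustering_coefficient(adj, n):
    """Links among n's neighbors divided by the number of possible links."""
    k = len(adj[n])
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(adj[n], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def normalize_columns(m):
    """Rescale every column of the square matrix m (in place) to sum to 1."""
    size = len(m)
    for j in range(size):
        total = sum(m[i][j] for i in range(size))
        for i in range(size):
            m[i][j] /= total

def mcl(adj, inflation=2.0, iterations=50):
    """Minimal Markov Clustering; runs a fixed number of rounds, no convergence test."""
    nodes = sorted(adj)
    size = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    m = [[0.0] * size for _ in range(size)]
    for v in nodes:
        m[index[v]][index[v]] = 1.0  # self-loop on every node
        for w in adj[v]:
            m[index[w]][index[v]] = 1.0
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the column-stochastic matrix.
        m = [[sum(m[i][k] * m[k][j] for k in range(size)) for j in range(size)]
             for i in range(size)]
        # Inflation: raise entries to a power, then renormalize each column.
        m = [[x ** inflation for x in row] for row in m]
        normalize_columns(m)
    clusters = set()
    for i in range(size):
        if m[i][i] > 1e-6:  # row i belongs to an attractor
            clusters.add(frozenset(nodes[j] for j in range(size) if m[i][j] > 1e-6))
    return sorted((set(c) for c in clusters), key=min)

def soft_clusters(adj, threshold=0.2):
    """Hard-cluster the nodes above the threshold, then re-attach removed neighbors."""
    kept = {n for n in adj if clustering_coefficient(adj, n) > threshold}
    removed = set(adj) - kept  # hubs, plus any other node at or below the threshold
    core = {n: adj[n] & kept for n in kept}
    hard = mcl(core)
    soft = [c | {r for r in removed if adj[r] & c} for c in hard]
    return hard, soft, removed

# Toy graph, invented for illustration only: one hub linking three triangles.
toy = build_adjacency([
    ("Peter", "Andrew"), ("Andrew", "James"), ("Peter", "James"),
    ("Pilate", "Herod"), ("Herod", "Caiaphas"), ("Pilate", "Caiaphas"),
    ("Mary", "Martha"), ("Martha", "Lazarus"), ("Mary", "Lazarus"),
    ("Jesus", "Peter"), ("Jesus", "Pilate"), ("Jesus", "Mary"),
])
hard, soft, removed = soft_clusters(toy)
print(removed)
print(soft)
```

In this toy graph the hub has a clustering coefficient of 0, so it is removed before MCL and then re-attached to all three clusters, which is what makes the result overlapping. Note that low-degree peripheral nodes also fall below the threshold; the text does not say how such nodes were handled, so this sketch simply treats them like hubs.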
Figure 1 plots the MCL cluster sizes for
each network, illustrating the transitions that occur
as the networks are downsized by graph clustering.
Taking the result for Matthew (Mt) as an example, MCL
produced 44 hard clusters, with an average of ?? cluster
components (SD=??). Comparing clusters
across the four social networks illustrates the different
roles associated with characters in the Gospels, such as
Jesus as the son of God in Mark and Jesus as Savior in
Luke.
[Figure 1: axis labels Node, MCL; series Mt, Mk, Lk, Joh; logarithmic scale from 1 to 1000]
In order to compare clusters across the four
social networks, the Jaccard coefficient is an appropriate
index for measuring similarity between clusters; it
is computed as the number of elements in
the intersection set divided by the number of elements in
the union set. Table 2 presents the average
Jaccard coefficient values among the four Gospels, indicating
that the similarities among the first three Gospels,
Matthew, Mark, and Luke, are higher than those with John.
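As a small illustration of the Jaccard coefficient defined above, the following Python sketch compares two sets of clusters. The text does not state how the per-Gospel averages were computed, so the best-match averaging in mean_best_match is an assumption made here for illustration, and the cluster contents are invented.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_best_match(clusters_a, clusters_b):
    """Average, over clusters in A, of the best Jaccard score against any cluster in B."""
    if not clusters_a or not clusters_b:
        return 0.0
    best = [max(jaccard(a, b) for b in clusters_b) for a in clusters_a]
    return sum(best) / len(best)

# Toy clusters, invented for illustration only (not the clusters of Table 3).
toy_mt = [{"Peter", "Andrew", "James"}, {"Pilate", "Barabbas"}]
toy_mk = [{"Peter", "Andrew"}, {"Pilate", "Barabbas", "Herod"}]

print(jaccard(toy_mt[0], toy_mk[0]))
print(mean_best_match(toy_mt, toy_mk))
```

The best-match average is not symmetric in general, so a symmetric summary would need to average both directions; that choice is likewise left open by the text.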
The first three Gospels are referred to as the Synoptic
Gospels because of the high similarity recognized
among them. Table 3 presents the set of clusters with
the highest similarity across the Synoptic Gospels, which
can readily be identified with the passage beginning “Render unto
Caesar”. Figure 2 is a sample of the sub-networks focusing
on the word Peter and its neighboring nodes. As
the neighboring structures of the Peter node differ
among the four networks, such a graph representation
makes it possible to examine structural similarities and
differences among the social networks.
In summary, this paper has reported on the application
of a soft clustering method combining the clustering coefficient
and Markov Clustering. In particular, the graph
clustering technique offered an effective way of controlling
the hub nodes that are linked to numerous other
nodes. Examining social networks is useful in exploring
the interactions between characters and the features that
underlie word groups within the Gospels. To identify more
precisely the communities the characters are involved in and
the series of actions and events in the stories, ongoing work
aims to refine the dataset, for example in
the treatment of ambiguous names and personal pronouns.
References
S. van Dongen. (2000). Graph Clustering by Flow Simulation,
PhD thesis, University of Utrecht.
B. Dorow, D. Widdows, K. Ling, J. Eckmann, D. Sergi, and E. Moses. (2005). Using Curvature and Markov
Clustering in Graphs for Lexical Acquisition and Word
Sense Discrimination, Proceedings of the 2nd Workshop organized
by the MEANING Project (MEANING-2005).
M. Steyvers and J. B. Tenenbaum. (2005). The Large-Scale
Structure of Semantic Networks: Statistical Analyses
and a Model of Semantic Growth, Cognitive Science,
29 (1): pp. 41-78.
D. Gfeller, J.-C. Chappelier, and P. De Los Rios.
(2005). Synonym Dictionary Improvement through
Markov Clustering and Clustering Stability, International
Symposium on Applied Stochastic Models and Data
Analysis, pp. 106-113.
M. Miyake. (2008). Investigating word co-occurrence
selection with extracted sub-networks of the Gospels
Employing Clustering Coefficients, Digital Humanities
2008, pp. 258-260.
ESV Blog. (2007). Mapping New Testament Social
Networks, <http://www.esv.org/blog/2007/01/mapping.nt.social.networks>
C. Harrison. (2007). Visualizing the Bible,
<http://www.chrisharrison.net/projects/bibleviz/index.html>

Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None