Co-word Analysis of Research Topics in Digital Humanities

  1. 1. Mitsuyuki Inaba

    Ritsumeikan University

  2. 2. Xiaoguang Wang

    Ritsumeikan University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Introduction Over the last two decades, digital humanities has become
increasingly popular. Trying to combine computing
with traditional disciplines of arts and humanities,
digital humanities researchers have come from various
fields such as history, philosophy, linguistics, literature,
arts, and archaeology. While the interdisciplinary aspect
of digital humanities is obvious, its disciplinary structure
has not been established yet. In order to identify hot topics
and map the disciplinary structure, we made co-word
analysis based on a group of research papers’ titles.
Related Works
Co-word analysis is a content analysis technique that
uses patterns of co-occurrence of pairs of terms, i.e.,
words or phrases, in a corpus of texts to identify the relationship
between these terms, the extent to which these
themes are central to the whole area, and the degree to
which these themes are internally structured (Qin, 1999).
It does not rely on any a priori definition of research
themes in science. This enables us to follow actors objectively
and detect the structure of science without reducing
them to the extremes of either internalism or externalism
(Callon et al., 1986).
Method and Data
The first step of co-word analysis is to extract keywords
from records in corpus. We chose 516 papers written in
English from two journals and four conference proceedings:
Literary and Linguistic Computing from Year 2005
to 2008, Digital Humanities Quarterly from Year 2007
to 2008, and proceedings of ACH/ALLC 2005, DH 2006,
DH 2007, and DH 2008. Since these papers have no author
keywords, we manually extracted keywords from
their titles and picked out 1231 distinct keywords, which
appeared 2040 times in total. We conducted co-word analysis with two aims. One is
to detect the structure of a research field, and the other
to detect minor but potentially growing areas. To accomplish
these two aims, we selected top 70 keywords
(5.68%) whose frequencies were higher than 4 (see Table
These keywords’ total frequency was 681 (33.38%).
We calculated the association values between any word
pairs with Equivalence Coefficient index (E) which can
be defined as follows:
Eij = C2
ij Ci x Cj
Cij is the number of documents in which the keyword
pair (i and j) appears. Ci and Cj are the occurrence frequencies
of Keywords i and j in the group of the articles
respectively. Eij has a value between 0 and 1. Eij measures
the probability of Keyword i appearing simultaneously
in a document set, indexed by Keyword j, and vice
versa, given the respective collection frequencies of the
two keywords. Therefore, Eij is called “a coefficient of
mutual inclusion” by Turner et al. (1988).
Based on the Equivalence Coefficient indexes, we constructed
a co-occurrence matrix, and then made a multidimensional
Fig. 1 Multi-dimensional analysis of keywords
Combining the keywords’ frequencies with the multidimensional
analysis (see Fig. 1), we can see that the
keywords in digital humanities can be divided into three
categories. The keywords that fall under the first category—
the ones without underlines or frames—are directly
related to information technologies, such as digital, TEI,
XML, web, online, visual, and metadata. The hot topics
in this category include visualization, markup, text mining,
annotation, and digital library. The keywords that
fall under the second category, the ones framed, can be
associated with traditional humanities including literary,
linguistics, history, culture, language, poetry, novel,
and speech. With support of the information technologies
mentioned above, many researchers have focused
on medieval and early modern literatures, translation between
literary works in different languages, authorship,
gender, and stylistics. The third category with the keywords
being underlined consists of some general words,
such as knowledge, model, and framework.
Figure 1 shows these three categories all mix together,
with no obvious clusters in them. This means digital
humanities is still a new discipline without any subdisciplines
formed. What should be particularly noticed is
that English and French have been studied more than
other languages. This indicates that digital humanities
research has made uneven progress, depending upon
A co-word network was drawn with the co-occurrence
matrix (see Fig. 2). We calculated nodes’ degree centralities
in the co-word network with social network analysis
method. Degree centrality means the number of co-occurrence
with other keywords. As Figure 2 shows, all the
keywords can be divided into three levels, according to
their degree centralities.
An interesting phenomenon in Table 1 and Figure 2 is
that the frequencies of “digital humanities” and “humanities
computing” are high, but their degree centralities are low, while most of the others are coessential. A major
reason seems that digital humanities research has been
furthered in recent years. Six fundamental concepts have
been found which may benefit from the dissemination of
technologies related to textual digitalization. Still, digital
humanities is a new discipline, and researchers have
been trying to integrate digital technologies with traditional
arts and humanities. Some recent papers’ titles include
these two keywords and some other low frequency
keywords that may indicate new research directions,
e.g., geographical information system and interactive
games. We could interpret, therefore, these keywords
might work as an indicator for the future research topics
in digital humanities. While not giving the complete picture
of the digital humanities’ discipline structure, Figure
2 is complementary to the “intellectual map” painted by
McCarty W. and Short H. (2002). Conclusions
In this paper, we analyzed the structure of digital humanities
with co-word method. We counted frequencies of
keywords which we picked out from the titles of the selected
journal papers, recently published. Then we made
a multi-dimensional analysis and a network analysis. As
a result, six fundamental concepts are found in digital
humanities, but there are no clear subdisciplines in it yet.
While a widely used method in library and information
science, co-word analysis is still new for digital humanities
researchers. In the future, we will improve this analysis
further more in terms of methodology, developing
an integrated software for its common applications in
digital humanities.
Callon, M., Law, J., and Rip, A. (1986). How to study
the force of science. In Callon, M., Law, J., and Rip, A.
(eds.), Mapping the dynamics of science and technology:
Sociology of science in the real world. London: The
Macmillan Press Ltd, pp. 3-15.
McCarty, W. & Short, H. Mapping the field. (2002).
Qin, H. (1999). Knowledge discovery through co-word
analysis. Library Trends, 48(1):133-159.
Turner, W. A., Chartron, G., Laville, E., and Michelet,
B. (1988). Packaging information for peer review: New
co-word analysis techniques. In Van Raan, A. F. J. (ed.),
Handbook of quantitative studies of science and technology.
Netherlands: Elsevier Science Publishers, pp. 291-

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None