Creating a Digital Tombstone Archive: From Fieldwork to Theory Formation

paper, specified "long paper"
  1. Oliver Streiter

    Staatliche Universität Kaohsiung / National University of Kaohsiung

  2. Yoann Goudin

    Institut National des Langues et Civilisations Orientales


1. Introduction: A digital tombstone archive

1.1. Overview: Scope and motivation

Since 2007 we have been working on the construction and exploitation of a digital tombstone archive, “ThakBong”. The archive mainly contains tombstones from Taiwan, but also includes tombstones from China and tombstones of Chinese migrants in Asia, Europe and the USA. So far, on 850 visits to 500 graveyards, about 170,000 photos of 42,000 tombs have been taken. Repeated visits to graveyards allow us to document regionally different temporal patterns of ancestor veneration and the life cycle of a tomb, including the burial, a second burial, a tomb renovation and the removal of the tomb. Graveyards in Taiwan continue to disappear from the geographic and cultural landscape through development projects, natural catastrophes and the conversion of graveyards into bone-ash towers. About half of all graveyards have already been lost or will be removed under governmental projects in the years to come. As graveyards continue to be taboo for Taiwanese researchers, the records of graveyards we produce are among the few available.

Fig. 1: A graveyard in Southern Taiwan
Using geo-referenced photos as primary data, tombstones, tombs and graveyards are digitally reconstructed, linking back and forth between images and data. Images and data are constantly updated and made available to the scientific community via the research data archive DANS. They can be used in a wide range of research approaches, including corpus linguistics, social history, anthropology and human geography. The wealth of the data and the interleaving levels of language, culture, geography and history, however, require more integrative and cross-disciplinary approaches. Beyond topics related to Taiwan, the data permit empirical tests of theories of colonization, globalization, cultural contagion, and social struggle through cultural practices. Our approach to the data is that of a Digital Anthropology that combines sociological theories with the transformation of cultural practices.
1.2. Scope of this Presentation

In this presentation we summarize the main aspects of our digitization work, starting with the sampling and finishing with some theoretical insights gained through this work. The purpose of the presentation is two-fold. First, we want to provide those who intend to launch a similar documentation and research project on their local graveyards with a short project description, which they can follow and, if needed, modify. Second, we want researchers who consider using the "ThakBong" archive for their future studies to be able to evaluate the scope, coarseness and reliability of the data.
2. Methodology

2.1. Sampling

The purpose of the sampling is to produce an adequate representation of reality in all dimensions we can think of, in order to avoid many of the unmotivated generalizations found in more traditional research. We are especially eager to capture the voices of poor and uneducated people, as they are usually overlooked in linguistic and historical accounts. We therefore sample, without distinction, old and new tombs, king-size national monuments and the most elementary tombs. In addition, we try to cover all ethnic groups, all religious orientations and all administrative divisions. For practical reasons, however, we cannot achieve a truly balanced sample, because urban regions and catastrophe-prone areas have already lost most of their historic graveyards.
2.2. Fieldwork

The central tools for the fieldwork are digital GPS cameras costing about 250€, which store the position, altitude and orientation in the EXIF header, along with more common metadata. These data are indispensable for geographic analyses and for the mapping of tombs, graveyards and cultural practices.
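As a minimal sketch of how such EXIF values can be turned into usable coordinates: EXIF stores latitude and longitude as degree, minute and second rationals together with a hemisphere reference, and altitude and image direction as single rationals. The `tags` dictionary below is a hypothetical, already-decoded input; how it is read from the image file depends on the EXIF library used, and the function names are illustrative only.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return float(-value if ref in ("S", "W") else value)

def read_gps(tags):
    """Build a position record from a dict of decoded EXIF GPS tags (hypothetical input)."""
    num, den = tags["GPSAltitude"]
    altitude = Fraction(num, den)
    if tags.get("GPSAltitudeRef", 0) == 1:  # 1 means below sea level
        altitude = -altitude
    num, den = tags["GPSImgDirection"]
    return {
        "lat": dms_to_decimal(tags["GPSLatitude"], tags["GPSLatitudeRef"]),
        "lon": dms_to_decimal(tags["GPSLongitude"], tags["GPSLongitudeRef"]),
        "altitude_m": float(altitude),
        "direction_deg": float(Fraction(num, den)) % 360,
    }

# Invented example values, not taken from the archive.
example = {
    "GPSLatitude": ((22, 1), (37, 1), (3000, 100)), "GPSLatitudeRef": "N",
    "GPSLongitude": ((120, 1), (18, 1), (0, 1)), "GPSLongitudeRef": "E",
    "GPSAltitude": (255, 10), "GPSAltitudeRef": 0,
    "GPSImgDirection": (18000, 100),
}
position = read_gps(example)
```

Working with exact fractions until the final conversion avoids accumulating rounding errors from the rational EXIF values.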
To optimize the automatic use of these data, photos are taken in a regulated way, using two circles with two defined centers as a reference. All photos from outside are taken in the direction of the tombstone, so that the orientation of the tomb and the orientation of the shots can be calculated from the direction of one central camera shot. Photos in the area of the mourners, in front of the tombstone, are taken from the center of this space, so that the location of the components of the tomb around the mourner can be calculated from the orientation of the tomb. Through this model, photos are also linked in a systematic way, making it possible to browse and virtually explore the tombs.

Fig. 2: The model regulating how photos are taken, so that automatic processing and browsing of the tomb become possible.
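The geometric reasoning behind this model can be sketched as follows. The sketch assumes that the central shot looks straight at the tombstone, so that the tombstone faces the opposite direction, and that shots from the mourners' area are sorted into four coarse sectors; both the sector names and the 90-degree sector width are illustrative assumptions, not the project's actual rules.

```python
def tomb_facing(central_shot_bearing):
    """Direction the tombstone faces, assuming the central shot looks straight at it."""
    return (central_shot_bearing + 180.0) % 360.0

def relative_angle(shot_bearing, facing):
    """Angle of an inner shot, measured clockwise from the mourner's view of the tombstone."""
    toward_stone = (facing + 180.0) % 360.0
    return (shot_bearing - toward_stone) % 360.0

def sector(angle):
    """Map a relative angle onto four coarse sectors around the mourner (assumed names)."""
    names = ["tombstone", "right", "behind", "left"]
    return names[int(((angle + 45.0) % 360.0) // 90.0)]

# A central shot taken at bearing 350 implies a tombstone facing 170.
facing = tomb_facing(350.0)
# A shot from the mourners' area at bearing 80 then points to the mourner's right.
side = sector(relative_angle(80.0, facing))
```

All bearings are compass degrees, clockwise from north, as stored in the EXIF image direction.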
2.3. Processing of Primary Data

For the processing and multi-user annotation we use the PostgreSQL database with the PostGIS extension, storing transcriptions in XML. The database tables represent digital objects as bundles of features. A feature is a unique combination of an attribute, e.g. 'direction', a value, e.g. '180', and, if necessary, a unit, e.g. 'degree'. The objects are called "graveyard", "tomb", "tombstone", "transcription" and "person", representing the corresponding real-world objects. The object "person", for example, contains the features 'surname', 'given name', 'religious name', 'date of birth', 'date of death', 'date of burial', 'ethnicity', 'gender' and 'role', e.g. 'mourner' or 'deceased'. An average tomb has about 50 defined features, but numbers can be much higher for family tombs.
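The feature model described above can be illustrated with a small in-memory sketch. The archive itself stores these data in PostgreSQL tables; the Python classes and method names below are hypothetical and only show the idea of an object as a bundle of unique attribute, value and unit combinations.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Feature:
    """One attribute-value pair with an optional unit; frozen so it can live in a set."""
    attribute: str
    value: str
    unit: Optional[str] = None

@dataclass
class DigitalObject:
    """A digital object such as "tomb" or "person", held as a bundle of features."""
    kind: str
    features: set = field(default_factory=set)

    def add(self, attribute, value, unit=None):
        # The set keeps every attribute/value/unit combination unique.
        self.features.add(Feature(attribute, str(value), unit))

    def get(self, attribute):
        """Return all values recorded for an attribute, sorted for stable output."""
        return sorted(f.value for f in self.features if f.attribute == attribute)

tomb = DigitalObject("tomb")
tomb.add("direction", 180, "degree")
tomb.add("direction", 180, "degree")  # duplicate combination, stored once

person = DigitalObject("person")
person.add("role", "deceased")
person.add("gender", "female")
```

Keeping features as generic triples rather than fixed columns makes it easy to attach a varying number of features to an object, which fits the observation that family tombs carry many more features than the average tomb.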
In a first processing step, these objects are created manually through a web-interface that segments the stream of photos. Images that show an object are linked to that object. Images showing inscriptions, offerings, symbols and figurative representations are specifically tagged, so that images can be further filtered, to facilitate the annotation process.
More than 10,000 lines of program code in PL/pgSQL implement PostgreSQL triggers, which help to reduce the manual annotation labor. They can be divided into five groups:
Rules that extract and process data from the image. From these data the position, altitude and slope are calculated.
Rules that fill in logically implied values. E.g. if no image of a tomb is tagged for 'offering', the tomb is marked as offerings='no'.
Rules for a model-based annotation. Models are used for non-visible features, such as the 'ethnicity' of the deceased, if not known otherwise. These statistical models maximize the number of correct annotations over the whole data set, while allowing manual corrections of individual tombs to be respected and fed back into the model. Using external statistical data, for example on the relation between surnames, administrative regions and ethnicity, the most likely ethnicity of the deceased can be calculated, given the region and the surname on the tombstone.
Where no external data resources are available, e.g. for the prediction of the 'gender' from the 'given name', we use bootstrapping: using statistical data produced in the manual annotation of unambiguous cases, ambiguous cases are automatically annotated when the memory-based model makes a clear prediction for a given name. For all features, the epistemic status is retained: model-derived data are updated when manually set data change.
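The bootstrapping idea and the role of the epistemic status can be sketched as follows. This is a simplified illustration in Python rather than the PL/pgSQL triggers used in the project; the field names, the status labels 'manual' and 'model', and the two thresholds are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(persons):
    """Count genders per given name, using manually annotated persons only."""
    counts = defaultdict(Counter)
    for p in persons:
        if p.get("gender_status") == "manual":
            counts[p["given_name"]][p["gender"]] += 1
    return counts

def annotate(persons, min_count=3, min_share=0.9):
    """Fill in 'gender' where the name statistics give a clear prediction.

    Manual values are never overwritten. Model values are recomputed on every
    call, so they follow changes in the manual data. Thresholds are illustrative.
    """
    counts = train(persons)
    for p in persons:
        if p.get("gender_status") == "manual":
            continue
        # Drop any earlier model-derived value before predicting again.
        p.pop("gender", None)
        p.pop("gender_status", None)
        seen = counts.get(p["given_name"])
        if not seen:
            continue
        label, n = seen.most_common(1)[0]
        total = sum(seen.values())
        if total >= min_count and n / total >= min_share:
            p["gender"], p["gender_status"] = label, "model"
    return persons
```

Calling `annotate` again after a manual correction removes or changes model-derived values that are no longer supported, which mirrors the requirement that model-derived data follow the manually set data.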
2.4. Annotation and Transcription

After experiments with OCR on tombstone images yielded no usable results, tombstones are transcribed manually, phrase by phrase. So far, 24,000 tombstones have been transcribed completely and about 10,000 partially. Transcribed phrases are classified semi-automatically into 'semantic roles', such as 'place', 'person', 'date of birth' etc. Then, example-based taggers extract relevant data, such as dates, names and family relations, and fill in the relevant features. Where automatic processes do not yield a clear classification, the system shifts to an interactive mode. All other features, e.g. the color and form of the tombstone, are annotated manually until we have completed the extraction of these features from other photos, using an example-based approach and a similarity metric for photos. For the entire project, 6 man-years have been invested over 7 years, showing that with the right balance of automatic processing and manual annotation large data sets can be created.
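The interplay of example-based classification and the interactive fallback can be sketched as follows. The sketch uses string similarity from Python's standard `difflib` module as a stand-in for whatever similarity measure the project uses; the thresholds and the English example phrases are assumptions for illustration, whereas the real data are tombstone inscriptions.

```python
from difflib import SequenceMatcher

def classify(phrase, examples, threshold=0.6, margin=0.1):
    """Return the semantic role of the most similar example, or None if unclear.

    A return value of None stands for the switch to the interactive mode.
    """
    best = {}
    for text, role in examples:
        score = SequenceMatcher(None, phrase, text).ratio()
        best[role] = max(best.get(role, 0.0), score)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    # No example is similar enough: leave the decision to the annotator.
    if not ranked or ranked[0][1] < threshold:
        return None
    # Two roles are almost equally plausible: also leave it to the annotator.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None
    return ranked[0][0]

# Hypothetical, already classified phrases serving as the example base.
examples = [
    ("born 1920", "date of birth"),
    ("born 1931", "date of birth"),
    ("died 1987", "date of death"),
    ("Tainan", "place"),
]
role = classify("born 1925", examples)
unclear = classify("zzz", examples)
```

Every phrase that the annotator classifies in interactive mode can be appended to `examples`, so that the share of automatically classified phrases grows with the size of the example base.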

2.5. Analysis and Theory Formation

The most striking fact about the data is the enormous variation one finds through time and space. This variation contrasts sharply, first, with the literature on funerary traditions in Taiwan and, second, with the statements that informants produce when explaining their traditions. The data thus call into question not only the foundations of established research on funerary rites, but also research that uses informants as a source of information. In fact, the relations between publications, informants’ opinions and a national ideology become palpable to such an extent that a scientific approach has to look into the involvement of ideologies in the transformation of cultural practices. The DH approach is well suited to this endeavor: by digitizing scientific and political publications related to funerary rites, we could contrast these publications with the reality they claim to describe and reveal their influence on social practices. More particularly, we could show how a governmental publication influenced the way that Taiwanese refer to their ancestral home through tombstone inscriptions.

Fig. 3: An example of the variation through time and space. The onset of the place-name type 'tanghao' (blue) between 1900 and 2000. Adjacent regions may show similar or very different patterns of development. B refers to the correlation with the Baijiaxing, a centuries-old book which specifies which surname matches which tanghao. High correlations identify the place-name as a literary reference.

References

Sperber, Dan (1996). La contagion des idées. Paris: Odile Jacob.
Bourdieu, Pierre (1979). La distinction: Critique sociale du jugement. Collection Le sens commun. Paris: Éditions de Minuit.
de Certeau, Michel and Giard, Luce and Mayol, Pierre (1980/1990). L'invention du quotidien: Arts de faire. Paris: Gallimard.
Holl, Stephan and Plum, Hans (2009). PostGIS. GeoInformatics 3: 34–36.
Chen, Shao-hsing and Fried, Morton (1968). The Distribution of Family Names in Taiwan: Volume I, The Data. Taipei: National Taiwan University & Columbia University.
Streiter, Oliver and Goudin, Yoann and Huang, Chun (Jimmy) and Lin, Ann Mei-fang (2012). Matching Digital Tombstone Documentation to Unearthed Census Data. International Journal of Humanities and Arts Computing 6, 1-2: 57-70.
Streiter, Oliver and Goudin, Yoann (2014). The Tanghao on Taiwan's Tombstones: The Recuperation of Tactics for a National Space. Archiv Orientální.


Conference Info


ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO