Data Criticism: General Framework for the Quantitative Interpretation of Non-Textual Sources

poster / demo / art installation
Authorship
  1. Asanobu Kitamoto

    National Institute of Informatics

  2. Yoko Nishimura

    Toyo Bunko

Work text

1. Introduction

Using non-textual sources alongside conventional textual sources has the potential to expand our historical knowledge, but non-textual sources have been used less frequently than textual ones, and a framework for their critical use is underdeveloped. We are therefore working on a framework called “data criticism” to develop a methodological commons for the quantitative evaluation of the quality and value of non-textual sources (= data sources), such as maps, photographs and illustrations. As with textual sources, we should not neglect critical evaluation of these sources merely because their numerical or graphical appearance may be mistaken for "the objective representation of facts." The fundamental requirement of using sources for historical studies is that we critically “read” them to reduce uncertainty, biases, and other factors that affect the quality of the facts derived from them.
2. Data Criticism

We have been working on the Digital Silk Road project (dsr.nii.ac.jp) since 2001, moving from the digitization of historical sources to the analysis of digitized materials for historical studies, especially studies of the Silk Road. We realized that non-textual sources, like textual sources, pose the challenge of proper interpretation, because we found many types of errors in them. Maps had errors due to the technical limitations of the era in which they were made, and photograph captions had errors due to misunderstanding or different conceptualization. While fixing these problems, we realized that this task is exactly the non-textual version of textual criticism (text critique), which is at the core of historians’ research. We therefore propose the concept of ‘data criticism’ both to clarify the role of this task in the whole process of historical research and to draw attention to the importance of computational tools, namely algorithms and databases, for processing data. Such tools must first be developed, however, before humanists can pursue the potential of data criticism, because off-the-shelf tools are usually not available. Hence the contribution of the paper is to demonstrate with case studies how maps, photographs and illustrations can be critically interpreted and used as historical evidence, and to provide digital tools that support humanists in performing this task.
Data criticism is related to a few similar concepts. Data criticism of maps, or what we can call map criticism, has much in common with methodologies developed in cartography and geography (for example, reference), but data criticism focuses more on integrating multiple spatial and visual sources, as well as textual sources such as placenames, to cross-check the interpretation of a historical landscape. Data criticism can also be characterized as a quantitative approach, in contrast to the qualitative approach previously used by historians to “read” evidence from paintings or illustrations based on human interpretation. Here the word ‘interpretation’ needs to be clarified, because historical studies deal with two levels of interpretation, namely the evidence level and the history level. The latter focuses on answering historical research questions, but the answer depends more on the cognitive process by which historians construct history. The word ‘capta’ has been suggested to make this constructivist context of history explicit, in contrast to the word ‘data’, which sounds more objective and observer-independent. In this sense, our interest is in the evidence level (data), not in the history level (capta). The goal of data criticism is to provide more reliable evidence from data sources through quantitative and integrated procedures using computer algorithms.
3. Case Studies

We introduce data criticism with three case studies drawn from the results of our Digital Silk Road Project.
The first case study is about a map of the Silk Road made by Aurel Stein at the beginning of the 20th century. This map is still considered the authoritative reference for Silk Road studies, but it has a mysterious problem: some of the ruins recorded on the map cannot be found at the corresponding geographic coordinates. A typical explanation of this mystery is that those ruins have been destroyed or have disappeared in the roughly 100 years since his visit. We discovered from archaeological survey, however, that those “missing ruins” are still there, at the locations suggested by the map once the error of the map is taken into account. This error is estimated from the ground control points (GCPs) given on the map for geometric correction. This ‘place-matching’ result is also supported by evidence from non-textual sources such as photographs and illustrations, because they provide information about the 2D or 3D structure of a ruin, which is unlikely to match by coincidence. After this matching, we also realized that comparing the names of ruins, or ‘name matching’, does not work, because a string comparison of names cannot overcome the linguistic difference between endonyms and exonyms, or different naming conventions due to arbitrary conceptualizations. This case study suggests that the integration of multiple non-textual sources can provide more reliable evidence and new interpretations of historical facts.
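The place-matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the project: the error model (a constant offset plus a root-mean-square residual), the function names, the tolerance factor `k`, and all coordinates are hypothetical, and positions are assumed to be planar coordinates in a common unit such as kilometres.

```python
import math

def estimate_offset_and_rmse(gcps):
    """Estimate a constant map offset and the residual error from GCPs.

    gcps: list of ((map_x, map_y), (true_x, true_y)) pairs.
    Assumption: the map error is modelled as one translation plus noise.
    """
    n = len(gcps)
    dx = sum(t[0] - m[0] for m, t in gcps) / n
    dy = sum(t[1] - m[1] for m, t in gcps) / n
    squared = sum((t[0] - m[0] - dx) ** 2 + (t[1] - m[1] - dy) ** 2
                  for m, t in gcps)
    return (dx, dy), math.sqrt(squared / n)

def match_candidates(map_pos, candidates, offset, rmse, k=3.0):
    """Return candidate site names within k * rmse of the corrected position."""
    px, py = map_pos[0] + offset[0], map_pos[1] + offset[1]
    radius = k * rmse
    hits = []
    for name, (cx, cy) in candidates.items():
        distance = math.hypot(cx - px, cy - py)
        if distance <= radius:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

# Hypothetical GCPs: position read from the old map, position surveyed today.
gcps = [
    ((0.0, 0.0), (2.0, 1.0)),
    ((10.0, 0.0), (12.2, 0.9)),
    ((0.0, 10.0), (1.9, 11.1)),
    ((10.0, 10.0), (12.1, 11.0)),
]
# Hypothetical present-day survey results.
candidates = {"site_A": (7.2, 6.1), "site_B": (9.0, 2.0)}

offset, rmse = estimate_offset_and_rmse(gcps)
matches = match_candidates((5.0, 5.0), candidates, offset, rmse)
```

In this toy data, a search at the raw map position of the ruin finds nothing, while a search at the position shifted by the estimated offset finds `site_A` within the error radius. A real map would likely need a richer error model than a single translation.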
The second case study is about a map of Gaochang made by Albert Grünwedel at the beginning of the 20th century. This is an important map because artifacts excavated from the site were recorded by ruin symbols on the map, but the map has been considered untrustworthy because it seems to be a sketch with significant distortion. We hypothesized that this is a topological map, just like a subway map, designed for navigation purposes by preserving topological relationships such as the connection and intersection of geographical features. We critically examined the topological structure of the map, and finally identified most of the ruins recorded on it. Photographs were again used as supporting evidence for the identity of ruins. For this purpose, we developed a web-based tool, ‘Mappinning’, for what we call ‘interactive georeferencing.’ This tool performs on-the-fly registration of two maps at a focal point specified interactively by a user. Compared with geometric correction of the entire map, this method avoids significant distortion of the topological map, so historians can keep the original shape of the map while taking advantage of approximate place matching in the neighborhood of the focal point.
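One way to picture registration at a focal point is a similarity transform (scale, rotation, translation) fitted to user-placed pins, with each pin weighted by its closeness to the focal point. The sketch below is an assumption made for illustration and is not taken from the Mappinning tool: the Gaussian weighting, the parameter `sigma`, the function names, and the pin coordinates are all hypothetical. Because a similarity transform never warps the old map, the map keeps its original shape.

```python
import math

def local_similarity(pins, focus, sigma):
    """Fit new = a * old + b over complex numbers, weighted around `focus`.

    pins:  list of ((old_x, old_y), (new_x, new_y)) pairs.
    focus: focal point in old-map coordinates.
    sigma: distance scale of the (assumed) Gaussian weights.
    """
    f = complex(*focus)
    old = [complex(*o) for o, _ in pins]
    new = [complex(*n) for _, n in pins]
    w = [math.exp(-abs(z - f) ** 2 / (2.0 * sigma ** 2)) for z in old]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no pins near the focal point")
    z_bar = sum(wi * z for wi, z in zip(w, old)) / total
    u_bar = sum(wi * u for wi, u in zip(w, new)) / total
    den = sum(wi * abs(z - z_bar) ** 2 for wi, z in zip(w, old))
    if den == 0.0:
        raise ValueError("need two distinct pins near the focal point")
    num = sum(wi * (u - u_bar) * (z - z_bar).conjugate()
              for wi, z, u in zip(w, old, new))
    a = num / den            # encodes scale and rotation
    b = u_bar - a * z_bar    # encodes translation
    return a, b

def to_new(point, a, b):
    """Map an old-map point into the coordinates of the other map."""
    u = a * complex(*point) + b
    return (u.real, u.imag)

# Hypothetical pins: three agree near the origin, one far pin does not.
pins = [
    ((0.0, 0.0), (10.0, 5.0)),
    ((1.0, 0.0), (10.0, 7.0)),
    ((0.0, 1.0), (8.0, 5.0)),
    ((100.0, 100.0), (0.0, 0.0)),  # lies in a differently distorted region
]
a, b = local_similarity(pins, focus=(0.0, 0.0), sigma=2.0)
nearby = to_new((1.0, 1.0), a, b)
```

With the focal point at the origin, the far pin receives a negligible weight, so the fit is governed by the three nearby pins. Moving the focal point would produce a different local alignment from the same set of pins.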
The third case study posed the greatest technical challenge, because the target map, the Complete Map of Peking, Qianlong Period, made about 250 years ago, is a huge map of 29 billion pixels divided into 203 sheets. Geometric correction of this map at such a scale required 1800 ground control points and 500 control lines. What is unique about this geometric correction is that we used not only control points but also control lines, in order to preserve the linear features of the streets of Beijing as far as possible through the correction. This was the first time that a fully connected and geometrically corrected digital version of the map had been obtained.
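The combination of control points and control lines can be illustrated with a least-squares fit in which a control point fixes both coordinates of a location, while a control line only requires a location to land somewhere on a target line. The abstract does not state which transformation model was used. The sketch below assumes a single affine transform for brevity, and the function names and coordinates are hypothetical.

```python
import math

def solve(matrix, vector):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("constraints do not determine the transform")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_affine(points, lines):
    """Fit X = p0*x + p1*y + p2, Y = p3*x + p4*y + p5 by least squares.

    points: list of ((x, y), (X, Y)) control points.
    lines:  list of ((x, y), (q1, q2)); the source point must land on the
            target line through q1 and q2 (perpendicular distance is penalised).
    """
    rows, rhs = [], []
    for (x, y), (tx, ty) in points:
        rows += [[x, y, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, x, y, 1.0]]
        rhs += [tx, ty]
    for (x, y), (q1, q2) in lines:
        dx, dy = q2[0] - q1[0], q2[1] - q1[1]
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length  # unit normal of the target line
        rows.append([nx * x, nx * y, nx, ny * x, ny * y, ny])
        rhs.append(nx * q1[0] + ny * q1[1])
    # Normal equations of the stacked system.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    v = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(6)]
    return solve(m, v)

# Hypothetical constraints consistent with X = 2x + 3, Y = 2y - 1.
points = [((0.0, 0.0), (3.0, -1.0)), ((1.0, 0.0), (5.0, -1.0))]
lines = [
    ((0.0, 1.0), ((0.0, 1.0), (10.0, 1.0))),  # must land on the line Y = 1
    ((1.0, 1.0), ((5.0, 0.0), (5.0, 10.0))),  # must land on the line X = 5
]
params = fit_affine(points, lines)
```

Two control points alone leave an affine transform underdetermined. Here the two line constraints supply the missing equations, which shows how street lines can contribute to a correction even where no exact point correspondence is known.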
After geometric correction, we realized that some parts of the map could not be matched well with the current map. This problem was known to some historians, but they could not explain the reason, so they simply judged that the map could not be trusted. We found, however, that the problem can be explained by an erroneous reconstruction of the broken map at some point in its history. We discovered that parts of the map had been exchanged from left to right or vice versa. This cannot have been caused by mistakes in the digitization process, because some of the exchanges can be observed within a single map sheet. This insight was obtained only after we had a picture of the entire map; historians in the past could not notice the problem as long as they studied the map sheet by sheet. This indicates that human interpretation at a microscopic scale is limited in understanding macroscopic issues, and that there have been technical barriers to achieving the macroscopic viewpoint. Data criticism tools have the potential to break this barrier and may lead to new discoveries based on new interpretations from new viewpoints.
These case studies suggest that the proper treatment of non-textual sources needs an integrated approach using not only maps but also photographs and other spatial and visual sources. Photographs should be interpreted in a three-dimensional historical landscape to identify the location and direction from which each photograph was taken. This interpretation should be supported by textual criticism of photograph captions, which may also be affected by errors or misconceptions.
4. Conclusion

In short, data criticism is not about making historical Geographic Information Systems (GIS) that map historical facts into digital space and analyze them, but about making quantitative and integrated digital tools to analyze, enhance and rediscover the value of historical sources. We plan to generalize our framework to establish the field of “data criticism” so that we can accumulate historical evidence not only from textual sources but also from data sources. The key is to develop easy-to-use digital tools that humanists can use by themselves. This is where digital humanities comes in: teamwork between computer scientists and humanists can lead to a breakthrough. The final goal is to establish data criticism as a new research field that has its own methodological commons and a framework for combining appropriate tools to answer historical research questions.
References

Ono, K., Kitamoto, A., Onishi, M., Andaroodi, E., Nishimura, Y., Matini, M.R. (2008), Memory of the Silk Road -The Digital Silk Road Project-, Virtual Systems and Multimedia (VSMM), Vol. Project Papers, pp. 437-444.
Gregory, I.N., Healey, R.G. (2007), Historical GIS: Structuring, mapping and analyzing geographies of the past, Progress in Human Geography, Vol. 31, No. 5, pp. 638-653.
Drucker, J. (2011), Humanities Approaches to Graphical Display, Digital Humanities Quarterly, Vol. 5, No. 1
Silk Road Maps, dsr.nii.ac.jp/geography/
Kitamoto, A., and Nishimura, Y. (2009), Geometric correction of measured historical maps with a pixel-oriented and geobrowser-friendly framework, Proceedings of the 22nd International Symposium on Digital Documentation, Interpretation & Presentation of Cultural Heritage (CIPA)
Mappinning - Interactive Georeferencing by Pinning Old Maps, dsr.nii.ac.jp/digital-maps/mappinning/
Digital Maps of Old Beijing, dsr.nii.ac.jp/beijing-maps/

Conference Info

ADHO - 2014
"Digital Cultural Empowerment"

Hosted at École Polytechnique Fédérale de Lausanne (EPFL), Université de Lausanne

Lausanne, Switzerland

July 7, 2014 - July 12, 2014

Conference website: https://web.archive.org/web/20161227182033/https://dh2014.org/program/

Attendance: 750 delegates according to Nyhan 2016

Series: ADHO (9)

Organizers: ADHO