Optical Character Recognition for Complex Scripts: A Case-study in Cuneiform

paper, specified "long paper"
Authorship
  1. Shai Gordin

    Ariel University, Israel

  2. Avital Romach

    Tel-Aviv University



Cuneiform is one of the earliest writing systems in the world, usually written by pressing a stylus into moist clay tablets, creating a three-dimensional script. It is a logo-syllabic script, like the Chinese or Japanese writing systems (Veldhuis, 2012; Kwan, 2014; Francis, 2020). There are close to a thousand cuneiform signs, not all of which were used simultaneously; usually about 200-300 signs were in use at any one time. Additionally, the forms of these signs changed drastically during the 3,000 years in which this writing system was in use. Cuneiform tablets are available for OCR in three main formats: (a) 3D or 2D+ models, the most accurate representation, which are still relatively rare and expensive to produce; (b) 2D images of cuneiform objects, which are fast growing in quality and ubiquity; and (c) hand-copies, 2D drawings of the inscribed signs made by scholars, which are still commonly used in the field and make up the majority of available representations of cuneiform writing. Furthermore, the languages written in cuneiform are all low-resource languages, which raises difficulties when implementing some computational methods (Hedderich et al., 2021). Although the number of digital texts has been growing steadily in recent years, the gap is still substantial, and significant digitization efforts are needed.
The best results for character recognition of cuneiform signs or strokes thus far have been achieved from 3D models (Mara et al., 2010; Fisseler et al., 2013; Fisseler et al., 2014; Rothacker et al., 2015). Work on hand-copies, or on 2D projections of 3D models which look like hand-copies, is also showing promising results (Mara and Krömker, 2013; Bogacz et al., 2015a; Bogacz et al., 2015b; Bogacz et al., 2016; Massa et al., 2016; Bogacz and Mara, 2018; Yamauchi et al., 2018), as is some work on 2D images (Rusakov et al., 2019; Rusakov et al., 2020; Dencker et al., 2020; see also Bogacz and Mara, forthcoming). These results are nevertheless limited in scope because of a lack of labeled data, a lack of 3D models, or, most often, the lack of an accessible way for cuneiform specialists to use the models developed.
With this gap between the specialists and the code in mind, the CuRe (Cuneiform Recognition) set of tools of the Babylonian Engine project was created as an online interactive platform for scholars.

Funded by the Ministry of Science & Technology, Israel, Grant 3-16464.
The idea was that humans, particularly experts, should take an active part already in the design and initial training of the machine learning (ML) models. The ML models are envisioned as “co-workers” which provide likely suggestions to the user, aiding the process of publishing scholarly editions of cuneiform texts, and improving as the user corrects them. This way, the ML models benefit from the corrections and labeled data created by experts, while the experts gain a dedicated work environment for any type of cuneiform text and can download the results of their work, which already advances cuneiform scholarship.

Our main tool, CuRe (Cuneiform Recognition), takes images of hand-copies as input, either uploaded by the user or chosen from the CuRe dataset of copyright-free hand-copies of Neo- and Late Babylonian documents (see the demo; the full platform will be available in the near future). Hand-copies were chosen for initial training because they are more ubiquitous than 2D images or 3D models, and because many hand-copies still lack a full published edition of their text. The signs are detected using a Faster R-CNN model with a ResNet-50 backbone (detectron2), and a ResNet-18 model then suggests a value for each detected sign (Figure 1). The two stages of the model are also separated for the user: first the user corrects the segmentation, and then they are given the suggested readings. The models currently perform with ca. 92% accuracy for sign identifications, checked against a validated test set, and we are constantly enlarging the dataset and retraining the model.
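
The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CuRe implementation: `detect`, `review_boxes`, and `classify` are hypothetical callables standing in for the sign detector, the user's correction of the segmentation, and the sign classifier, and the `Box` and `SignSuggestion` types are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# All names below are hypothetical; they are not part of CuRe or detectron2.
Image = Sequence[Sequence[int]]  # grayscale image as rows of pixel values


@dataclass(frozen=True)
class Box:
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass(frozen=True)
class SignSuggestion:
    box: Box
    candidates: List[Tuple[str, float]]  # (sign label, score), best first


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by `box` out of the image."""
    return [list(row[box.x0:box.x1]) for row in image[box.y0:box.y1]]


def suggest_readings(
    image: Image,
    detect: Callable[[Image], List[Box]],
    review_boxes: Callable[[List[Box]], List[Box]],
    classify: Callable[[Image], List[Tuple[str, float]]],
    top_k: int = 3,
) -> List[SignSuggestion]:
    # Stage 1: the detector proposes sign boxes, and the user corrects them.
    boxes = review_boxes(detect(image))
    # Stage 2: each corrected box is classified on its own crop.
    suggestions = []
    for box in boxes:
        scored = classify(crop(image, box))
        ranked = sorted(scored, key=lambda c: c[1], reverse=True)
        suggestions.append(SignSuggestion(box, ranked[:top_k]))
    return suggestions
```

Because the classifier only ever sees boxes that have passed through `review_boxes`, segmentation errors are fixed before they can affect the suggested readings.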

We are in the process of developing another stage in this tool: stroke recognition from 2D images. Identifying the strokes, the constituent parts which make up a cuneiform sign, is considerably simpler than identifying whole signs (similar to text recognition for Japanese in Liang et al., 2015). While the signs changed drastically in the 3,000 years in which cuneiform was in use, the strokes and writing technique remained similar. Furthermore, since there are only three main stroke types (horizontal, vertical, and oblique), it is much quicker to obtain a large corpus of examples of each than to collect enough samples for every sign and its variant forms. There have been no previous attempts at identifying strokes from 2D images. After the strokes are identified using Faster R-CNN with a ResNet-50 backbone, vector images are created with schematic representations of the strokes, which closely resemble hand-copies (Figure 2).
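
As an illustration of the vectorization step, the sketch below turns detected strokes into a small SVG drawing. It is an assumption-laden simplification rather than the project's method: the `Stroke` record, the reduction of each stroke to a single line segment inside its bounding box, and the mapping from detector score to gray level (darker meaning more certain) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data model: one detected stroke = bounding box + type + score.
STROKE_TYPES = ("horizontal", "vertical", "oblique")


@dataclass(frozen=True)
class Stroke:
    kind: str          # one of STROKE_TYPES
    x0: float
    y0: float
    x1: float
    y1: float
    confidence: float  # detector score in [0, 1]


def gray_for(confidence: float) -> str:
    """Map a score to a gray level: 1.0 -> black, 0.0 -> white (assumed)."""
    clamped = min(max(confidence, 0.0), 1.0)
    level = round(255 * (1.0 - clamped))
    return f"rgb({level},{level},{level})"


def stroke_to_line(s: Stroke) -> str:
    """Reduce a stroke box to one schematic SVG line segment."""
    if s.kind == "horizontal":
        mid = (s.y0 + s.y1) / 2
        ends = (s.x0, mid, s.x1, mid)
    elif s.kind == "vertical":
        mid = (s.x0 + s.x1) / 2
        ends = (mid, s.y0, mid, s.y1)
    elif s.kind == "oblique":
        ends = (s.x0, s.y0, s.x1, s.y1)  # box diagonal as a stand-in
    else:
        raise ValueError(f"unknown stroke type: {s.kind}")
    x1, y1, x2, y2 = ends
    return (f'<line x1="{x1:g}" y1="{y1:g}" x2="{x2:g}" y2="{y2:g}" '
            f'stroke="{gray_for(s.confidence)}" stroke-width="2"/>')


def strokes_to_svg(strokes: List[Stroke], width: int, height: int) -> str:
    """Wrap all stroke segments in a standalone SVG document."""
    body = "\n".join("  " + stroke_to_line(s) for s in strokes)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}\n</svg>')
```

Calling `strokes_to_svg([Stroke("horizontal", 0, 10, 20, 14, 0.9)], 100, 50)` yields a one-line drawing; a fuller rendering would also need to draw the wedge head, which this sketch omits.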
The final goal is for the user to create a digital text using OCR in three stages: receiving and correcting (a) stroke identification, (b) sign detection, and (c) the identification of the sign values. This process creates additional validated training data for all our models, constantly improving them, while avoiding the multiplication of error rates in the final result, the digitized text. Since the models are already approximately 80%-90% accurate (and rising), the number of corrections required is not too cumbersome, and the tool can already save a significant amount of the time that experts usually spend on deciphering cuneiform texts.
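
A short calculation shows why correcting after every stage matters. The figures are illustrative only: the sketch assumes three stages at 90% accuracy each, that errors at different stages are independent, and that each stage processes 100 items, none of which is a measured result.

```python
from math import prod
from typing import Sequence


def compounded_accuracy(stage_accuracies: Sequence[float]) -> float:
    """Accuracy of an uncorrected chain, assuming independent stage errors."""
    return prod(stage_accuracies)


def expected_corrections(n_items: int, accuracy: float) -> float:
    """Expected number of items a user must fix at one corrected stage."""
    return n_items * (1.0 - accuracy)


# Illustrative figures only: three stages at 90% accuracy each.
stages = [0.9, 0.9, 0.9]

# Without intermediate correction, errors multiply: 0.9 * 0.9 * 0.9.
uncorrected = compounded_accuracy(stages)  # about 0.729

# With correction after each stage, every stage starts from validated
# input, so the user fixes roughly 10 items in 100 at each step.
per_stage_fixes = [expected_corrections(100, a) for a in stages]
```

Under these assumptions an uncorrected three-stage chain would be right only about 73% of the time, whereas the corrected pipeline asks for a bounded number of fixes per stage and hands validated input to the next one.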
To conclude, designing ML models as part of a human-in-the-loop pipeline application has the following benefits: (a) people are incentivized to create the data needed for training; (b) breaking down the “reading” process of the OCR allows for fewer mistakes in the final digitized text; (c) the ML models are used from their inception in a real-world scenario, providing real-world value to the research community. Furthermore, we believe the CuRe tool-set can also be a valuable learning environment for cuneiform students or any interested laypeople. The advantage for cuneiform studies is thus twofold: growing a database of available digital editions for research, and creating a learning environment which can help disseminate knowledge on some of the oldest civilizations in human history. The digital editions created will be downloadable in the standard formats of the field (ATF, TEI/XML, and JSON). We believe that this work pipeline, which breaks down the OCR process and develops human-in-the-loop ML models, is an effective way of solving OCR for additional low-resource and complex writing systems (compare Hashimoto et al., 2018 on Japanese), or, for that matter, NLP applications for diverse low-resource languages (Wang et al., 2021).
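
To make the JSON export concrete, the sketch below serializes user-validated sign readings. The record layout is hypothetical: the field names, the `SignRecord` type, and the `corrected` flag are assumptions for illustration and do not reflect the actual CuRe export schema, and the ATF and TEI/XML outputs are not attempted here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence


# Hypothetical record layout; the actual CuRe export schema may differ.
@dataclass
class SignRecord:
    line: int              # line number on the tablet
    position: int          # index of the sign within the line
    bbox: List[int]        # [x0, y0, x1, y1] in image pixels
    reading: str           # sign value accepted by the user
    model_suggestion: str  # value the model proposed first
    corrected: bool        # True if the user changed the suggestion


def make_record(line: int, position: int, bbox: Sequence[int],
                model_suggestion: str, accepted_reading: str) -> SignRecord:
    """Build a record and note whether the user overrode the model."""
    return SignRecord(
        line=line,
        position=position,
        bbox=list(bbox),
        reading=accepted_reading,
        model_suggestion=model_suggestion,
        corrected=accepted_reading != model_suggestion,
    )


def export_json(records: List[SignRecord]) -> str:
    """Serialize records in reading order (by line, then position)."""
    ordered = sorted(records, key=lambda r: (r.line, r.position))
    return json.dumps([asdict(r) for r in ordered],
                      ensure_ascii=False, indent=2)
```

Keeping the model's original suggestion next to the accepted reading means the same file can serve both as a digital edition and as validated training data.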

Figure 1: A screenshot of the CuReI-tool from the Babylonian Engine website, currently in deployment (see demo).

Figure 2: A section of cuneiform writing with ground truth and identified labels for horizontal strokes (top), and a vectorized representation of the strokes (bottom). The grayscale on the strokes signifies the certainty level of the model (tablet image: YPM BC 16773 © Yale Babylonian Collection, courtesy of Klaus Wagensonner).

Bibliography

Bogacz, B., Gertz, M. and Mara, H. (2015a). Character Retrieval of Vectorized Cuneiform Script. In 13th International Conference on Document Analysis and Recognition (ICDAR 2015). Piscataway, NJ: IEEE Computer Society, pp. 326–30. 10.1109/ICDAR.2015.7333777.

Bogacz, B., Gertz, M. and Mara, H. (2015b). Cuneiform Character Similarity Using Graph Representations. In Wohlhart, P. and Lepetit, V. (eds), 20th Computer Vision Winter Workshop. Graz: Verlag Der Technischen Universität Graz, pp. 105–12.

Bogacz, B., Howe, N. and Mara, H. (2016). Segmentation Free Spotting of Cuneiform Using Part Structured Models. In 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016). Piscataway, NJ: IEEE Computer Society, pp. 301–6. 10.1109/ICFHR.2016.0064.

Bogacz, B. and Mara, H. (2018). Feature Descriptors for Spotting 3D Characters on Triangular Meshes. In 16th International Conference on Frontiers in Handwriting Recognition (ICFHR 2018). Piscataway, NJ: IEEE Computer Society, pp. 363–8.

Bogacz, B. and Mara, H. (forthcoming). Digital Assyriology - Advances in Visual Cuneiform Analysis. Journal on Computing and Cultural Heritage.

Dencker, T. et al. (2020). Deep Learning of Cuneiform Sign Detection with Weak Supervision Using Transliteration Alignment. PLOS ONE, 15(12), p. e0243039. 10.1371/journal.pone.0243039.

Fisseler, D. et al. (2013). Towards an Interactive and Automated Script Feature Analysis of 3D Scanned Cuneiform Tablets. In Scientific Computing and Cultural Heritage 2013. http://www.cuneiform.de/fileadmin/user_upload/documents/scch2013_fisseler.pdf.

Fisseler, D. et al. (2014). Extending Philological Research with Methods of 3D Computer Graphics Applied to Analysis of Cultural Heritage. In Klein, R. and Santos, P. (eds), Eurographics Workshop on Graphics and Cultural Heritage (GCH 2014). Goslar: The Eurographics Association, pp. 165–72. 10.2312/gch.20141314.

Francis, N. (2020). New Research on the Adoption and Transformation of Chinese Writing. Chinese Language and Discourse, 11(1), pp. 134–45. 10.1075/cld.19007.fra.

Hashimoto, Y. et al. (2018). Minna de Honkoku: Learning-Driven Crowdsourced Transcription of Pre-Modern Japanese Earthquake Records. In ADHO / EHD 2018 - Mexico City. https://dh-abstracts.library.cmu.edu/works/6312 (accessed 9 December 2021).

Hedderich, M. A. et al. (2021). A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. NAACL-HLT 2021. Online: Association for Computational Linguistics, pp. 2545–68. 10.18653/v1/2021.naacl-main.201.

Kwan, T.-W. (2014). Phenomenological Interpretation of the “Six Ways” of Chinese Script Formation. In Gordin, Sh. (ed.), Visualizing Knowledge and Creating Meaning in Ancient Writing Systems. Berliner Beiträge zum Vorderen Orient. Gladbeck: PeWe-Verlag, pp. 157–202.

Liang, J. et al. (2015). Character-Position-Free On-Line Handwritten Japanese Text Recognition. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR). Kuala Lumpur, Malaysia: IEEE, pp. 231–5. 10.1109/ACPR.2015.7486500.

Mara, H. et al. (2010). GigaMesh and Gilgamesh – 3D Multiscale Integral Invariant Cuneiform Character Extraction. In Artusi, A. et al. (eds), The 11th International Symposium on Virtual Reality, Archaeology and Cultural Heritage. VAST: International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage. The Eurographics Association. 10.2312/VAST/VAST10/131-138.

Mara, H. and Krömker, S. (2013). Vectorization of 3D-Characters by Integral Invariant Filtering of High-Resolution Triangular Meshes. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR 2013). Piscataway, NJ: IEEE Computer Society, pp. 62–6. 10.1109/ICDAR.2013.21.

Massa, J. et al. (2016). Cuneiform Detection in Vectorized Raster Images. In Čehovin, L. Mandeljc, R. and Štruc, V. (eds), 21st Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society.

Rothacker, L. et al. (2015). Retrieving Cuneiform Structures in a Segmentation-Free Word Spotting Framework. In Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing (HIP 2015). New York, NY: Association for Computing Machinery, pp. 129–36. 10.1145/2809544.2809562.

Rusakov, E. et al. (2019). Generating Cuneiform Signs with Cycle-Consistent Adversarial Networks. In Proceedings of the 5th International Workshop on Historical Document Imaging and Processing. HIP ’19. New York, NY, USA: Association for Computing Machinery, pp. 19–24. 10.1145/3352631.3352632.

Rusakov, E. et al. (2020). Towards Query-by-Expression Retrieval of Cuneiform Signs. In 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR). pp. 43–8. 10.1109/ICFHR2020.2020.00019.

Veldhuis, N. (2012). Cuneiform: Changes and Developments. In Houston, S.D. (ed.), The Shape of Script: How and Why Writing Systems Change. Santa Fe, N.M.: SAR Press, pp. 3–23.

Wang, Z. J. et al. (2021). Putting Humans in the Natural Language Processing Loop: A Survey. In Blodgett, S.L. et al. (eds), Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing. Online: Association for Computational Linguistics, pp. 47–52. https://www.aclweb.org/anthology/2021.hcinlp-1.8.

Yamauchi, K., Yamamoto, H. and Mori, W. (2018). Building A Handwritten Cuneiform Character Imageset. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA), pp. 719–22. https://aclanthology.org/L18-1115.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO