Confronting the Complexity of Babel in a Global and Digital Age

Gregory Ralph Crane; Neven Jovanovic; Sophia Sklaviadis; Margherita de Luca; Petra Šoštarić; Maryam Foradi; Kate Cottrell; James Tauber; Farnoosh Shamsian; Chiara Palladino

Authorship

1. Gregory Ralph Crane, Tufts University
2. Neven Jovanovic, University of Zagreb
3. Sophia Sklaviadis, Tufts University
4. Margherita de Luca, Sapienza Università di Roma (Sapienza University of Rome)
5. Petra Šoštarić, University of Zagreb
6. Maryam Foradi, Leipzig University of Applied Sciences
7. Kate Cottrell, Tufts University
8. James Tauber, Eldarion.com
9. Farnoosh Shamsian, University of Tehran
10. Chiara Palladino, Furman University

Work text

This panel describes work that has been and is being done to address the complexities of working with a historical record that contains far more languages than any individual could study, much less master. Individuals can realistically develop proficiency in no more than a handful of the languages, contemporary and premodern, from the current and surviving human record. DH2019, for example, despite its international community, warmly invites submissions in languages other than English but can only offer “a sufficient pool of reviews” for papers in English, French, German, Italian, and Spanish. Difficult as it is to support such a multilingual culture of five modern European languages, it is not practical for most researchers to learn, in any serious way, the additional nineteen official languages of the European Union, much less the twenty-two official languages of India, and/or Chinese, Arabic and other languages with tens of millions of speakers. Students of the historical record also need to consider how to think about Classical languages more broadly (if we choose to maintain the category of classical at all) such as (to name only a few from the Eurasian cultural space) Classical Chinese, the Cuneiform languages, the various pre-modern forms of Egyptian, Syriac, Classical Arabic, Old and Middle Persian, or the various pre-modern forms of contemporary European languages -- India alone officially recognizes six classical languages. The papers in this panel review research and development of digital tools to explore new forms of digitally enhanced reading and language learning.

How can new digital services allow you to work with languages that you do not know? This paper reports work done on and off over many years and particularly over the past five years to help students work directly with source texts in historical languages of which they have little or no knowledge. The paper also introduces general questions that the remaining papers explore in detail.

What can you produce and what can you learn when aligning a translation to a language that you have not studied? This paper looks at a different historical and modern language pair than those in the first paper and provides data on two topics: the quality of data that non-experts can produce and what the non-experts actually learn while producing the data.

What are the limits of translation? How well can we actually translate emotions from an ancient language? This paper explores ways that modern readers can get at core meanings for a complex semantic field such as emotions for historical languages.

How do you build on initial interactions to develop a systematic understanding of a language?

How do you bootstrap study of one language in a new language with few, if any, resources? This paper reports on complementary efforts to bootstrap, and on potential models of collaboration for, the study of the same European classical language in a less commonly spoken European language and in a non-European language.

The final paper examines Schlegel’s Latin translation of the Bhagavad Gita and explores how the digital tools discussed in this panel are being applied to making the relationship between Sanskrit source text and Latin translation visible to fundamentally new audiences.

1. How can you work with a language that you do not know?
This paper describes how readers have worked with source texts in a language that they did not know. The experiences in this and similar classes have motivated the work in the other papers on the panel, and the results presented in those papers are already shaping plans for future classes. The particular context was a course on ancient literature in a non-Roman alphabet, taught via modern language translation. This use case is typical not only of many pedagogical contexts but also of much academic research -- World Literature, for example, regularly focuses on translations (usually in English) in order to engage with texts in many different languages, while many researchers in fields such as the History of Science -- and History as a whole -- must rely upon translations to examine primary sources in a wide range of languages. The digital tools are relatively simple and have not yet incorporated more advanced methods (such as word/phrase alignments between source texts and translation or exhaustive, curated annotations explaining the function of each word in the source text): students had side-by-side source text and translations and the ability to call up dictionary entries for most words in the source text. The experience of undergraduates with relatively simple digital tools and no advanced training in the Humanities provides a useful demonstration of what is possible. We present here initial impressions from assessing the student performance.

The course has been taught four times with enrollment ranging from c. 12 to 25. It includes three projects: (1) a comparison of two modern language translations against each other and against the source text in the unknown original language; (2) analysis of the semantics of words in the original language that lie behind terms such as “love” and “holy” in the modern language translation; (3) intensive analysis of a single passage in the original, mastering as much of the linguistic, metrical and stylistic content as possible. A number of students found the premise initially daunting, if not implausible, but the vast majority of students discovered that they could indeed execute all three assignments. Students regularly recognized which translation was freer but often made critical assessments that favored the freer translation on literary grounds. The students -- especially during oral presentations -- conveyed both an acute sense of how limited their understanding was and a critical -- and sometimes astonished -- recognition of how much they could indeed understand. Likewise, they developed a tangible sense for the fluidity of translation when they saw how the same words were translated and explored cultural semantics on their own. Many, if not most, came away with a fundamentally different view of what translations did and did not offer and a recognition that they could push beyond the surface of the translation and understand the source text in ways that they had not thought possible.

This paper presents examples of work from student projects and our assessments of what did and did not work. Subsequent papers will advance the questions and challenges raised here.

2. What can you produce and what can you learn when aligning a translation to a language that you have not studied?
This paper reports the results of two experiments investigating a) whether users with no knowledge of Persian are able to align the source text with its translation at word level and b) what the users actually learned about the language in performing the alignments. This effort frames alignment as a Citizen Science project, in that participants not only contribute data but also develop their own skills. This paper builds upon the results from the course described in paper 1 but applies the alignment task to a different language pair. In this experiment, German students used an English translation aligned to a Classical Persian source text as scaffolding to align that source text with a translation in their native language.
The first part of the study measured the accuracy of alignments between source text and translation. First, two automatic alignment systems, Giza++ and the Berkeley Aligner, provided a baseline alignment between the classical source text and the German translation. Second, graduate students who were expert in both Persian and German performed the alignment manually. Third, graduate students who were expert in German but had no knowledge of Classical Persian aligned the texts. Both groups of human annotators outperformed the automated systems. To our surprise, participants with no knowledge of Classical Persian produced alignments of the same accuracy as experts in that language. Both groups thus provided data that was useful in itself and as training data for improving automated alignment. The study thus provides data about the potential quality of such contributions by non-experts (a common concern among traditional scholars).
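The abstract does not say which accuracy measure was used. As a minimal sketch, assuming that each alignment is stored as a set of (source index, translation index) links and that one expert alignment is treated as the gold standard, precision, recall and F1 could be computed as follows. The function name and the toy link sets are illustrative assumptions, not data from the study.

```python
from typing import Dict, Set, Tuple

# A link pairs a source-token index with a translation-token index.
Link = Tuple[int, int]


def alignment_scores(predicted: Set[Link], gold: Set[Link]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted links against gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy link sets (invented for illustration, not results from the experiment).
gold = {(0, 0), (1, 2), (2, 1), (3, 3)}
annotator = {(0, 0), (1, 2), (2, 1), (3, 4)}   # a hypothetical human annotator
baseline = {(0, 0), (1, 1), (2, 2), (3, 3)}    # a hypothetical automatic aligner

annotator_scores = alignment_scores(annotator, gold)
baseline_scores = alignment_scores(baseline, gold)
```

Scoring every annotator and every automatic system against the same gold set is what makes the groups comparable; other measures, such as alignment error rate, would need a distinction between sure and possible links that this sketch leaves out.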
The second part of this investigation explored what (if anything) of Classical Persian the non-expert contributors learned in creating the alignments. In particular, we compared incidental learning during alignment with vocabulary learning with flashcards. The experiment monitored the learning success of each method by an immediate post-test and two delayed post-tests after two weeks and two months. Evaluation of the test results reveals that the typical forgetting curve occurs for vocabulary learned with digital flashcards, whereas vocabulary learned through translation alignment, despite a lower level of immediate learning, shows only a minimal decrease in recall after two months. Ultimately, we found that participants remembered words best when they combined explicit practice through flashcards and incidental exposure using translation alignment.
The non-expert contributors developed a more general but very concrete sense of the limitations of working with translations of the classical language, especially when they were able to see, with considerable precision, where and how translations deviated from the source texts. In addition to its impact on vocabulary learning, translation alignment helps readers to compare one translation with another and with the source text.

3. Ancient emotions: semantic alignment and translatability
The variation in conceptions of categories of emotions across cultures and languages undermines both universal and innate theories of emotion concepts (Russell 1991, Wierzbicka 1992, Barrett 2017). Emotion words, as labels for emotion concepts, communicate culturally held notions about values and goals. Thus, the organization of the language of emotions plays a key role in decoding affective experiences and understanding deeper social structures (Santangelo 2007). The domain of emotion concepts shows poor semantic alignment across languages, demonstrating how languages reflect the organization of the external world and carve up conceptual space (Thompson et al. 2018). Given the importance of formal characterizations of meaning to theories of emotions, the aim of our project is to evaluate the extent to which different modern languages have access to emotion concepts conveyed by texts in historical languages. Our approach focuses on the analysis of a corpus of translations to assess how the domain of emotion words is linguistically structured and how it behaves across languages. We use two canonical works in a historical language and their translations in eight Indo-European languages [references removed for anonymity but to be replaced if the panel is accepted].
We model cross-linguistic semantic alignment via word embeddings, using skip-gram models (Mikolov et al. 2013) trained on the multilingual corpus described above. We expect semantic alignment of emotion words to be predicted by a measure of the phylogenetic distance of languages: languages that are historically more closely related will show higher semantic alignment in the domain of emotions than phylogenetically more distant languages (Thompson et al. 2018). Phylogenetic distance between language pairs is estimated using a Bayesian spatial diffusion model (Bouckaert et al. 2010, 2012). A new aspect of our model is that we use the frequency of repeating n-grams as a second explanatory variable for predicting semantic alignment: in this way we control for repetition of word sequences and balance the poetic structure of the texts. Our model of multilingual semantic alignment of emotion words in the ancient text requires a significant amount of alignment data, which can also be used by digital dictionaries and textbooks for the historical language. Moreover, the fine-grained syntactic dependency and POS annotation that is available in XML for the historical corpus may be generalized semi-automatically to aligned translations.
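One simplified way to picture semantic alignment between two embedding spaces is to compare how similar the same set of concepts are to one another within each language. The sketch below correlates the pairwise cosine similarities of translation-equivalent words in two languages. It is an illustrative assumption, not the authors' procedure or the exact measure of Thompson et al. (2018), and the vectors and word labels are invented placeholders.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def semantic_alignment(
    emb_a: Dict[str, Vector],
    emb_b: Dict[str, Vector],
    pairs: List[Tuple[str, str]],
) -> float:
    """Correlate within-language similarities of translation-equivalent words."""
    sims_a, sims_b = [], []
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        sims_a.append(cosine(emb_a[a1], emb_a[a2]))
        sims_b.append(cosine(emb_b[b1], emb_b[b2]))
    return pearson(sims_a, sims_b)


# Invented 2-d vectors; emb_b is emb_a rotated by 90 degrees, so the
# similarity structure is identical even though the coordinates differ.
emb_a = {"w1": [1.0, 0.0], "w2": [0.8, 0.6], "w3": [0.0, 1.0]}
emb_b = {"v1": [0.0, 1.0], "v2": [-0.6, 0.8], "v3": [-1.0, 0.0]}
pairs = [("w1", "v1"), ("w2", "v2"), ("w3", "v3")]

score = semantic_alignment(emb_a, emb_b, pairs)
```

Comparing similarity structure rather than raw coordinates means the two embedding spaces can be trained separately, one per translation, without first being mapped into a shared space.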

4. How do you build on initial interactions to develop systematic understanding of a language?
What linguistic knowledge is necessary to understand a text in a pre-modern, historical language and what knowledge of a language has a student gained once they understand a text? How can this knowledge then be used in selecting other appropriate texts for the student and in providing the necessary scaffolding to fill the gaps?
This paper looks at issues in overall language modeling and the linguistic annotation of texts for this purpose with a particular focus on vocabulary and inflection but with some consideration given to syntactic constructions as well.

Typically vocabulary ordering in historical language instruction is driven by frequency, although in many traditional grammar-translation approaches, inflectional class is also a key factor. For example, a particular noun declension might be introduced and then high-frequency vocabulary from that declension taught. As will be shown, this leads to a far from optimal ordering in terms of being able to read real text early on.

Instead, the paper explores allowing the texts, chunked to appropriate sizes, to drive the ordering of vocabulary and inflection algorithmically. By tracking what the student has already seen and what they need to see more of, a structured sequence through new passages of text can be developed, guiding the student through specific texts that reinforce what they have already seen while introducing new vocabulary and grammar in appropriate increments.
We will discuss, using the Greek texts of the New Testament and of Homer, how inflectional morphology, lexical relatedness and the choice of target unit larger than an individual word impact the optimal ordering of the introduction of new vocabulary.
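The idea of letting the texts drive the ordering can be sketched as a greedy selection, assuming each passage has been reduced to its set of lemmas: at every step, pick the passage that introduces the fewest lemmas the student has not yet seen. This is one possible reading of the approach, not the algorithm used in the paper; the passage ids and lemma sets below are invented, and inflection and larger target units are ignored.

```python
from typing import Dict, List, Set


def order_passages(passages: Dict[str, Set[str]]) -> List[str]:
    """Greedy ordering: repeatedly pick the passage with the fewest unseen lemmas."""
    known: Set[str] = set()
    remaining = dict(passages)
    order: List[str] = []
    while remaining:
        # Ties are broken by passage id so that the result is deterministic.
        next_id = min(
            remaining,
            key=lambda pid: (len(remaining[pid] - known), pid),
        )
        known |= remaining.pop(next_id)   # the student has now seen these lemmas
        order.append(next_id)
    return order


# Invented passages with a few Greek lemmas, for illustration only.
passages = {
    "passage-1": {"ὁ", "λόγος", "εἰμί", "θεός"},
    "passage-2": {"ὁ", "εἰμί"},
    "passage-3": {"εἰμί", "λόγος", "ἀρχή"},
}

order = order_passages(passages)
```

With these toy inputs the shortest passage comes first and each later passage adds two new lemmas. A fuller model would also weight how often a lemma or inflected form needs to be seen again, which this sketch does not attempt.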

Finally, we demonstrate a prototype of an online adaptive reading environment that, backed by both annotation of the text being read and a model of student knowledge, provides assistance with vocabulary, morphology, and syntax, while explicitly assessing students' knowledge and implicitly tracking their requests for additional help.

5. Bootstrapping the study of an ancient language
This paper describes two different approaches to a shared problem. Each speaker seeks to teach the Ancient Greek language, but each needs to teach it in a modern language with few resources. One of the modern languages has long and deep ties to the historical language but traditionally depends upon translations of grammars and reference works from the larger academic languages. The second language, Persian, comes from a very different cultural sphere, but many of the resources about the history of Iran and Iranian cultural heritage are originally in Greek. There are virtually no resources for Ancient Greek in Modern Persian -- even the translations that do exist are indirect, being derived from translations of the historical sources and not from the sources themselves. Each of the speakers has exploited digital methods to bootstrap the study of the classical language in their national language, and both explore the possibilities for speakers of less well-supported modern languages to develop a more localizable infrastructure for the study of Ancient Greek and other historical languages.

The first case study focuses upon teaching the European classical language in a less widely spoken European language. Although the educational system in the European nation makes it possible to learn the classical language from primary school to the MA degree, the few textbooks and reference works available in the local language are seriously outdated. Producing and publishing a new one would be a long and complicated process, so using available digital tools is the next best thing when it comes to teaching the classical language in this modern language. In the last few years, several courses at the local university have included experimental use of morphological annotation, treebanking and text alignment (aligning texts in one or more historical languages with translations). Neither treebanking nor text alignment can be done properly without careful lexical and syntactic analysis of the text, so this work has helped identify some of the problems students face, especially when it comes to syntax.

The second case study focuses on teaching Ancient Greek to Persian speakers in Iran, a non-Western country with very different cultural traditions. This paper describes bootstrapping the study of Ancient Greek with open educational resources and digital libraries with complex metadata similar to those exploited in the European context. Like its European counterpart, this effort includes comparisons between Greek and Persian grammar to give the learners a clearer understanding and facilitate the learning process. It also uses translation alignment of simple sentences to prepare the learners gradually to face the original texts and to help them with the basics of syntax.

The paper concludes with a discussion of how such originally separate efforts, in very different cultural contexts, can leverage linked open data and common guidelines to serve each national audience more fully, establish intellectual ties between their students and hopefully spur additional study of historical languages in other modern languages.

6. How have users of one classical language engaged with another? Schlegel, Sanskrit, and Latin

In 1823 August Wilhelm von Schlegel (1767-1845) published a Latin translation of the Bhagavad Gita, a philosophically important Sanskrit text that forms part of the epic poem Mahabharata. Schlegel, a German by birth, chose Latin as his target language because he considered it "not encumbered by particles, articles, pronouns, and auxiliary verbs", and therefore closer to Sanskrit in structure, in style, and also esthetically. At the same time -- although he does not say so -- the translator chose Latin because it was a language familiar to all Western scholars. Traces of Latin as a European cultural language remain even today: in many European countries it is still taught in schools.

Building on that remaining presence of Latin, and intending to demonstrate how one historical language -- structurally, stylistically, culturally different from modern modes of expression -- can serve as a bridge to another, even further removed historical language, a team of undergraduate students and their teachers (some members of the team had both Latin and Sanskrit, and others only Latin) developed a course around a digital Sanskrit-Latin edition of the Bhagavad Gita. The task also gave us an opportunity to assess the state of digital philology of Sanskrit and Latin: which resources are available, which are (freely) reusable, how we can connect what we have and supply what we lack, and whether we have to be IT experts to do it.
Our digital Bhagavad Gita consists of a Sanskrit version in Devanagari, aligned to a romanized version, which is again aligned to Schlegel's Latin version. A canonical reference system has been added. In both Sanskrit and Latin versions words were morphologically annotated and lemmatized, using existing digital resources. Links to existing digital dictionaries were provided as well. Linguistic annotations, together with a concordance, served as the basis for producing vocabulary exercises, presented as flashcards. A standard open-source learning platform was used to connect all resources and to guide and assess students’ progress.
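As a rough illustration of how lemmatized tokens with canonical references can feed vocabulary exercises, the sketch below builds a concordance (lemma to list of citations) and orders lemmas for flashcards by frequency. The `Token` structure, the function names and the sample tokens are assumptions made for this example; they are not taken from the edition or from Schlegel's text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    citation: str  # canonical reference, e.g. chapter.verse
    form: str      # word form as it appears in the text
    lemma: str     # dictionary headword from the annotation


def build_concordance(tokens: List[Token]) -> Dict[str, List[str]]:
    """Map each lemma to the citations where it occurs, in text order."""
    concordance: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        concordance[token.lemma].append(token.citation)
    return dict(concordance)


def flashcard_order(concordance: Dict[str, List[str]]) -> List[str]:
    """Most frequent lemmas first; ties are broken alphabetically."""
    return sorted(concordance, key=lambda lemma: (-len(concordance[lemma]), lemma))


# Invented Latin tokens and citations, for illustration only.
tokens = [
    Token("1.1", "regis", "rex"),
    Token("1.2", "rex", "rex"),
    Token("1.2", "dixit", "dico"),
]

concordance = build_concordance(tokens)
cards = flashcard_order(concordance)
```

Keeping the citation on every token is what lets a flashcard link back to the passages where the word occurs, in either the Sanskrit or the Latin version.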
A digital, linguistically annotated, connected, and pedagogically structured Sanskrit-Latin Bhagavad Gita offers students of Latin an opportunity to use what they already know to interact with another historical language and, through it, with another cultural tradition. The dialogue about the two-thousand-year-old Hindu and Roman worlds is multilateral: 19th-century scholarly culture and Schlegel took part in it, while 21st-century digital philology builds on this earlier work as it forges its own views. Additionally, work on the course has shown that many of the resources we needed were already available, but also that, in the field of historical languages, two of the hardest challenges for digital humanities (and two significant obstacles to its widespread cultural and social acceptance) are connecting resources and tools and making sure that we are allowed to reuse them.

Bibliography

Barrett, L. F. (2017). How Emotions Are Made: The Secret Life of the Brain. Boston: Houghton Mifflin Harcourt.

Bouckaert, R., Lemey, P., Dunn, M., Greenhill, S. J., Alekseyenko, A. V., Drummond, A. J., Gray, R. D., Suchard, M. A. and Atkinson, Q. D. (2012). Mapping the Origins and Expansion of the Indo-European Language Family. Science, 337(6097): 957–60. doi:10.1126/science.1219669.

Bouckaert, R. R., Frank, E., Hall, M. A., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. (2010). WEKA—Experiences with a Java Open-Source Project. Journal of Machine Learning Research, 11: 2533–41.

Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. ArXiv Preprint ArXiv:1301.3781. http://arxiv.org/abs/1301.3781 (accessed 29 January 2019).

Russell, J. A. (1991). Culture and the Categorization of Emotions. Psychological Bulletin, 110(3): 426–50.

Santangelo, P. (2007). Emotions and perception of inner reality: Chinese and European. Journal of Chinese Philosophy, 34(2): 289–308. doi:10.1111/j.1540-6253.2007.00414.x.

Thompson, B., Roberts, S., Roberts, S. and Lupyan, G. (2018). Quantifying Semantic Alignment Across Languages. Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018), pp. 2551–56.

Wierzbicka, A. (1992). Talking about emotions: Semantics, culture, and cognition. Cognition and Emotion, 6(3–4): 285–319. doi:10.1080/02699939208411073.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

In review

ADHO - 2019

"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html

References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html

Series: ADHO (14)

Organizers: ADHO