Digital documentation of lesser-known languages in India: Its application in teaching Linguistics

poster / demo / art installation
Authorship
  1. Anju Saxena

    Linguistics - Uppsala University

  2. Udaya Narayana Singh

    Central Institute of Indian Languages

Work text

The most important relationship between language and culture that gets to the heart of what is lost when you lose a language is that most of the culture is in the language and is expressed in the language. Take it away from the culture, and you take away its greetings, its curses, its praises, its laws, its literature, its songs, its riddles, its proverbs, its cures, its wisdom, its prayers. The culture could not be expressed and handed on in any other way. (Fishman 1996: 81)
The aim of this presentation is to discuss how digital documentation of lesser-known languages in India (where language corpora are part of the documentation) can be an effective tool in teaching and learning Linguistics. Combining foreign language corpora with ICT provides effective pedagogical tools for renewing language teaching and learning. These two points will be discussed and illustrated by presenting the foundations of, and demonstrating, a research project (Digital documentation of Indian minority languages) of which Anju Saxena is the Principal Investigator and Udaya Narayana Singh is the Principal Collaborator.
The increasing internationalism of the twentieth century, with a small group of nations dominating the scene, has had an adverse effect on the maintenance of the social and cultural traditions of many communities. According to Krauss (1996), 3000 of today's 6000 languages will disappear in this century if no extra measures are taken. Issues relating to language death, endangerment and the threat to linguistic diversity have dominated the discussion (Krauss 1996), and efforts to revitalize endangered languages and to reverse the phenomenon of language death have been the themes of several conferences (including a UN conference).
A language is a reflection of the community that speaks it. It embodies the philosophy and the world-view of its people. In communities which lack a writing system, this knowledge is handed down orally from one generation to the next. When a language dies, which happens with increasing frequency in our modern world, we lose not only the linguistic knowledge of that community, but also the knowledge about its culture. One important way of ensuring that the knowledge about indigenous languages and cultures is not totally and irrevocably lost is by documenting these languages and by spreading information about languages and cultures of these communities to a wider audience.
Recent developments in internet technology have the potential to change enormously the way that we collect, store, organize, analyze and disseminate linguistic data. The internet provides opportunities for producing material and making it available to a larger audience cost-effectively, with the possibility of updating the material. Until recently, the focus in language technology has unfortunately been on major Western languages. There is a growing awareness in the research community that these technical advances can and should also be used in documenting minority languages, as this could be an effective tool for spreading awareness about these languages and for maintaining linguistic diversity (Ó Cróinín 2000). It should also be highlighted that it is not only the minority languages which stand to gain from this collaboration but also the domain of language technology itself: it gains a testing ground for evaluating its tools and programs on languages which differ significantly (typologically) from literate Western languages.
The Indian subcontinent has a long history of linguistic diversity and multilingualism, spanning more than three millennia. Languages spoken in this region belong to four major language families: Indo-Aryan, Dravidian, Tibeto-Burman and Munda. Societal multilingualism is an established tradition in India. Despite this stable multilingualism, language death is not uncommon in the Indian context. We are, at present, witnessing some positive efforts in documenting lesser-known languages in South Asia using internet technology. LACITO (Boyd Michailovsky) in France has initiated an archive which contains texts of lesser-known languages, including some languages of Nepal (http://ldc.upenn.edu/exploration/expl2000/papers/michailovsky/index.html). The Digital Himalaya Project (http://www.digitalhimalaya.com/) at Cambridge University is an attempt to electronically document anthropological material. The Indian side of the Himalayan region has, unfortunately, been out of sight as far as the application of recent technological advances for linguistic documentation is concerned. The aim of our project is to fill this void. The digital documentation in this project includes not only the linguistic documentation (i.e., texts), but also material which will anchor this linguistic material to the social and cultural lifestyles of these communities. Digitized audio and video recordings are necessary components of the documentation. Documentation of each language will include the following:
A brief description of its genetic affiliation and major typological characteristics;
An outline of its sociolinguistic situation (e.g., size of the speech community, the extent to which the community is monolingual, literacy);
Photographs and video recordings to place the linguistic material in its perspective;
A brief description of previous works on this language and culture;
A description of the grammatical terminology, the abbreviations used and what they stand for, and the transcription conventions used;
Direct-elicited data on lexical semantics, kinship terms, numerals, paradigms (inflectional and derivational morphology);
An annotated narrative corpus (together with audio and video recordings) with the following information:
a unique reference field;
a phonetically transcribed unit, such as a clause;
the morphological representation of the clause;
a morpheme-by-morpheme translation of the clause into English;
a free translation of the clause into English.
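The five fields of an annotated corpus entry listed above can be pictured as a simple record. The following Python sketch is only an illustration under stated assumptions: the class name AnnotatedClause, its field names, and the interlinear() helper are hypothetical and are not taken from the project. The sample entry uses a Swedish clause in ordinary orthography as a stand-in, since no project data is reproduced here; a real entry would carry a phonetic transcription.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedClause:
    """One corpus entry; all names here are hypothetical illustrations."""
    ref: str                     # unique reference field
    transcription: str           # transcribed unit, e.g. a clause
    morphemes: List[List[str]]   # morphological representation, one list per word
    glosses: List[List[str]]     # morpheme-by-morpheme translation, aligned with morphemes
    free_translation: str        # free translation into English

    def __post_init__(self) -> None:
        # Every word and every morpheme needs exactly one gloss.
        if len(self.morphemes) != len(self.glosses):
            raise ValueError("number of words and number of glossed words differ")
        for word, gloss in zip(self.morphemes, self.glosses):
            if len(word) != len(gloss):
                raise ValueError(f"morpheme/gloss mismatch in word {word!r}")

    def interlinear(self) -> str:
        """Render the entry as aligned interlinear text."""
        words = ["-".join(w) for w in self.morphemes]
        glosses = ["-".join(g) for g in self.glosses]
        widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
        morph_line = "  ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
        gloss_line = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
        return "\n".join([
            self.ref,
            self.transcription,
            morph_line,
            gloss_line,
            f"'{self.free_translation}'",
        ])


# Stand-in example (Swedish orthography, not project data).
clause = AnnotatedClause(
    ref="DEMO-001",
    transcription="hundarna sover",
    morphemes=[["hund", "ar", "na"], ["sov", "er"]],
    glosses=[["dog", "PL", "DEF"], ["sleep", "PRS"]],
    free_translation="The dogs are sleeping.",
)
print(clause.interlinear())
```

Keeping morphemes and glosses as parallel lists, and checking their alignment when the record is created, is one possible design choice; it lets misaligned annotations be caught before they enter the corpus.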
The results of this project will help, directly as well as indirectly, to maintain linguistic diversity (thus reversing the phenomenon of language death) by documenting these languages and by spreading awareness about them. They will also contribute to our understanding of the typological characteristics of the languages of this region and of the interplay of socio-cultural factors and linguistic structure. Finally, they will provide rich material for teaching Linguistics courses.
There has been a growing interest in using natural language corpora in teaching and in research, partly due to the growing availability of computer-readable linguistic corpora, and partly due to an increased interest in examining language in its natural context as opposed to investigating constructed language examples in isolation. The accompanying context makes it possible to investigate the textual, discourse-level functions of grammatical phenomena. Researchers, teachers and students now have access to different types of language corpora to discover facts about language; for example, which words are most frequently used in a language or a language type, in which contexts they predominantly occur, and which grammatical patterns are associated with a particular linguistic item (Ghadessy et al. 2000).
Another advantage of using corpora in teaching is that instead of learning about linguistic theories in a vacuum, students have a chance to test these theories against the corpora and to learn about the theories or concepts for themselves. When corpora are used by students as part of their learning, the distinction between teaching and research is "blurred", as students, by discovery procedure (thus, research), learn things for themselves (thus, learning/teaching) (Knowles 1990).
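The kinds of corpus queries mentioned above (which words are most frequent, and in which contexts they occur) can be sketched in a few lines of Python. This is a minimal illustration, not a description of the project's tools: the function names word_frequencies and concordance are hypothetical, and the two-clause sample is invented English text standing in for a tokenized corpus.

```python
from collections import Counter
from typing import List, Tuple


def word_frequencies(clauses: List[List[str]]) -> Counter:
    """Count how often each (lower-cased) token occurs in the corpus."""
    return Counter(tok.lower() for clause in clauses for tok in clause)


def concordance(
    clauses: List[List[str]], keyword: str, window: int = 2
) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for clause in clauses:
        for i, tok in enumerate(clause):
            if tok.lower() == keyword.lower():
                left = " ".join(clause[max(0, i - window):i])
                right = " ".join(clause[i + 1:i + 1 + window])
                hits.append((left, tok, right))
    return hits


# Invented sample: each clause is a list of tokens.
sample = [
    ["the", "dog", "sleeps"],
    ["the", "cat", "sees", "the", "dog"],
]

print(word_frequencies(sample).most_common(2))
for left, key, right in concordance(sample, "dog", window=1):
    print(f"{left:>10} [{key}] {right}")
```

A student exercise along these lines might start from the frequency list and then use the concordance lines to examine the contexts in which a particular item occurs.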
There are several advantages in using language corpora of lesser-known languages:
It brings research findings into classroom teaching.
It has been our experience that teaching works better when the morpho-syntax of Swedish (the mother tongue of most students in Sweden) is compared with that of some other language. For this purpose, Kinnauri, or any other language of the digital documentation project, is a good candidate, as it differs considerably from Swedish typologically. The comparison gives students a way to appreciate the differences between the grammars of the two languages and prompts them to think about the similarities that exist between and among languages.
Working with a corpus provides an opportunity for students to work in groups.
It also helps spread awareness about lesser-known languages.
ICT (Information and Communication Technology) provides good resources for storing and disseminating the primary material. By building archives of spoken and written material from lesser-known languages, we preserve knowledge of and about these languages for the next generation, by which time many of them will have no living native speakers.
References
1. Ó Cróinín, Donncha (ed.) 2000. Workshop Proceedings. Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000. Second International Conference on Language Resources and Evaluation. Athens.
2. Fishman, Joshua. 1996. What do you lose when you lose your language? In: Gina Cantoni (ed.) Stabilizing indigenous languages. Flagstaff: A Center for Excellence in Education monograph. Northern Arizona University, pp. 80-91.
3. Krauss, Michael. 1996. Status of Native American language endangerment. In: Gina Cantoni (ed.) Stabilizing indigenous languages. Flagstaff: A Center for Excellence in Education monograph. Northern Arizona University, pp. 6-21.
4. Ghadessy, Mohsen, Alex Henry and Robert L. Roseberry. Introduction. In: Ghadessy, Mohsen, Alex Henry and Robert L. Roseberry (eds.) Small corpus studies and ELT. Theory and practice. Amsterdam/Philadelphia: John Benjamins, pp. xvii-xxiii.
5. Knowles, Gerald. 1990. The use of spoken and written corpora in the teaching of language and linguistics. Literary and Linguistic Computing. Journal of the Association for Literary and Linguistic Computing 5(1): 45-48.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC