Digital Language Archives and Less-Networked Speaker Communities

  1. Lise M. Dobrin

    University of Virginia


During the past decade, digital language archives have flourished as the field of language documentation has entered mainstream linguistics. Major international funding bodies, both public and private, support research on endangered languages, and it is now possible to publish practical and theoretical work on endangered languages and documentary methods in dedicated journals and book series. One of the themes that has emerged most clearly in this literature is that of collaboration. It is now widely agreed that language documentation should be equally responsive to both the technical questions posed by linguists and the more immediate practical interests of speakers and their communities. Issues of rights and access are no longer anxious afterthoughts; they are fundamental matters for negotiation between researchers and speakers, mandatorily addressed in research agreements and funding proposals, and threaded through documentation projects from their very conception (see, e.g., Yamada 2007; Czaykowska-Higgins 2009).
Yet paradoxically, this intense focus on collaborative methods seems to stop where the digital archiving of language data begins. Language archives are doing their best to ensure that source communities can in principle gain access to the language materials they produce, but they are only just beginning to consider possibilities for formally integrating speaker communities into the process of archive curation. Such involvement could potentially transform language archives from repositories of static objects into sites for ongoing dialogue and exchange with living communities.
The obstacles to collaborative archiving are particularly acute in the case of source communities located in hard-to-reach places – often the third world – where many of today’s small, minor, and endangered languages are spoken. As might be expected, a disproportionate percentage of documentary projects are being carried out in such locations, with their outputs being deposited into first world archives. It is imperative that western scholars and institutions be cognizant of the impact their documentation projects have on such communities. In Melanesia, which is arguably the world’s most linguistically diverse area, local value is validated through relationships with outsiders. If community input ceases once the fieldwork phase of a western-sponsored project is over, this can reinforce the feelings of marginalization that motivate language shift in the first place (Dobrin 2008).
In April 2012, the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) is hosting an international group of scholars, technical experts, and community members for an intensive two-day meeting, sponsored by the National Endowment for the Humanities, to explore appropriate social and technical models for facilitating the ongoing involvement of less-networked source communities in the digital archiving of their endangered languages. Participants will present on the current state of the art in language archiving, the cultural and infrastructural situations of representative world regions (Papua New Guinea and Cameroon), and promising ‘bridging’ technologies. This paper will report on the results of the meeting, describing their implications for language archiving and for the digital humanities more generally.
There are a number of reasons why digital language archives must begin to find ways to support direct, ongoing relationships with speakers and community members, and not just with data depositors or researchers. One of these is the unfolding nature and experience-dependence of informed consent. When speakers are first recorded, they may consent to certain levels of access for the resulting materials (full access, access with conditions, researcher-only, etc.). But they cannot possibly foresee all potential future uses of the recordings. And where communities have little or no familiarity with the medium of distribution, how can they be expected to grasp even the most basic uses to which their material will be put? Given the opportunity to make their wishes known, communities may later modify, refine, or even reverse the judgments they made at a given time.
Developing direct lines of communication between archives and communities will also improve the quality and discoverability of archived data. Changing circumstances may make possible the engagement of knowledgeable stakeholders who were formerly reluctant or inaccessible. Also, though linguists may be diligent about collecting detailed metadata when making recordings in the field, this process inevitably leaves gaps. After fieldwork is over, missing information, such as the identity of a speaker or dialect, might only be known by difficult-to-reach community members. Whole categories of metadata that once seemed irrelevant (details of local history, relationships between consultants, and so on) often take on new significance as projects evolve. Mechanisms that allow community members to identify and attach metadata to recordings that concern them or their communities will enrich endangered language resources, making them easier to find and more useful for all users.
Finally, there is a practical need for the kind of direct relationships we are proposing to facilitate, as archives are now receiving expressions of interest from individuals who speak the languages of their collections. At the University of London’s School of Oriental and African Studies (SOAS), the Endangered Languages Archive (ELAR) has registered speakers of Bena and Pite Saami as users; IATH occasionally receives queries from town-dwelling Arapesh people interested in the Arapesh Grammar and Digital Language Archive (AGDLA). ELAR depositors sometimes specify that access to their deposit requires community approval; the Archive of the Indigenous Languages of Latin America (AILLA) sets up ‘community controlled’ as a systematic level of access (though at present this must be mediated through the institution). Increasingly, we can expect archives to be routinely receiving user account applications from individuals who were either directly involved as research participants or who have ancestors, family members, friends, or community associates who were involved in language documentation projects. But at present there is a clear disconnect between what depositors are able to achieve using the tools and systems provided by archives, and what community members are able to do – especially those with limited internet access.
Some digital language archives recognize these problems and are seeking to overcome them. Edward Garrett of ELAR has been experimenting with social networking technologies such as the open-source Drupal Content Management System in order to make their archive more user-centered; a desire to extend the ELAR system to include community members is one of the motivations for our meeting. The project BOLD-PNG has begun putting digital recording equipment into the hands of community members in Papua New Guinea and training them in techniques of basic oral language documentation, giving people the resources to become data collectors, if not depositors. Kimberly Christen’s Mukurtu platform was designed to facilitate community-controlled archiving, digitally instituting traditional cultural protocols to manage access to archived objects (see, e.g., Christen 2008). But none of these projects develops a generalizable approach to integrating communities without internet access into the ongoing curation of digital language materials they have produced.
We are particularly interested in the possibilities afforded by the increasingly common presence of mobile communication technologies in areas where basic infrastructure such as electricity, running water, and even roads remain absent. In such areas, new social institutions involving cell phones have begun to evolve, for example, conventions for exchanging cell-phone minutes, or new gathering spaces defined by signal access. Exploiting this new form of technology, the non-profit organization Open Mind has developed a system called ‘Question Box’, which allows remote communities to access information in critical domains such as health, agriculture, and business. Google’s voice-based social media platform SayNow is being used to allow cell phone users in Egypt and elsewhere to leave voicemail messages that appear online immediately as Twitter audio feeds. We believe that similar methods are worth exploring as a means to creatively connect less-networked language communities with researchers and archives.
Christen, K. (2008). Archival Challenges and Digital Solutions in Aboriginal Australia. The SAA Archaeological Record 8(2): 21-24.
Czaykowska-Higgins, E. (2009). Research Models, Community Engagement, and Linguistic Fieldwork: Reflections on Working within Canadian Indigenous Communities. Language Documentation and Conservation 3(1): 15-50.
Dobrin, L. M. (2008). From Linguistic Elicitation to Eliciting the Linguist. Language 84: 300-324.
Yamada, R.-M. (2007). Collaborative Linguistic Fieldwork: Practical Application of the Empowerment Model. Language Documentation and Conservation 1(2): 257-282.


Conference Info


ADHO - 2012
"Digital Diversity: Cultures, languages and methods"

Hosted at Universität Hamburg (University of Hamburg)

Hamburg, Germany

July 16, 2012 - July 22, 2012

Series: ADHO (7)

Organizers: ADHO

  • Language: English