African Languages And Digital Humanities: Challenges And Solutions

panel / roundtable
  1. Sara Petrollino, Centre for Linguistics - Leiden University
  2. Victoria Nyst, Centre for Linguistics - Leiden University


The field of African language technology (De Pauw et al. 2011; Ndinga-Koumba-Binza and Bosch 2012; Amadou Dia 2014) has developed rapidly in recent years, and several digital humanities projects and hubs have been established across the continent. Language documentation projects have focused on several endangered and minority languages, producing large digital corpora and data sets for under-described African languages. What is the place of African languages in the African digital landscape, and what is the state of the art of African digital scholarship? What are the challenges and the solutions for a DH approach in the field of African languages and linguistics? What are good practices for building African-based repositories, language infrastructures and other digital capabilities? What is a sustainable model for the engagement of wider audiences and for digital capacity building in Africa? How can we address ethical issues and the “tension” between the trend towards open access and the need to protect the privacy and property rights of community speakers and researchers?
The panel brings together scholars from different backgrounds (computational linguistics, natural language processing, language documentation and description) to answer these and other related questions and to share the experiences of scholars who are directly involved in DH research in Africa and in the management of African-based digital archives, repositories and infrastructures. In this way we hope to offer a first bird’s-eye view of DH research for African languages, which will allow a critical discussion of the nature and future of the field.
Tunde Ope-Davies (Opeibi), Digital Humanities, University of Lagos
Reframing community building and civic engagement in L2 public sphere: A study of new media multilingualism in Nigeria democracy.
One striking effect of the ongoing digital revolution is the evolving reconfiguration of the public sphere in most socio-political jurisdictions. From Europe to the Americas, and from Asia to Africa, social media is revolutionizing communications and social networking activities, redefining the mechanics of our daily interactions in private and public spaces.
Over the past decade or more, reforms in most telecommunication sectors and increased internet penetration have positively impacted communication practices in Africa. In some of these democratic contexts, the phenomenal growth in computer-mediated communication has made the practice of democracy more participatory, creating a more vibrant public sphere. Citizens’ online activities have expanded due to rapid growth in internet penetration and the proliferation of social media platforms now accessible through various handheld devices, laptops, and, more recently, affordable smartphones.
The present study addresses gaps in online multilingual political discourse studies in Africa, using Nigeria as a case study. First, it examines the use of social media within the Nigerian socio-political context. Second, it discusses the extent to which social media platforms have provided tools and possibilities for a participatory and inclusive democratic process.
Centrally, the study focuses on how the emergence of new media multilingual mechanisms reshapes online political conversation and social engagement in Nigeria. It studies how the use of English and some local languages (Yoruba, Igbo, Pidgin) in online posts helps to facilitate community building, social mobilization and social networking, and to foster civic engagement. It also discusses how these new technologies absorb or adopt some offline linguistic behaviour to inspire online social networking and participation in public conversation on current and topical political issues.
The data was drawn from the repository of the Corpus of Nigeria New media Discourse in English (CONNMDE), an ongoing Digital Humanities project at the University of Lagos. In eliciting the data deposited in the corpus, we utilized web-based corpus tools and applications to harvest relevant posts and chats from the websites, Twitter handles and Facebook pages of key political parties and political actors in Nigeria between July 2014 and October 2018 (the first phase of the project). Additional data was elicited from the online portals of three major national newspapers in Nigeria (Punch, Vanguard and The Nation).
Relying on theoretical insights from Computer Mediated Discourse Analysis (CMDA) and Speech Accommodation Theory (SAT), the study considers language choices mediated by new technologies as strong motivation for innovative discourse strategies being deployed in this context.
Among others, some of our findings suggest that new media platforms now accommodate and instantiate some offline socio-linguistic behaviour found in second language English-speaking contexts in Africa. For instance, some features of Speech Accommodation strategies (e.g. Giles & Powesland, 1975) common in African socio-cultural contexts, and online socio-pragmatic discourse cues, now constitute a key component of communication strategies adopted in building online community and promoting civic engagement within the public sphere.

Emmanuel Ngué Um (University of Ngaoundéré, Cameroon)

The Asynchronous relationship between Digital Archives and the discipline of Linguistics in Africa

The dominant trends in present-day Linguistics in Africa are both theoretical and applied. Theoretical research is geared towards either eliciting structural features of envisaged linguistic systems or testing grammatical theories which require little or no data for their operationalization. Applied linguistics is mostly concerned with language teaching and resorts to naturalistic language data only for the sake of illustration. Running contrary to this trend, Digital scholarship and language archiving operate on infrastructures that provide an integrated, data-conducive environment for the assembly, organization, processing and publication of research. As it is, therefore, current practices in Linguistics in Africa do not seem to be in vital need of Digital Archives. This presentation does not focus as such on the structural discrepancy between language archives and the discipline of linguistics in Africa. Instead, I will attempt to interrogate the relevance of a century of academic endeavor (i.e. in the discipline of linguistics) with regard to the complexity of its object, namely Africa’s extensive multilingualism. One crucial question which needs to be asked at this juncture concerns the ultimate end of current linguistic research. What do linguists in Africa aim at when they set out to analyse aspects of the linguistic reality? What is the significance of the information that each piecemeal research project produces? Are linguists in Africa working together towards achieving common goals, or has the discipline of linguistics become a routine in academia which serves as justification for the continuity of an institutional business? Answers to the above questions may be neither obvious nor straightforward. However, most stakeholders will admit that our discipline, in Africa, does not pursue coherent objectives from one university to another.
It could have been hoped that the rise of language documentation in Africa's linguistics over the past fifteen years, and the accompanying advocacy of funding bodies such as DOBES and ELDP, would have regenerated linguistics scholarship on the continent. Yet the increasing amount of data which has been harnessed from documentary projects, and which is accessible via well-established archives such as LAT (MPI-Nijmegen) and ELAR, does not seem to have substantially impacted the scientific agenda of the discipline of linguistics in Africa so far. I will contend in this presentation that embracing digital archives and digital scholarship in the discipline of linguistics in Africa is tantamount to shifting from the current ad hoc, individualistic research paradigm to communal scholarship. This entails the definition of common research objectives and methodologies, from data collection and organization to data dissemination. It also entails the pooling of scientific information and the cross-verification of structural analyses from one named language data set to another, with the aim of pursuing a general understanding of the underlying experiences which justify the fact that, beneath apparent linguistic variation, there seems to be an abstract cultural reality which no individual grammar of a named language could help to uncover.

Moses E. Ekpenyong
Department of Computer Science, University of Uyo, Nigeria
Intelligent Humanities: Towards High Performance Applications for African Languages
Today’s society is witnessing the production of extremely large data sets (Big Data) that have challenged traditional processing and storage methods. This sudden data explosion has introduced major changes into existing data management processes, demanding a robust and sustainable solution to efficiently process, analyze, store, share, and disseminate data into the future.
From experience, the processing mechanism can be structured following the standard data mining pipeline (digitization -> transcription -> pattern recognition -> simulation and inference -> preservation -> curation), and the degree of difficulty scales with the complexity and volume of data. Hence, massive digital objects in the humanities (e.g., large-scale corpora, images, unprocessed artifacts, audio, and videos) require suitable methods to guarantee useful extractions for meaningful interpretations. In this contribution, we examine the role computer algorithms play in mining, shaping and representing data in the humanities. We propose an intelligent framework that supports consistent applications which go beyond traditional methodologies to reveal inherent patterns, trends, and associations, especially those relating to human behavior and interaction: the Big Data experience.
Linking the past and the future constitutes our Digital Humanities effort at the University of Uyo, Nigeria, a product of interdisciplinary cooperation between the Computer Science and the Linguistics and Nigerian Languages departments. The first stage of the proposed framework is predicated on cross-domain metadata and metadata mapping, which pose the problem of connecting existing metadata to embedded links. The key challenge, however, is the metadata mapping, as heterogeneity of inputs complicates one-to-one mapping, and harmonization of the metadata and ontologies appears intractable. As such, experience and best practices are mandatory when transforming and consolidating formats into an internal knowledge representation for clustering and reasoning. Learning from existing data constitutes the second phase of our framework; it drives the intelligence (imposing cluster patterns and reasoning) required to enhance the classification process, for accurate prediction and visualization of the data sets. This ultimately maximizes the use of the digital resources. Further, dissemination of the resources can be achieved through a Web Management Interface (WMI), after proper ethical and copyright procedures have been followed.
An implementation of the proposed framework to salvage a critically endangered language, Medefaidrin, is demonstrated in this paper. We develop a multilingual application that embeds intelligent techniques for the analysis and visualization of prosodic features of selected West African languages: Medefaidrin (Artificial, Nigeria), Ibibio (New Benue Congo, Nigeria), Igbo (Benue Congo, Nigeria), Yoruba (Niger Congo, Nigeria), and Hausa (Afro-Asiatic, Nigeria). The speech corpus adopted is the Ibadan 400 words, a list of basic (English) lexical items of any language, translated and recorded in the various languages. The developed application is Web-based and enables useful pattern discovery that reveals the dynamic nature of these languages, as well as aiding the pronunciation and simulation of sentences, a necessary Computer Aided Learning (CAL) tool.
Juan Steyn, South African Centre for Digital Language Resources & Digital humanities Association of Southern Africa
Building Sustainable Digital Infrastructures

The South African Centre for Digital Language Resources is a new research infrastructure (RI) set up by the Department of Science and Technology (DST), forming part of the new South African Research Infrastructure Roadmap (SARIR).

The centre, which is currently still in its incubation phase, runs two main programmes: a digitisation programme and a Digital Humanities (DH) programme.

The digitisation programme focusses on the creation of text, audio and multimodal datasets as well as the development of NLP tools and software for the 11 official languages of South Africa.
The DH programme focusses on enabling and promoting the use of digital data and new methodological approaches within the broad Humanities and Social Sciences.

The establishment of the SARIR programme, which aims to provide "a high-level strategic and systemic intervention", provides South African researchers with a unique opportunity to develop and foster new research fields and possibilities.

This paper will share some of the challenges experienced during the current incubation period as well as lessons learned, by specifically reflecting on:

What to keep in mind when building a new language-related RI.
Providing access to resources and tools through an RI is not enough. Why catalysing and sustaining contextualised capacity development is very important.
Where to find collaborators and training partners. The value of engaging and working with the international Software, Data and Library Carpentry.
Successes in promoting DH approaches in African language teaching, learning and research contexts for faculty and students.
Stories on re-defining DH from a Southern African perspective.

Setting up an RI is only part of the process. The end goal is to ensure that the wider research community is not only able to access resources and tools but is also enabled to critically apply new skills to address the challenges of tomorrow.

Felix Ameka, Leiden University

Ethical issues in digital data collection and exploration
All forms of research in the language sciences critically depend on data collection from language users. Increasingly and universally, data collection methods and processing are digital. Moreover, there is a growing realisation that to understand language practices we cannot ignore the visual mode of language, which can only be captured through digital video. Furthermore, researchers are increasingly under pressure to make their data openly accessible. However, such digital data carries representations of the data providers. How can we go about this ethically? In this talk, I will raise the ethical challenges involved in collecting, processing, curating, storing and exploring different forms of data. How can we carry out these activities while paying attention to the principles of justice and of beneficence/non-maleficence, i.e. that practices should not be harmful to research participants?


Amadou Dia Ibrahima. (2014).

De Pauw, G., de Schryver, GM., Pretorius, L. et al. (2011). Introduction to the special issue on African Language Technology. In Language Resources and Evaluation 45:263.

DHASA (2017). Abstracts.

DHASA (2017). The Southern African context current activities and projects.

DST (2016). South African Research Infrastructure Roadmap.

Giles, H. and Powesland, P. F. (1975). Speech style and social evaluation. New York: Academic Press.

Ndimele, Ozo-mekuri. (2016). ICT, globalisation & the study of languages & linguistics in Africa. Port Harcourt: M & J Grand Orbit Communications.

Ndinga-Koumba-Binza, S. and Sonja E. Bosch. (2012). Language Science and Language technology in Africa: Festschrift for Justus C. Roux. Stellenbosch: SUN MeDIA.
