South African Centre for Digital Language Resources (SADiLaR)
South African Centre for Digital Language Resources (SADiLaR)
IsiXhosa is an Nguni language classified in the south-eastern geographical zone of South Africa (Guthrie, 1971:33). It is one of the official South African languages, and one of the most widely spoken (after isiZulu) with approximately eight million mother-tongue speakers. In terms of natural language processing, particularly computational morphology, the Nguni languages including isiXhosa belong to the lesser-studied languages of the world and can be classified as under-resourced languages. Nguni languages are characterised by a rich agglutinating morphological structure, based on two principles: the nominal classification system and the concordial agreement system (Bosch & Pretorius, 2008:97).
The principal author’s masters study focused on the representation of women protagonists by male and female authors in isiXhosa dramas. The whole analysis process was done manually, mainly because digitised isiXhosa literature books were not available. This limited the study to only four books. The analysis focused on gender inequality in the way women are represented by male authors, as opposed to the way in which women authors represent women protagonists, and also on patriarchal traces found in isiXhosa dramas.
When examining the representation of female protagonists in isiXhosa dramas, similar works from other scholars are noteworthy. The first contribution that narrates the same viewpoint as the one investigated here is by Ngqase (2002), in which she examines the representations of women in four isiXhosa drama books. The study highlights the interplay between culture and women's social space. The second contribution by Peter (2010) expresses female character portrayal in various drama works written by males. He concludes that many male writers are unwilling to portray female characters in their totality and true complexity, which is evident in the way some writers have resorted to the use of stereotypes (Peter, 2010:15).
As an isiXhosa language researcher at the South African Centre for Digital Language Resources (SADiLaR), the principal author has been introduced to computational methods which could afford new ways to approach the research topic described.
Assessing and reporting on the usability of computational tools when analysing isiXhosa texts
This presentation reports on the same research topic, with the focus on computational methods to analyse the texts instead of manual approaches. The computational tools which were utilised include, Voyant Tools and regular expressions (regular expressions) as well as testing the feasibility of BookNLP when used conjunctively with written languages.
The creators of
Voyant Tools
note that it supports analysis in any language since it mostly operates on character sequences; however, limited language-specific support is available (Sinclair & Rockwell, 2019). Capitalisation in isiXhosa is of special importance in the proposed study as the language follows a pattern where the second letter of a word is capitalised instead of the first. For instance, the “Context” tool in Voyant Tools produces search terms only in lower case.
With
BookNLP
, which is specifically built for English texts, the authors will now focus on how successfully the sub-processes in its pipeline fare with a non-Western language and how it could be adapted and/or how a similar pipeline could be developed for isiXhosa using tools developed by SADiLaR. BookNLP was developed by Bamman, Underwood and Smith (2014). The study follows similar approaches to those utilised by Algee-Hewitt, Porter and Walser (2016).
Finally,
regular expressions
will be used as well, as it allows to match patterns and search for very specific character sequences more effectively.
Operationalisation of the research questions
The study has two parts. First, by reporting on the performance of the computational tools on the isiXhosa drama corpus versus an English equivalent. The specific steps for Voyant Tools and differences when using regular expressions will be provided and compared. Second, in terms of research questions focusing on the representation of Xhosa protagonists by male and female authors, regular expressions was used as the main investigation tool.
The paper reports on:
How can computational tools used to analyse Western languages be used for conjunctively written South African languages?
Are authors of isiXhosa literature influenced or led by their gender when writing?
Do authors conceptualise their work with the intention to uplift one gender while diminishing the other?
How can the gap caused by inequality between sexes be bridged through written literature?
A practical example:
If data from an English corpus is analysed using Voyant Tools, the tool would automatically be able to provide word frequencies and links between words. However, with a conjunctive language like isiXhosa, this would only be possible by making use of special search options, because the generic frequency table will be skewed owing to the difference in semantic properties of the words. For example, gender association in isiXhosa depends solely on the prefix. Only through the prefix will one be able to confirm whether a noun is referring to a single person or a group of people. Furthermore, only through contextualisation will one know whether that person is male or female, as isiXhosa prefixes are also unisex.
E.g.:
uPeter
u
sela amanzi
Peter (he) is drinking water
uSammy
u
phunga iti.
Sammy (she) is drinking tea.
This research also aims to provide a point of departure for new scholars interested in analysing isiXhosa literary works using computational approaches.
Key words
: Conjunctive language, Nguni language, computational methodologies, voyant tools, regular expressions, BookNLB
Bibliography
Algee-Hewitt, M., Porter, J., Walser, H
. (2016). Representations Of Race: Mining Identity In American Fiction, 1789-1964. In Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, Kraków, pp. 111-112.
Bamman, D., Underwood, T. and Smith, N. A.
(2014). A Bayesian Mixed Effects Model of Literary Character.
Proceedings of the 52nd Annual Meeting of the Association for Computation Linguistics.
Baltimore, Maryland, pp. 370-79.
Bosch, S., Pretorius, L. and Fleisch, A.
(2008). Experimental Bootstrapping of Morphological Analysers for Nguni Languages.
Nordic Journal of African Studies
17(2):66-88.
Guthrie, M.,
(1969).
Comparative Bantu, Farnborough
: Gregg, vol. 4.
Guthrie, M.,
(1970). Contributions from Comparative Bantu studies to the prehistory of Africa.
Language and history in Africa.
(Dalby ed.), 1:1-27.
Ngqase, F.F.,
(2002).
The way in which women are portrayed in isiXhosa dramas.
B.A Thesis. University of Stellenbosch. Available at: (Accessed: 24 April 2019)
Peter, Z.W.,
(2010).
The depiction of female characters by male writers in selected isiXhosa drama works.
BA Thesis. Nelson Mandela Metropolitan University. Available at:
http://hdl.handle.net/10948/1482
(Accessed: 24 April 2019)
Sinclair, S. and Rockwell, G.
(2019) Languages.-
https://voyant-tools.org/docs/#!/guide/languages Date of access: 24 April 2019.
If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.
In review
Hosted at Utrecht University
Utrecht, Netherlands
July 9, 2019 - July 12, 2019
436 works by 1162 authors indexed
Conference website: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/index.html
References: http://staticweb.hum.uu.nl/dh2019/dh2019.adho.org/programme/book-of-abstracts/index.html
Series: ADHO (14)
Organizers: ADHO