Open Access. © 2022 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution 4.0 International License.
https://doi.org/10.1515/9783110767377-016
Martin Hennelly, Langa Khumalo, Juan Steyn, and Menno van Zaanen
Training of Digital Language Resources Skills in South Africa
Abstract: South Africa recognizes eleven official languages, although more languages
are spoken in the country. Most of these languages are considered under-resourced:
only a limited set of computational resources is available for them. These resources
include linguistic data collections as well as computational linguistic tools.
This scarcity of resources limits the computational linguistic and more applied
(e.g., digital humanities) work on these languages. However, in South Africa
there is currently also a lack of people who know how to use these resources.
The South African Centre for Digital Language Resources (SADiLaR) is a government-funded
research infrastructure that aims to tackle both problems. First,
it runs a digitization programme, which develops new digital language resources.
This programme digitizes analogue linguistic data collections, but also develops
new computational linguistic tools. Second, a digital humanities programme
aims to build research capacity in the field of digital humanities. This is done
through training events, among other initiatives, which have recently been clustered
in the SADiLaR-run “Escalator project”. Escalator aims to develop a community
of practice in the field of digital humanities. By taking a comprehensive
approach to training events with follow-ups, combined with the development of
a Champions Initiative programme consisting of the training of experts, Escalator
aims to make it easier for researchers to transition into more computational types
of research in the humanities and social sciences.
This chapter will provide a historical overview of the field of natural language
processing and digital humanities in South Africa. In particular, it will focus on
the development of computational linguistic resources and their application.
Additionally, an overview of activities in this area performed by SADiLaR will be
Martin Hennelly, South African Centre for Digital Language Resources, North-West University,
Potchefstroom, South Africa, e-mail: martin.hennelly@nwu.ac.za
Langa Khumalo, South African Centre for Digital Language Resources, North-West University,
Potchefstroom, South Africa, e-mail: langa.khumalo@nwu.ac.za
Juan Steyn, South African Centre for Digital Language Resources, North-West University,
Potchefstroom, South Africa, e-mail: juan.steyn@nwu.ac.za
Menno van Zaanen, South African Centre for Digital Language Resources, North-West University,
Potchefstroom, South Africa, e-mail: menno.vanzaanen@nwu.ac.za