© 2025 SSR Journal of Multidisciplinary (SSRJM) Published by SSR Publisher 124
SSR Journal of Multidisciplinary (SSRJM)
Volume 2, Issue 3, 2025 Homepage: https://ssrpublisher.com/ssrjm/ ISSN: 3049-1304
Email: office.ssrpublisher@gmail.com
A Hybrid Approach to Contextual Information Extraction in Low-Resource Igbo
Uzoaru Godson Chetachi
Department of Computer Science, Clifford University Owerrinta, Abia State, Nigeria
Received: 19.05.2025 | Accepted: 24.06.2025 | Published: 07.07.2025
*Corresponding Author: Uzoaru Godson Chetachi
DOI: 10.5281/zenodo.15832045
Abstract (Original Research Article)

Extracting contextual information from low-resource languages such as Igbo remains a significant challenge due to limited linguistic data. This paper proposes a novel hybrid approach that leverages both global and subword-level information to address this limitation. A hybrid embedding framework, combining GloVe and FastText embeddings, is employed to capture rich semantic and syntactic information. These embeddings are then integrated into a Compact Convolutional Transformer (CCT) architecture, which replaces the computationally intensive self-attention mechanism with efficient convolutional layers. This design enables effective capture of local and global dependencies while reducing computational costs. Experimental results on small, domain-specific Igbo datasets, including customer support and medical dialogues, demonstrate the superior performance of the proposed model over baseline approaches. The hybrid model achieves higher accuracy and F1 scores, highlighting its potential to improve NLP performance in low-resource settings. This work contributes to the advancement of natural language processing for underrepresented languages.

Keywords: Low-resource languages, Natural Language Processing (NLP), Contextual information extraction, Compact Convolutional Transformer (CCT).

Citation: Uzoaru, G. C. (2025). A hybrid approach to contextual information extraction in low-resource Igbo. SSR Journal of Multidisciplinary (SSRJM), 2(3), 124-138.
1.0 INTRODUCTION
In recent years, the demand for natural language processing (NLP) systems that can effectively understand and generate human language has increased significantly [i], particularly for under-resourced languages [ii] such as Igbo [iii]. Traditional NLP techniques rely predominantly on the availability of large training datasets [iv], which are often scarce or nonexistent in low-data environments [v]. Consequently, there is an urgent need for innovative methods that can leverage existing resources and enhance performance [vi]. One promising solution to this challenge is a hybrid approach that integrates various embedding techniques and transformer models to extract contextual information effectively [vii].
GloVe (Global Vectors for Word Representation) and FastText are two prominent word embedding models that offer complementary advantages in low-data scenarios [viii]. GloVe captures global statistical information about word co-occurrences, providing dense vector representations that encapsulate semantic relationships [ix]; it is particularly useful for capturing relationships between words based on their contexts in large corpora [x]. FastText, on the other hand, enriches this representation with subword information [xi], which is especially beneficial for morphologically rich languages such as Igbo, where prefixes and suffixes can significantly alter meaning and function [xii]. This dual embedding strategy allows for a more nuanced understanding of the language, facilitating the extraction of contextual information even when training data is limited.
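The dual embedding strategy can be illustrated with a minimal NumPy sketch: a FastText-style vector is built by averaging character n-gram vectors (so even an unseen inflected form gets a representation), and is then concatenated with a GloVe-style whole-word vector. All dimensions, vectors, and the derived word form below are toy illustrations, not the paper's actual model or lexicon.

```python
import numpy as np

DIM = 8  # toy embedding size; production models typically use 100-300 dims
rng = np.random.default_rng(0)

# Toy GloVe-style lookup table: whole words only (random stand-in vectors).
glove = {"akwukwo": rng.normal(size=DIM)}

def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style character n-grams with boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
                       for i in range(len(w) - n + 1)]

ngram_vecs = {}  # toy n-gram table, populated lazily with random vectors
def fasttext_vector(word):
    """Average of subword n-gram vectors -> defined even for unseen words."""
    grams = char_ngrams(word)
    for g in grams:
        ngram_vecs.setdefault(g, rng.normal(size=DIM))
    return np.mean([ngram_vecs[g] for g in grams], axis=0)

def hybrid_vector(word):
    """Concatenate the GloVe vector (zeros if out-of-vocabulary)
    with the FastText subword vector."""
    g = glove.get(word, np.zeros(DIM))
    return np.concatenate([g, fasttext_vector(word)])

# A derived form absent from the GloVe table still gets a usable vector,
# because it shares character n-grams with the base form.
v = hybrid_vector("nwaakwukwo")  # hypothetical derived form, for illustration
print(v.shape)  # (16,)
```

This mirrors why subword information matters for a morphologically rich language: affixed variants of a stem share most of their n-grams, so their vectors remain related even when the variants never appear in training data.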
Furthermore, the integration of Compact Convolutional Transformers (CCT) enhances the model's ability to process contextual relationships more efficiently [xiii]. CCTs aim to reduce the complexity associated with traditional transformer architectures [xiv] while maintaining effectiveness in capturing long-range dependencies and contextual nuances [xv-xvi]. This compact architecture is particularly advantageous in resource-constrained environments, allowing faster training and inference times without sacrificing performance [xvii]. The combination of GloVe, FastText, and CCT not only addresses the data-scarcity problem but also enhances the model's overall robustness in language processing tasks.
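The convolution-in-place-of-attention idea described above can be sketched as follows: stacked same-padded 1-D convolutions mix each token's embedding with its neighbours (local context), and mean-pooling over the sequence yields a fixed-size representation (a simplified stand-in for CCT's learned sequence pooling). Layer count, kernel width, and the random weights are illustrative assumptions, not the paper's trained architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d(x, kernel):
    """Same-padded 1-D convolution along the sequence axis, with ReLU.
    x: (seq_len, dim), kernel: (k, dim, dim_out)."""
    k, _, dim_out = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], dim_out))
    for t in range(x.shape[0]):
        window = xp[t:t + k]                    # (k, dim) local context
        out[t] = np.einsum("kd,kdo->o", window, kernel)
    return np.maximum(out, 0.0)                 # ReLU non-linearity

def compact_conv_encoder(embeddings, n_layers=2, k=3):
    """Stack of convolutions in place of self-attention; mean-pool over
    the sequence as a simplified analogue of CCT sequence pooling."""
    dim = embeddings.shape[1]
    h = embeddings
    for _ in range(n_layers):
        kernel = rng.normal(scale=0.1, size=(k, dim, dim))  # random weights
        h = conv1d(h, kernel)
    return h.mean(axis=0)                       # (dim,) sequence vector

sentence = rng.normal(size=(6, 16))  # 6 tokens of 16-dim hybrid embeddings
vec = compact_conv_encoder(sentence)
print(vec.shape)  # (16,)
```

Each convolution touches only a width-k neighbourhood, so cost grows linearly with sequence length, in contrast to the quadratic cost of full self-attention; stacking layers widens the receptive field toward longer-range dependencies.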
Recent advances in the field have shown that hybrid models can outperform traditional methods in various NLP tasks [xviii]. For instance, [xix] demonstrated that combining multiple embeddings could significantly improve sentiment-analysis accuracy in low-resource settings [xx]. Similarly, [xxi] found that leveraging subword information