International Journal of Applied Engineering Research ISSN 0973-4562 Volume 19, Number 1 (2024) pp. 99-102
© Research India Publications. https://dx.doi.org/10.37622/IJAER/19.1.2024.99-102
A Comparative Study between Dhundhari and Shekhawati Stemming: Case
of Very Closely Related Languages
Varda Pareek¹*, Nisheeth Joshi²
¹ Research scholar, Banasthali Vidyapith
Assistant Professor, Computer Science and Engineering,
Manipal University Jaipur, Raj-India
² Speech and Language Processing Lab, Centre for Artificial Intelligence,
Banasthali Vidyapith, Rajasthan-India
* Corresponding Author
Abstract
Dhundhari and Shekhawati belong to the Rajasthani language
group. The Shekhawati region neighbours the Dhundhar region.
Dhundhari is morphologically richer than Shekhawati. Both are
low-resource languages. In this paper, the morphology of
Shekhawati and Dhundhari is analysed by developing
rule-based stemmers. For this purpose, 103 suffixes and 32
prefixes were compiled for Dhundhari and 89 suffixes and 32
prefixes for Shekhawati, and 124 rules were created for
Dhundhari and 99 for Shekhawati. Inflectional accuracy was
higher for Dhundhari than for Shekhawati, whereas derivational
accuracy was higher for Shekhawati than for Dhundhari.
Keywords: Dhundhari Language, Shekhawati Language,
Stemmer, Inflectional, Derivational, Rule Based.
INTRODUCTION
Natural language processing (NLP) is a fast-growing area of
artificial intelligence, but most NLP research to date has
focused on very few languages, whereas 7000 languages exist
in the world. The remaining languages are known as
low-resource languages because they are less studied and
fewer resources are available for them. These languages are
less computerized, less privileged and rarely taught. The
Rajasthani languages are also low-resource languages.
Dhundhari and Shekhawati are part of the Rajasthani group.
Dhundhari is spoken in the Jaipur, Dausa, Ajmer and Tonk
districts of Rajasthan, whereas Shekhawati is spoken in the
Jhunjhunu, Sikar and Churu districts of Rajasthan. Dhundhari
is the second most popular language of Rajasthan and
Shekhawati the third. To analyse the morphology of these two
languages, we developed a rule-based stemmer for each. This
paper compares the morphology of the two languages through
these stemmers. Both languages contain many Hindi words, so
we attempted to create mixed rules that combine each language
separately with Hindi. This made the development of both
stemmers very challenging.
Example: the Dhundhari word ‘लुगाया’ (noun) has ‘लुगा’ (noun)
as its root word. Both belong to the same category. This is an
example of inflectional stemming. ‘खायोड़ो’ (noun) is an actual
word whose root word is ‘खा’ (verb). Here ‘खायोड़ो’ is a noun
form and ‘खा’ is a verb form. This is an example of
derivational stemming.
Example: the Shekhawati word ‘करसी’ (verb) has ‘कर’ (verb)
as its root word. Both belong to the same category. This is an
example of inflectional stemming. ‘करेड़ी’ (adjective) is an
actual word whose root word is ‘कर’ (verb). Here the adjective
form is reduced to a verb root. This is an example of
derivational stemming.
To analyse the morphology of both languages, rule-based
stemmers for Dhundhari and Shekhawati were developed. The
number of rules developed for Dhundhari and Shekhawati was
124 and 99, respectively. The number of prefixes was 32 for
both, while the number of suffixes was 103 and 89,
respectively. Further details of the proposed methodology are
discussed in the proposed algorithm section.
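The inflectional and derivational examples given above can be reproduced with a minimal longest-suffix-first stripping sketch. This is only an illustration, not the stemmer developed in this work: the suffix tables contain just the four suffixes implied by the examples in the text, and the names `SUFFIXES`, `nfc` and `stem` are hypothetical. The full suffix lists, the prefix lists and the 124 and 99 rules are not reproduced here.

```python
import unicodedata

# Illustrative suffix tables built ONLY from the four worked examples in the
# text. They are assumptions for demonstration, not the paper's actual lists
# (103 suffixes for Dhundhari, 89 for Shekhawati).
SUFFIXES = {
    "dhundhari": {"या": "inflectional", "योड़ो": "derivational"},
    "shekhawati": {"सी": "inflectional", "ेड़ी": "derivational"},
}


def nfc(text):
    """Normalize to NFC so that nukta letters such as 'ड़' compare consistently."""
    return unicodedata.normalize("NFC", text)


def stem(word, language):
    """Strip the longest matching suffix; return (stem, kind of stemming or None)."""
    word = nfc(word)
    table = {nfc(suffix): kind for suffix, kind in SUFFIXES[language].items()}
    # Try longer suffixes first so that a short suffix never pre-empts a long one.
    for suffix in sorted(table, key=len, reverse=True):
        # Require a non-empty stem after stripping.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], table[suffix]
    return word, None


if __name__ == "__main__":
    for language, word in [
        ("dhundhari", "लुगाया"),
        ("dhundhari", "खायोड़ो"),
        ("shekhawati", "करसी"),
        ("shekhawati", "करेड़ी"),
    ]:
        print(language, word, stem(word, language))
```

Unicode normalization is applied to both the word and the suffixes because the letter 'ड़' can be encoded either as a single code point or as 'ड' followed by a nukta. Without normalization, a visually identical suffix could fail to match.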
Literature Review
Jaafar et al. (2017) introduced a software system for
benchmarking light stemmers. They also introduced a measure called
“Gs-Score (for Global Stemming Score) that joins execution time
with the precision of stemmers”. The precision of those
systems depended on the accuracy of the stemming system.
Furthermore, they noted that the “Holy Quran” is an
exceptionally remarkable text.