International Journal of Applied Engineering Research ISSN 0973-4562 Volume 19, Number 1 (2024) pp. 99-102
© Research India Publications. https://dx.doi.org/10.37622/IJAER/19.1.2024.99-102

A Comparative Study between Dhundhari and Shekhawati Stemming: Case of Very Closely Related Languages

Varda Pareek 1*, Nisheeth Joshi 2
1 Research Scholar, Banasthali Vidyapith; Assistant Professor, Computer Science and Engineering, Manipal University Jaipur, Rajasthan, India
2 Speech and Language Processing Lab, Centre for Artificial Intelligence, Banasthali Vidyapith, Rajasthan, India
* Corresponding Author

Abstract
Dhundhari and Shekhawati belong to the Rajasthani language group, and the Shekhawati region neighbours the Dhundhar region. Dhundhari is morphologically richer than Shekhawati, and both are low-resource languages. In this paper, the morphology of Shekhawati and Dhundhari is analysed by developing a rule-based stemmer for each language. For this purpose, 103 suffixes and 32 prefixes were compiled for Dhundhari, and 89 suffixes and 32 prefixes for Shekhawati. A total of 124 rules were created for Dhundhari and 99 for Shekhawati. Inflectional stemming accuracy was higher for Dhundhari than for Shekhawati, whereas derivational stemming accuracy was higher for Shekhawati than for Dhundhari.

Keywords: Dhundhari Language, Shekhawati Language, Stemmer, Inflectional, Derivational, Rule Based.

INTRODUCTION
Natural language processing (NLP) is one of the fastest-growing areas of artificial intelligence, but most NLP research to date has focused on a small number of languages, even though about 7,000 languages exist in the world. The remaining languages are known as low-resource languages because they have been studied less and few resources are available for them. They are also less computerized, less privileged and rarely taught. The Rajasthani languages, which include Dhundhari and Shekhawati, are among these low-resource languages.
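The stemmers summarized in the Abstract are built from prefix lists, suffix lists and rules. Purely as an illustration of that general approach, and not as the authors' rule set or algorithm, the following Python sketch performs longest-match prefix and suffix stripping with a minimum stem length. The rule tables `SUFFIX_RULES` and `PREFIXES`, the function `stem`, the threshold `MIN_STEM_LEN` and the prefix बे are all hypothetical; the four suffixes were chosen only so that the example words used later in the Introduction (लुगाया, खायोड़ो, करसी, करेड़ी) reduce to the roots and stemming types given there.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL miniature rule tables: each rule is (suffix, stemming type).
# Only the suffixes needed for the paper's example words are listed here;
# the actual stemmers use 103 (Dhundhari) and 89 (Shekhawati) suffixes.
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "dhundhari": [("योड़ो", "derivational"), ("या", "inflectional")],
    "shekhawati": [("ेड़ी", "derivational"), ("सी", "inflectional")],
}

# HYPOTHETICAL prefix table (a Hindi-style prefix, for illustration only).
PREFIXES: Dict[str, List[str]] = {
    "dhundhari": ["बे"],
    "shekhawati": ["बे"],
}

MIN_STEM_LEN = 2  # assumed minimum stem length, counted in Unicode code points


def _nfc(text: str) -> str:
    # NFC normalization so that a nukta letter such as "ड़" has one canonical
    # encoding in both the input words and the affix tables.
    return unicodedata.normalize("NFC", text)


def stem(word: str, language: str) -> Tuple[str, Optional[str]]:
    """Strip at most one prefix and one suffix.

    Returns (stem, stemming type of the suffix rule applied, or None).
    """
    word = _nfc(word)

    # Prefix stripping: try the longest prefix first.
    for prefix in sorted(map(_nfc, PREFIXES[language]), key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break

    # Suffix stripping: try the longest suffix first, so a shorter suffix
    # cannot pre-empt a longer one that also matches.
    rules = sorted(
        ((_nfc(suffix), kind) for suffix, kind in SUFFIX_RULES[language]),
        key=lambda rule: len(rule[0]),
        reverse=True,
    )
    for suffix, kind in rules:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)], kind
    return word, None


if __name__ == "__main__":
    for word, language in [("लुगाया", "dhundhari"), ("खायोड़ो", "dhundhari"),
                           ("करसी", "shekhawati"), ("करेड़ी", "shekhawati")]:
        print(word, "->", stem(word, language))
```

Run as a script, this sketch maps लुगाया to लुगा and करसी to कर (inflectional), and खायोड़ो to खा and करेड़ी to कर (derivational). The minimum stem length is one simple guard against over-stemming short words; a full rule-based stemmer would typically add further conditions per rule.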
Dhundhari is spoken in the Jaipur, Dausa, Ajmer and Tonk districts of Rajasthan, whereas Shekhawati is spoken in the Jhunjhunu, Sikar and Churu districts. Dhundhari is the second most popular language of Rajasthan and Shekhawati the third. To analyse the morphology of these two languages, we developed a rule-based stemmer for each, and this paper compares their morphology on that basis. Both languages contain many Hindi words, so for each language we attempted to create a separate set of mixed rules covering both that language and Hindi. This made developing the stemmers for both languages particularly challenging.

Example (Dhundhari): the word ‘लुगाया’ (noun) has the root word ‘लुगा’ (noun). Both belong to the same grammatical category, so this is an example of inflectional stemming. The word ‘खायोड़ो’ (noun) is an actual word whose root is ‘खा’ (verb). Here ‘खायोड़ो’ is a noun form and ‘खा’ is a verb form, so this is an example of derivational stemming.

Example (Shekhawati): the word ‘करसी’ (verb) has the root word ‘कर’ (verb). Both belong to the same category, so this is an example of inflectional stemming. The word ‘करेड़ी’ (adjective) is an actual word whose root is ‘कर’ (verb). Here the derived word is an adjective while its root is a verb, so the grammatical category changes; this is an example of derivational stemming.

To analyse the morphology of both languages, rule-based stemmers for Dhundhari and Shekhawati were developed. The numbers of rules developed for Dhundhari and Shekhawati were 124 and 99 respectively. Both use 32 prefixes, while the numbers of suffixes are 103 and 89 respectively. Further details of the proposed methodology are given in the Proposed Algorithm section.

Literature Review
Jaafar et al. (2017) introduced a software framework for benchmarking light stemmers.
They also introduced a metric called the Gs-Score (Global Stemming Score), which combines a stemmer's execution time with its precision. The precision of these systems depended on the accuracy of the underlying stemming algorithm. Furthermore, they noted that the Holy Quran is an exceptionally distinctive text.