© 2016. Swapna Narala, B. Padmaja Rani & K. Ramakrishna. This is a research/review paper, distributed under the terms of the
Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all
non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Telugu Text Categorization using Language Models
By Swapna Narala, B. Padmaja Rani & K. Ramakrishna
JNTU College of Engineering
Abstract- Document categorization has emerged as an active research area owing to the abundance of
documents available in digital form. In this paper we propose language-dependent and
language-independent models applicable to the categorization of Telugu documents. India is a
multilingual country, and each Indian state is permitted to choose its own official language for
official communication at the state level. The amount of textual data available in electronic form
in the various Indian regional languages is growing rapidly. Hence, the classification of text
documents based on language is crucial. Telugu is the third most spoken language in India and one of
the fifteen most spoken languages in the world. It is the official language of the states of
Telangana and Andhra Pradesh. A variant of the k-nearest neighbors algorithm is used for the
categorization process. The results are obtained by comparing the language-dependent and
language-independent models.
Keywords: text categorization, language dependent and independent models, k-nearest neighbors.
GJCST-H Classification: D.2.11, D.2.12
Online ISSN: 0975-4172 & Print ISSN: 0975-4350
Publisher: Global Journals Inc. (USA)
Type: Double Blind Peer Reviewed International Research Journal
Volume 16 Issue 4 Version 1.0 Year 2016
Global Journal of Computer Science and Technology: H Information & Technology