M. Hanumanthappa et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.11, November- 2014, pg. 54-60
© 2014, IJCSMC All Rights Reserved 54
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue. 11, November 2014, pg.54 – 60
RESEARCH ARTICLE
A Detailed Study on Indian Languages Text Mining
M. Hanumanthappa¹, M. Narayana Swamy²
¹Department of Computer Science, Bangalore University, Bangalore, Karnataka, India
²Department of Computer Science, Presidency College, Bangalore, India
¹hanu6572@hotmail.com, ²narayan1973.mns@gmail.com
Abstract— India is a country with a huge population of over 127 crore (1.27 billion) people, who speak many different languages. Only about 5% of the Indian population can communicate effectively in English; the remaining 95% are more comfortable in their regional languages. India is one of the most multilingual nations in the world today.
The Constitution of India allows each state to choose its own official language for official communication at the state level. To extend the benefits of Information and Communication Technology to the common masses, content must be made available in Indian languages. Consumption of Indian-language content is now starting to grow, driven by the spread of electronic devices and technology. As these devices become cheaper, the Internet is becoming accessible in smaller towns and rural parts of the country, and with this growth the demand for content in Indian languages has also been rising. The amount of textual data available in electronic form in the various Indian regional languages is increasing at an accelerating rate. However, relatively little work has been done on text processing for Indian languages.
The objectives of this paper are to examine the growth of data in Indian languages, the need for text mining in Indian languages, the existing literature on Indian-language text mining, and its applications, among other topics.
Keywords— TDIL, W3C, LLP, LIP, CLIP, Zipf’s Law
I. INTRODUCTION
India is among the most linguistically diverse countries in the world. It is a multilingual, multi-script country with 23 official languages and 11 written script forms. About a billion people in India use these languages as their first language. English is the most common language of technology, government, and the court system, but it is not widely understood beyond the middle class and those who can afford a formal, English-language education.
People throughout the world use computers and the Internet in their own languages. Yet even though India is a multilingual country, Indian users are largely compelled to use them in English. In India, 72 per cent of men do not speak English, and 28 per cent speak at