Available Online at www.ijcsmc.com

International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue 11, November 2014, pg. 54–60
© 2014, IJCSMC All Rights Reserved

RESEARCH ARTICLE

A Detailed Study on Indian Languages Text Mining

M. Hanumanthappa¹, M. Narayana Swamy²
¹ Department of Computer Science, Bangalore University, Bangalore, Karnataka, India
² Department of Computer Science, Presidency College, Bangalore, India
¹ hanu6572@hotmail.com, ² narayan1973.mns@gmail.com

Abstract— India is a country with a huge population of over 127 crore (1.27 billion) people, who speak many different languages. Only 5% of the Indian population can communicate effectively in English; the remaining 95% are more comfortable with their regional languages. India is certainly one of the multilingual nations of the world today. The Constitution of India makes provision for each Indian state to choose its own official language for official communication at the state level. For the benefits of Communication and Information Technology to reach the common masses, content needs to be available in Indian languages. India is starting to see growth in the consumption of Indian-language content, driven by the spread of electronic devices and technology. As these devices get cheaper, the internet is becoming accessible in the smaller towns and rural parts of the country. With the growth of the internet, the demand for content in Indian languages has also been rising. The amount of textual data available in electronic form in the various Indian regional languages is increasing at an accelerating rate. However, not much work has been done on text processing for Indian languages.
The objective of this paper is to understand the following: the growth of data in Indian languages, the need for text mining for Indian languages, a literature survey on Indian language text mining, its applications, and so on.

Keywords— TDIL, W3C, LLP, LIP, CLIP, Zipf’s Law

I. INTRODUCTION

India has more languages than any other country in the world. It is a multi-linguistic, multi-script country with 23 official languages and 11 written script forms. About a billion people in India use these languages as their first language. English is the most common technical language and is used in the government and the court system, but it is not widely understood beyond the middle class and those who can afford formal, English-language education. People throughout the world use computers and the Internet in their own languages. Even though India is a multi-linguistic country, Indian users are compelled to use them in English. In India among men 72 per cent do not speak English, 28 per cent speak at