International Journal of Computer Science & Information Technology (IJCSIT) Vol 3, No 4, August 2011
DOI: 10.5121/ijcsit.2011.3419

A.P. SivaKumar 1, Dr. P. Premchand 2, Dr. A. Govardhan 3
1 Assistant Professor, Department of Computer Science Engineering, JNTUACE, Anantapur
2 Professor, Department of Computer Science Engineering, Osmania University, Hyderabad
3 Principal & Professor, Department of Computer Science Engineering, JNTUHCE, Nachupalli
sivakumar.ap@gmail.com, p.premchand@uceou.edu, govardhan_cse@yahoo.co.in

Abstract. Retrieving information across different languages raises problems such as polysemy and synonymy, which can be addressed by Latent Semantic Indexing (LSI) techniques. This paper applies the Singular Value Decomposition (SVD) used in LSI to achieve effective indexing for English and Hindi. A parallel corpus consisting of both Hindi and English documents is created and used for training and testing the system. Stop words are removed from the documents, followed by stemming and normalization, in order to reduce the feature space and capture relations between the languages. Cosine similarity is then computed between the query document and the target document. Our experimental results show that LSI-based CLIR outperforms non-LSI-based retrieval, with retrieval success rates of 67% and 9%, respectively.

Keywords: Latent semantic indexing, Cross language information retrieval, Indexing, Singular value decomposition.

1 INTRODUCTION

Information Retrieval (IR) deals with representing, storing, organizing, and accessing information. This representation and organization make the information easier for users to access. The main goal of IR is to retrieve the information that is relevant to the user's need. IR can also help in structuring language.
The demand for multilingual information is growing as the number of internet users throughout the world increases. This demand creates the problem of retrieving documents in one language from a query specified in another language. The increasing need to retrieve multilingual documents has given rise to a new branch called Cross Lingual Information Retrieval (CLIR). CLIR takes user queries in one language (the source language) and uses them to retrieve documents in another language (the target language). For example, if the user enters a query in Hindi, relevant documents in English are retrieved. The retrieved documents are semantically equivalent to the query.

Many information retrieval methods depend on an exact match between the words in the user query and the words in the documents: the documents that contain the query words are returned to the user. Such methods therefore fail to retrieve relevant documents whose words do not match the words in the query. Many standard methods, such as dictionary-based methods, inverted indexing, and probabilistic methods, fail because they consider only the words in the user query. The most familiar CLIR approach, the dictionary-based method, also does not retrieve effectively, because of the limited number of index terms or words available in the dictionary.
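The limitation of exact matching described above can be illustrated with a minimal sketch. The toy collection, the function name `exact_match_retrieve`, and the example queries are hypothetical and are not taken from the corpus used in this paper; the sketch only shows how a retrieval rule based on shared surface words misses a synonym and returns nothing for a query in another language.

```python
def exact_match_retrieve(query, documents):
    """Return ids of documents that share at least one word with the query."""
    query_terms = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in documents.items()
        if query_terms & set(text.lower().split())
    ]


# Hypothetical toy collection (illustrative only, not the paper's corpus).
documents = {
    "d1": "the car was repaired at the garage",
    "d2": "an automobile needs regular service",
    "d3": "heavy rain flooded the river bank",
}

# Same word as in d1: retrieved.
same_word = exact_match_retrieve("car", documents)

# Synonym of "car": d2 is found, but the equally relevant d1 is missed.
synonym = exact_match_retrieve("automobile", documents)

# A Hindi word for a car or vehicle: no English document shares the token.
cross_lingual = exact_match_retrieve("गाड़ी", documents)

print(same_word, synonym, cross_lingual)
```

In this sketch the synonym query misses `d1` and the Hindi query matches no document at all, which is the gap that a representation based on latent semantics rather than surface words is meant to close.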