International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-10 Issue-1, November 2020
Published By: Blue Eyes Intelligence Engineering and Sciences Publication
Retrieval Number: 100.1/ijitee.K78330991120
DOI: 10.35940/ijitee.K7833.1110120

Cross Language Information Retrieval (CLIR): A Survey of Approaches for Exploring Web Across Languages

Suhas D. Pachpande, Parag U. Bhalchandra

Abstract: In the era of globalization, the internet, being accessible and affordable, has gained huge popularity and is widely used by governments, private organizations, companies, banks, etc., as well as by individuals. It has empowered its users to contribute to the creation of information on the web in their native languages, which has drastically increased the volume of web-accessible documents available in languages other than English. This exponential growth of information on the internet has also posed several challenges for information retrieval systems. Most present monolingual information retrieval systems can retrieve documents only in the language of the query, missing information in other languages that may be more relevant to the user. The need for information retrieval systems to become multilingual has given rise to research in Cross Language Information Retrieval (CLIR), which can cross language barriers and retrieve more relevant results from documents in different languages. This article reviews the motivation, issues, work and challenges related to various CLIR approaches. Starting with the most fundamental translation approaches, it studies and reviews the more advanced approaches proposed by researchers in this domain for enhancing retrieval results in CLIR.
Keywords: Cross Language Information Retrieval, Dictionary-Based Translation, Corpus-Based Translation, Machine Translation, lexical ambiguity, bilingual dictionary, term-matching, term frequency, document ranking.

Revised Manuscript Received on November 20, 2020.
* Correspondence Author
Suhas D. Pachpande*, Department of Computer Science, Sant Gadge Baba Amravati University, Amravati (MS), India. Email: suhasdp@gmail.com
Parag U. Bhalchandra*, School of Computational Sciences, Swami Ramanand Teerth Marathwada University, Nanded (MS), India. Email: srtmun.parag@gmail.com

I. INTRODUCTION

Globalization has brought the world together, reducing the significance of geographical borders for trade as well as information exchange. Internet technologies, now more affordable, easily accessible and free of time and space constraints, have enabled the world’s population to use the web as a social and collaboration platform, empowering every web user not only to consume information but also to contribute to its creation. This exponential growth of information on the internet has posed several challenges for information retrieval systems. “The goal of an information retrieval system is to locate relevant documents in response to a user’s query. Documents are typically retrieved as a ranked list, where the ranking is based on estimations of relevance”[1].

Most present information retrieval systems are monolingual: the language of the query and that of the retrieved documents are the same. Because the internet is easily accessible and affordable, it has become very popular over the last few years, and most government departments, companies, educational institutions and almost all other organizations have started using the web as their primary storage and communication medium. This information is transacted in many different languages and is stored in web documents using multiple languages.
As a result, the volume of web-accessible documents available in languages other than English has grown drastically. In some cases, more precise information relevant to the user’s request is available in a language other than the language of the query. The user may also expect the information to be retrieved in a language in which the user is more comfortable. To facilitate information exchange in this scenario, information retrieval systems need to be multilingual or cross-lingual. Advances in network architecture have strengthened the infrastructure for information exchange across geographic barriers but still do not address the challenge of crossing language barriers. Monolingual information retrieval engines usually fail to present this information to the user, which goes against the very essence of the ubiquitous World Wide Web: making information available to the user. Users have to translate queries manually, which is inefficient because of the time translation takes and the limits of the user’s knowledge of unfamiliar languages, and which may retrieve irrelevant information when the translation is incorrect.

This explosive growth of the internet and the diversity of information sources available in several languages have fostered the need for multilingual information retrieval techniques that can cross language boundaries, and have inspired researchers in the Information Retrieval (IR) community to design innovative methodologies for information retrieval across different languages. Cross Language Information Retrieval (CLIR), a subdomain of Information Retrieval, can overcome language barriers and help retrieve documents in languages different from the language of the query, offering the most relevant data to the user[2].
Integrating CLIR tools with traditional search engines may enable them to match terms that have the same meaning in different languages, presenting this otherwise unexplored data to the user.
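
To make the idea of matching terms across languages concrete, the following minimal Python sketch translates a query word by word through a toy bilingual dictionary and then ranks documents by how often the translated terms occur in them. The dictionary entries, the sample documents and the function names (`translate_query`, `rank_documents`) are hypothetical and chosen only for illustration. This is not a method taken from the surveyed work, and a practical system would also need to deal with lexical ambiguity, morphology and a proper ranking model.

```python
from collections import Counter

# Hypothetical toy Spanish-to-English dictionary (illustration only).
# A source word may map to several candidate translations.
BILINGUAL_DICT = {
    "banco": ["bank", "bench"],
    "rio": ["river"],
    "dinero": ["money"],
}

# Hypothetical English document collection (illustration only).
DOCUMENTS = {
    "d1": "the bank approved the loan and transferred the money",
    "d2": "we sat on a bench near the river bank",
    "d3": "the weather was sunny all week",
}


def translate_query(query, dictionary):
    """Replace each source-language term by all of its dictionary translations.

    Terms missing from the dictionary are kept unchanged (lowercased).
    """
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


def rank_documents(query_terms, documents):
    """Score each document by the summed term frequency of the query terms."""
    scores = {}
    for doc_id, text in documents.items():
        tf = Counter(text.lower().split())
        score = sum(tf[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    terms = translate_query("banco rio", BILINGUAL_DICT)
    print(terms)
    print(rank_documents(terms, DOCUMENTS))
```

In this sketch the query term "banco" expands to both "bank" and "bench", so documents about either sense are matched. This is a small instance of the lexical ambiguity that dictionary-based translation has to resolve.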