Issues in Machine Translation of Indian Languages for Information Retrieval

Margi Patel
Military College of Telecommunication Engineering
Mhow, India
margi.patel22@gmail.com

Brijendra Kumar Joshi
Military College of Telecommunication Engineering
Mhow, India
brijendrajoshi@yahoo.com

Abstract— Natural languages differ from one geographical region to another. India has 22 official languages [1]. Advances in information technology have led to the digitization of many documents. Retrieving information from an existing digital document in any natural language requires both Machine Translation Systems (MTS) and information retrieval systems. This paper describes some of the most important domains of information retrieval in which Machine Translation (MT) is essential, such as Cross-Lingual Information Retrieval (CLIR) and Multi-Lingual Information Retrieval (MLIR).

Keywords-Machine Translation; CLIR; MLIR; MTS

I. INTRODUCTION

The automatic translation of text from one language to another is referred to as Machine Translation (MT) [3]. A Machine Translation System (MTS) is an application of artificial intelligence in Natural Language Processing (NLP). The language of the input text is called the source language, and the language of the output text is called the target language. MTS is currently an emerging area of research in India. India is a multilingual country: the Indian government uses Hindi or English as its medium of communication, while individual states use their regional languages [4]. There is therefore strong demand for translating documents from one language into another. Because English is widely used in all fields, MTSs are needed to translate regional languages into English and vice versa.
Information Retrieval (IR) is the process of storing, searching for, and retrieving information from a database that matches a user's request [5]. The digital world is no longer monolingual: non-English material (Hindi, Gujarati, etc.) is expanding rapidly, and as the World Wide Web grows, so does the amount of non-English content available on it. In an increasingly globalized economy, the ability to obtain information in different languages is becoming important, while in the digital age the multiplicity of languages is becoming a barrier to understanding. As a result, IR has become a critical field of study in recent years. It has been observed that users are more likely to accept and use services offered in their native language. One of the most significant challenges in CLIR and MLIR is identifying relevant material for a query issued in the user's native language. Major government institutions, newspapers, and publishing firms have created websites in Hindi, Gujarati, or other native languages [6]. Globalization is also making national boundaries less important for commerce and information sharing. Hindi is the world's third most commonly spoken language, and Gujarati is the most widely spoken language in Gujarat. India is linguistically diverse, and just 12% of its population is familiar with English [7]. IR in languages such as Hindi, Gujarati, and English is therefore gaining popularity. Google now supports transliteration in 14 languages: Arabic, Bengali, Farsi (Persian), Greek, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Punjabi, Tamil, Telugu, and Urdu [8].
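The CLIR challenge described above, finding relevant documents for a query written in the user's native language, is often approached by translating the query into the language of the collection and then searching that collection. The following is a minimal sketch of dictionary-based query translation followed by inverted-index retrieval. The bilingual dictionary `HI_EN_DICT`, the toy English collection `DOCUMENTS`, and the functions `build_inverted_index`, `translate_query`, and `search` are hypothetical names introduced only for illustration; a practical system would rely on a full bilingual lexicon or an MTS, together with a proper ranking model, rather than simple term-match counting.

```python
from collections import defaultdict

# Hypothetical toy Hindi -> English dictionary; each source term maps to all
# of its candidate translations (illustration only, not a real lexicon).
HI_EN_DICT = {
    "किसान": ["farmer", "farmers"],
    "बीमा": ["insurance"],
    "योजना": ["scheme", "plan"],
}

# Hypothetical English (target-language) document collection.
DOCUMENTS = {
    "d1": "crop insurance scheme for farmers",
    "d2": "new railway plan announced",
    "d3": "weather forecast for the monsoon",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def translate_query(query, bilingual_dict):
    """Replace each query term by all of its dictionary translations.

    Terms missing from the dictionary (e.g. proper names) are passed
    through unchanged.
    """
    translated = []
    for term in query.split():
        translated.extend(bilingual_dict.get(term, [term]))
    return translated


def search(query_terms, index):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(query_terms):
        for doc_id in index.get(term.lower(), ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = build_inverted_index(DOCUMENTS)
    terms = translate_query("किसान बीमा योजना", HI_EN_DICT)  # "farmer insurance scheme"
    print(terms)  # ['farmer', 'farmers', 'insurance', 'scheme', 'plan']
    print(search(terms, index))  # [('d1', 3), ('d2', 1)]
```

Keeping every candidate translation raises recall, but it also lets the unrelated document `d2` match through the ambiguous translation "plan", which illustrates why translation ambiguity is a central issue in CLIR.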
MLIR benefits society by allowing users to query in their native language and retrieve information in that language without knowing the language in which the information is stored in the database, which makes it a very effective research area. IR systems can help people in many areas, including agriculture, rural health, education, national resource planning, crisis management, and information kiosks. Much work is being done on IR development, and related fields such as NLP and MT are also being actively pursued. Researchers have studied a variety of regional languages for IR. Government organizations such as TDIL (Technology Development for Indian Languages) have also made major contributions to the standardization of Indian languages on the web [9].

II. LITERATURE REVIEW

A large amount of information is available online in the form of text, audio, video, and other media, and these sources can provide users with important information. IR is the act of obtaining essential documents or information from the content of a data

International Journal of Computer Science and Information Security (IJCSIS), Vol. 19, No. 8, August 2021. https://doi.org/10.5281/zenodo.5504240 https://sites.google.com/site/ijcsis/ ISSN 1947-5500