ADBU-Journal of Engineering Technology
Barman, AJET, ISSN: 2348-7305, Volume 8, Issue 2, December, 2019, 008020748(12PP)

Developing Assamese Information Retrieval System Considering NLP Techniques: an attempt for a low resourced language

Anup Kumar Barman¹, Jumi Sarmah², Shikhar Kr Sarma³

¹ Department of Information Technology, Central Institute of Technology, Kokrajhar, Kokrajhar - 783370, Assam, INDIA. ak.barman@cit.ac.in
² Department of Information Technology, Gauhati University, Guwahati - 781014, Assam, INDIA. jumis884@gmail.com
³ Department of Information Technology, Gauhati University, Guwahati - 781014, Assam, INDIA. sks001@gmail.com

Abstract: This paper describes the activities involved in developing a monolingual Information Retrieval (IR) system for Assamese, an Indo-Aryan language. In a multilingual country like India, where 23 official languages exist, the volume of digitized local-language content is growing rapidly. To meet each individual's information needs, monolingual information retrieval in the user's own language is essential. The work aims to develop a search engine that retrieves relevant information for a query submitted in the user's own language. Linguists and researchers collaborated on the work, provided valuable information, and developed important resources. Many informative resources, language resources, tools and technologies were researched, analyzed, developed and applied in implementing the overall pipeline. The search engine is built on the open-source search platforms Solr and Nutch, with NLP applications embedded in it. Computational Linguistics, or Natural Language Processing (NLP), enhances the performance of the IR system. Each phase of the system is described step by step in this paper. This work is a notable contribution to Assamese language technology and an important application of NLP.

Keywords: Information Retrieval, Natural Language Processing, Assamese Language

(Article history: Received: 30th November 2019 and accepted 22nd December 2019)

I. INTRODUCTION

In today's era, data, information, facts and knowledge are given far more importance than they were two or three decades ago. Thanks to the Internet, it is now possible to access anything, anytime, anywhere to gain knowledge about any concept. Whenever we access information from online repositories, it is a search engine that does the work. But how do search engines find the relevant information they provide us? This is where the concept of "Information Retrieval" comes in. Information Retrieval covers the storage, retrieval and presentation of data items, and access to them. The goal of an Information Retrieval (IR) system is to extract the information the user expects from a large text collection, based on the user's query.

The number of web users is growing at a fast pace. Web users can retrieve any information at any time and from any place on the globe. But in a country like India, only 10% of the population speaks English, and the remaining 90% are unaware of the digitized information on the Web because that information is available in English. Language thus creates a great barrier that keeps many people from accessing the digital world.

There are traditionally two types of Information Retrieval system. A Monolingual Information Retrieval (MLIR) system retrieves relevant information in the same language as the query submitted by the user, whereas Cross-Lingual Information Retrieval (CLIR) is a sub-field of Information Retrieval that deals with retrieving information in a language different from that of the query. Our IR system lets the user retrieve data in the language of the query, Assamese. The technology behind the IR system is based on two