Indonesian Journal of Electrical Engineering and Computer Science
Vol. 18, No. 2, May 2020, pp. 1004~1014
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v18.i2.pp1004-1014
Journal homepage: http://ijeecs.iaescore.com

Weighted inverse document frequency and vector space model for hadith search engine

Septya Egho Pratama 1, Wahyudin Darmalaksana 2, Dian Sa’adillah Maylawati 3, Hamdan Sugilar 4, Teddy Mantoro 5, Muhammad Ali Ramdhani 6
1,3,6 Department of Informatics, UIN Sunan Gunung Djati Bandung, Indonesia
2 Department of Ilmu Hadits, UIN Sunan Gunung Djati Bandung, Indonesia
4 Department of Mathematics Education, UIN Sunan Gunung Djati Bandung, Indonesia
5 Department of Computer Science, Sampoerna University, Indonesia

Article Info
Article history: Received Aug 24, 2019; Revised Oct 25, 2019; Accepted Nov 11, 2019
Keywords: Classification; Convolutional neural network; Deep learning; Glove; Indonesian language process; Natural language processing; Text mining

ABSTRACT
Hadith is the second source of Islamic law after the Qur’an, so its many types and references need to be studied. However, few Muslims are familiar with them, and many have difficulty studying hadith. This study aims to build a hadith search engine from a reliable source by utilizing Information Retrieval techniques. The structured text representation used is Bag of Words (1-term), with the Weighted Inverse Document Frequency (WIDF) method used to calculate the frequency of occurrence of each term before the text is converted to vector form with the Vector Space Model (VSM). In an experiment using 380 hadith texts, WIDF with VSM achieved a recall of 96%, while precision was only about 35.46%. The low precision is attributed to the bag-of-words (1-gram) representation, which cannot preserve the meaning of the text well.

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Dian Sa’adillah Maylawati,
Department of Informatics, UIN Sunan Gunung Djati Bandung,
Jl. A.H. Nasution 105, Bandung, 40614, Indonesia
Email: diansm@uinsgd.ac.id

1. INTRODUCTION
Hadith are all the words, deeds, decrees, and approvals of the Prophet Muhammad that have been made provisions or laws in Islam. Hadith is used as a source of law in Islam alongside the Qur'an, Ijma’ (the agreement of scholars in establishing a religious law, based on the Qur'an and Hadith, for a case that has occurred), and Qiyas (establishing a law for a new case for which none yet exists); among these, hadith is the second source of law after the Qur'an [1-5]. Studying and practicing the content of the hadith in daily life is highly important for Muslims [6]. However, because many fake hadiths appear, it is necessary to be selective in studying hadith. Many weak and fake hadiths circulate among Muslims because of a lack of selectivity in hearing hadith, and as a result there are irregularities in social life. Studying hadith therefore requires experts who can explain it and references that are guaranteed to be correct.

Search engine technology, as one implementation of Information Technology, is a computer program designed to search for specific data based on input keywords [7-9]. Most existing and widely used search engines return results sorted by their level of relevance to the input keywords. Today, search engine technology is more than a database query. To increase the relevance of the data, search engines cannot be separated from Information Retrieval (IR) and Text Mining (TM). IR is related to TM methods, either text classification or text clustering, to find the best result based on input keywords [8, 10, 11]. Even Google Search Engine [12], Google Scholar [13],