Indonesian Journal of Electrical Engineering and Computer Science
Vol. 18, No. 2, May 2020, pp. 1004~1014
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v18.i2.pp1004-1014
Journal homepage: http://ijeecs.iaescore.com
Weighted inverse document frequency and vector space model for hadith search engine
Septya Egho Pratama¹, Wahyudin Darmalaksana², Dian Sa’adillah Maylawati³, Hamdan Sugilar⁴, Teddy Mantoro⁵, Muhammad Ali Ramdhani⁶
¹,³,⁶ Department of Informatics, UIN Sunan Gunung Djati Bandung, Indonesia
² Department of Ilmu Hadits, UIN Sunan Gunung Djati Bandung, Indonesia
⁴ Department of Mathematics Education, UIN Sunan Gunung Djati Bandung, Indonesia
⁵ Department of Computer Science, Sampoerna University, Indonesia
Article Info
Article history:
Received Aug 24, 2019
Revised Oct 25, 2019
Accepted Nov 11, 2019
ABSTRACT
Hadith is the second source of Islamic law after the Qur’an, so many types and
references of hadith need to be studied. However, few Muslims are familiar with
them, and many have difficulty studying hadith. This study aims to build a
hadith search engine from reliable sources using Information Retrieval
techniques. The text is given a structured representation as a Bag of Words
(1-term), and the Weighted Inverse Document Frequency (WIDF) method is used to
weight the frequency of occurrence of each term before the text is converted
to vector form with the Vector Space Model (VSM). In experiments on 380 hadith
texts, WIDF with VSM achieved a recall of 96%, while precision was only around
35.46%. This is because the structured representation used, bag of words
(1-gram), cannot preserve the meaning of the text well.
Keywords:
Classification
Convolutional neural network
Deep learning
Glove
Indonesian language process
Natural language processing
Text mining
Copyright © 2020 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Dian Sa’adillah Maylawati,
Department of Informatics,
UIN Sunan Gunung Djati Bandung,
Jl. A.H. Nasution 105, Bandung, 40614, Indonesia
Email: diansm@uinsgd.ac.id
1. INTRODUCTION
Hadith comprises all the words, deeds, decrees, and approvals of the Prophet Muhammad that serve as
provisions or laws in Islam. Hadith is used as a source of law in Islam alongside the Qur'an, Ijma’ (the agreement
of scholars in establishing a religious legal ruling, based on the Qur'an and Hadith, on a case that has occurred),
and Qiyas (establishing a law for a new case for which no law exists yet); among these, hadith is
the second source of law after the Qur'an [1-5]. Studying and practicing the content of the hadith in
daily life is highly important for Muslims [6]. However, because many fake hadiths have appeared, hadith must
be studied selectively. Many weak and fake hadiths circulate among Muslims because of a lack
of selectiveness in accepting hadith, and as a result there are irregularities in social life. Studying hadith
therefore requires experts who can explain it, together with references whose correctness has been guaranteed.
Search engine technology, as one implementation of Information Technology, is a computer program
designed to search for specific data based on input keywords [7-9]. Most search engines that are
widely used today return results sorted by their level of relevance to the input keywords.
Today, search engine technology is more than a database query.
To increase the relevance of the results, search engines cannot be separated from Information Retrieval
(IR) and Text Mining (TM). IR is related to TM methods, either text classification or text clustering, to find
the best results based on input keywords [8, 10, 11]. Even Google Search Engine [12], Google Scholar [13],