(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 1, 2020, p. 327
www.ijacsa.thesai.org

The Multi-Class Classification for the First Six Surats of the Holy Quran

Nouh Sabri Elmitwally (1), Ahmed Alsayat (2)
Department of Computer Science, College of Computer and Information Sciences, Jouf University, Aljouf, Saudi Arabia (1, 2)
Department of Computer Science, Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt (1)

Abstract—The Holy Quran is one of the holy books; it was revealed to the Prophet Muhammad in the form of separate verses. These verses were written on tree leaves, stones, and bones during his life; as such, they were not arranged or grouped into one book until later. There is no intelligent system that is able to distinguish the verses of Quran chapters automatically. Accordingly, in this study we propose a model, based on machine learning techniques, that can recognize and categorize Quran verses automatically and derive their essential features through chapter classification of the first six surats of the Holy Quran. The classification of Quran verses into chapters using machine learning classifiers is considered an intelligent task. Classification algorithms such as Naïve Bayes, SVM, KNN, and the J48 decision tree help to classify texts into categories or classes. The aim of this research is to use machine learning algorithms for the text classification of the verses of the Holy Quran. Although the Quran consists of 114 chapters, we work only with the first six. In this paper, we build a multi-class classification model for the chapter names of the Quranic verses using the Support Vector Classifier (SVC) and GaussianNB. The results show that the best overall accuracy is 80% for the SVC and 60% for Gaussian Naïve Bayes.

Keywords—Text classification; machine learning; natural language processing; text pre-processing; feature selection; data mining; Holy Quran

I. INTRODUCTION

Text classification of the Holy Quran is a research topic that deserves attention in the context of machine learning algorithms. The Holy Quran is a book that was sent down from the heavens into the heart of the Prophet Muhammad to be delivered to all human beings, not only Muslims. The sacred words were revealed by Allah and written down in a meaningful textual form that can be analysed and classified using machine learning classification algorithms. It is considered a comprehensive book covering every component of life and accessible to all people. It addresses the heart and mind as one.

The texts of the Holy Quran are fertile ground for natural language processing and text classification. Their unique wording and meanings provide distinguishing features. The Holy Quran is the first source of legislation in Islam. It is therefore necessary to apply data-mining and machine learning techniques to classify the verses into chapters (surats) intelligently. Furthermore, annotation of the verses of the Holy Quran's surats depends not only on the text itself but also on the ordering of the surats. Therefore, this study builds a model to classify and differentiate Quranic verses according to their surats.

We have previously studied the architecture of the Arabic Language Sentiment Analysis (ALSA) [1]. Here, we extend the concept of text classification and apply it to the verses of the Holy Quran. The total number of verses in the Holy Quran is about 6000. Multi-class classification requires an automated model that assigns each text to one of several classes. For this reason, this paper looks at the first six chapters of the Holy Quran; their approximately 1000 verses contain a total of 8000 features for the training and testing data.

This paper is organized as follows: the next section presents related work on multi-class text classification of the Holy Quran. The experimental method and analysis are covered in Section III.
Finally, the fourth section presents the results, followed by the conclusions and directions for future work.

II. RELATED WORK

The study detailed in [2] proposed an automated model that could classify Al-hadeeth features into Sahih, Hasan, Da'if, and Maudu, using machine learning techniques (LinearSVC, SGDClassifier, and LogisticRegression). The author of [3] built a machine learning classification model using the KNN, SVM, and Naïve Bayes algorithms to annotate labels for the Quranic verses. The accuracy of the text-classification algorithms reached over 70% for the multi-labels of the Quranic verses. The authors of [4] proposed a multi-label classification approach to the topics of Quranic verses using a k-Nearest Neighbor (KNN) algorithm with weighted TF-IDF and TF-IDF. Another paper evaluated the impact of four classification algorithms (SVM, KNN, Naïve Bayes, and Decision Tree) on classifying the topics of the Quranic Ayāts (verses) [5]. The same concept was studied in [6] using the MultinomialNB classifier. The authors of [7] used the Propbank Corpus to improve the performance of semantic argument classification on Quran data using a linear SVM. The authors of [8] applied the GBFS approach to label Quranic verses based on two major references, the