(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 5, 2023

Automatic Classification of Scanned Electronic University Documents using Deep Neural Networks with Conv2D Layers

Aigerim Baimakhanova 1, Ainur Zhumadillayeva 2, Sailaugul Avdarsol 3, Yermakhan Zhabayev 4, Makhabbat Revshenova 5, Zhenis Aimeshov 6, Yerkebulan Uxikbayev 7

Khoja Akhmet Yassawi International Kazakh-Turkish University, Turkistan, Kazakhstan 1, 6, 7
L.N. Gumilyov Eurasian National University, Astana, Kazakhstan 2
Kazakh National Women's Teacher Training University, Almaty, Kazakhstan 3
Abai Kazakh National Pedagogical University, Almaty, Kazakhstan 4, 5

Abstract—This paper proposes a novel approach to scanned document categorization using a deep neural network architecture. The approach leverages the strengths of both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract features from the scanned documents and to model the dependencies between words in them. The pre-processed documents are first fed into a CNN, which learns and extracts features from the images. The extracted features are then passed through an RNN, which models the sequential nature of the text. The RNN produces a probability distribution over the predefined categories, and each document is classified into the category with the highest probability. The approach is evaluated on a dataset of scanned documents, in which each document is labeled with one of four predefined categories. The experimental results demonstrate that the proposed approach achieves an overall accuracy of 97.3%, significantly higher than that of existing methods. In addition, its performance is robust to variations in the quality of the scanned documents and in OCR accuracy.
The contributions of this paper are twofold. First, it proposes a novel approach to scanned document categorization using deep neural networks that leverages the strengths of CNNs and RNNs. Second, it demonstrates the effectiveness of the approach on a dataset of scanned documents, highlighting its potential applications in domains such as information retrieval, data mining, and document management. The proposed approach can help organizations manage and analyze large volumes of data efficiently.

Keywords—Deep learning; CNN; RNN; classification; image analysis

I. INTRODUCTION

In today's digital era, the amount of information and data that businesses and organizations accumulate has increased significantly [1]. This has made it challenging to manage, analyze, and classify large volumes of data, particularly in the form of documents. Document categorization is a crucial task that aims to classify documents into predefined categories in order to facilitate their management and analysis. Traditional approaches to document categorization (also called document classification) have relied on manual classification or rule-based systems, which are time-consuming, labor-intensive, and prone to errors [2]. In contrast, deep learning techniques have shown great promise in automating document categorization, offering a more efficient, accurate, and scalable solution to this problem [3].

Scanned documents are a particular type of document that poses unique challenges for categorization. Unlike digital documents, scanned documents are typically in image format and require optical character recognition (OCR) before they can be processed by the model [4]. OCR software recognizes the text in an image and converts it into a machine-readable format [5]. However, OCR may introduce errors or inaccuracies that can degrade the performance of the document categorization model.
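The OCR pre-processing step described above can be sketched as follows. The paper does not name a specific OCR engine, so this illustration assumes the open-source Tesseract engine via the pytesseract wrapper (an assumption, not the authors' stated tool); the `normalize_text` helper is likewise an illustrative choice for producing consistent machine-readable text from noisy OCR output.

```python
def normalize_text(raw: str) -> str:
    """Clean raw OCR output: drop non-printable noise characters,
    collapse runs of whitespace, and lowercase the result."""
    printable = "".join(ch for ch in raw if ch.isprintable() or ch.isspace())
    return " ".join(printable.split()).lower()


def ocr_scanned_page(image_path: str) -> str:
    """Convert a scanned page image into normalized plain text.

    Assumes the Tesseract binary plus the pytesseract and Pillow
    packages are installed -- the paper does not specify its OCR stack.
    """
    from PIL import Image      # imported lazily: optional dependency
    import pytesseract
    return normalize_text(pytesseract.image_to_string(Image.open(image_path)))


# OCR output is often noisy; normalization gives the downstream
# classifier consistent input regardless of scan quality:
print(normalize_text("  Invoice\n\tNo. 42  "))  # -> "invoice no. 42"
```

Normalizing before classification is one simple way to limit the impact of the OCR inaccuracies mentioned above, since the model then never sees stray control characters or inconsistent spacing.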
As a result, deep learning techniques can help mitigate these challenges and improve the accuracy of scanned document categorization.

In recent years, deep learning techniques have revolutionized the field of document categorization. Convolutional neural networks (CNNs) are particularly well-suited for image-based tasks, such as scanned document categorization, as they can automatically learn and extract features from the image [6]. Recurrent neural networks (RNNs), on the other hand, are useful for modeling sequential data, such as text, and can learn the context and dependencies between words in a document [7].

The approach proposed in this paper leverages the strengths of CNNs and RNNs to categorize scanned documents accurately. Specifically, the scanned documents are pre-processed using optical character recognition and converted into a machine-readable format. The pre-processed documents are then fed into a convolutional neural network, which automatically learns and extracts features from the images. The extracted features are passed through a recurrent neural network, which learns the dependencies and context between words in the document. Finally, the recurrent neural network produces a probability distribution over the predefined categories, and the document is classified into the category with the highest probability.
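The CNN-to-RNN pipeline just described can be illustrated end to end. The paper does not specify layer sizes, weights, or a framework, so the following NumPy sketch uses toy dimensions and random weights purely to show the data flow: a single Conv2D-style filter extracts a feature map from a "page" image, a simple RNN consumes the rows of that map as time steps, and a softmax produces a probability distribution over four categories (matching the four-class dataset mentioned in the abstract). All names and shapes here are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2-D convolution with one filter, followed by ReLU."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def rnn_classify(features, Wx, Wh, Wo):
    """Run a simple RNN over the rows of the feature map (one row per
    time step), then softmax the final hidden state into class
    probabilities."""
    h = np.zeros(Wh.shape[0])
    for x in features:
        h = np.tanh(Wx @ x + Wh @ h)
    logits = Wo @ h
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# Toy "scanned page": a 16x16 grayscale image, four target categories.
image = rng.random((16, 16))
kernel = rng.standard_normal((3, 3))
feat = conv2d(image, kernel)            # -> (14, 14) feature map
Wx = rng.standard_normal((8, feat.shape[1])) * 0.1
Wh = rng.standard_normal((8, 8)) * 0.1
Wo = rng.standard_normal((4, 8)) * 0.1

probs = rnn_classify(feat, Wx, Wh, Wo)  # distribution over 4 categories
predicted = int(np.argmax(probs))       # category with highest probability
```

In a trained system the kernel and the weight matrices would be learned jointly by backpropagation, and the final `argmax` implements the decision rule stated above: the document is assigned to the category with the highest probability.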