Methods of the Arabic Manuscripts Digitization¹

Prof. Oleg Redkin, Dr. Olga Bernikova
Department of Asian and African Studies, Laboratory for Analysis and Modeling of the Social Processes, St. Petersburg State University, St. Petersburg, Russia

¹ The authors thank St. Petersburg State University for research grant 2.37.175.2014.

Abstract
Mediaeval Arabic manuscripts are not only valuable artifacts but also one of the major sources of scholarly information in Oriental Studies. This paper discusses methods of Arabic manuscript digitization. Over the last fifteen years, many Arabic manuscript digitization projects have been carried out, and digital collections of manuscripts in Arabic script are held by libraries worldwide, including in online databases. Nevertheless, the functionality of these collections is limited: metadata integration relies on descriptions produced manually. Automatic metadata generation would not only facilitate manuscript processing but also enable automatic quantitative and qualitative analysis of the manuscripts. This paper analyses various methods for classifying and retrieving historical Arabic handwritten documents and proposes the most efficient methods for digitizing them.

Keywords: manuscript, digitization, Arabic

1 Introduction
Mediaeval Arabic manuscripts are not only valuable artifacts but also one of the major sources of scholarly information in Oriental Studies. Although Arabic manuscripts have always been a focus of scholarly attention, for a long time the methods used to describe and study them remained almost unchanged, relying not only on the researcher's experience, qualifications and knowledge but also on subjective opinion.
The description of these manuscripts has a long history and, as a rule, includes data on their origin, their content and the physical state of the document. Recent decades have witnessed the spread of digital processing, retrieval, storage and transmission of information, which in turn has enabled new methods of processing data in Arabic and opened new opportunities for scholars. The digitization of the Arabic handwritten heritage has thus revolutionized this process, making possible electronic online catalogues, the digitization of scanned images and, to some extent, optical character recognition (OCR).

2 The term “digitization”
Historically, “digitizing a document” meant creating a surrogate, an alternative carrier intended for preservation [2]. Today the term has several different interpretations.

The first, simplified approach treats digitizing as making images: computer processing of Arabic manuscripts is limited to scanning them and saving the results as *.bmp, *.jpg, *.jpeg or other image files on storage media, or posting them on the websites of other academic institutions.

The second approach belongs to the field of text recognition: digitizing includes, at a minimum, scanning and optical character recognition. This is quite difficult for Arabic manuscripts, because Arabic script is cursive and the shape of a letter changes with its position in the word.

There is also a third interpretation of the term “digitizing”, which we apply to historical documents. Digitizing a large number of manuscripts requires a sophisticated information system that establishes relations between data (digital images) and metadata. Metadata is a “structured piece of information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” [3]. At a minimum, metadata should conform to the Shareable Metadata Guidelines for libraries.
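To make the relation between data and metadata more concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the authors' information system. A hypothetical ManuscriptRecord class ties each page's image files to a descriptive record and exports that description as simple Dublin Core in the oai_dc format required by OAI-PMH, a widely used baseline for shared library records. The class and field names, the shelfmark and the file names are all hypothetical, and listing access images under dc:relation is only one possible encoding choice.

```python
# Minimal sketch (hypothetical names throughout): one manuscript record that
# links page images (data) to descriptive metadata, exported as oai_dc XML.
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


@dataclass
class PageImage:
    folio: str        # folio label, e.g. "1r"
    master_file: str  # uncompressed archival master, kept in the archive
    access_file: str  # compressed derivative published on the web


@dataclass
class ManuscriptRecord:
    identifier: str   # hypothetical shelfmark
    title: str
    creator: str = ""
    date: str = ""
    language: str = "ara"  # ISO 639-2 code for Arabic
    subjects: list[str] = field(default_factory=list)
    pages: list[PageImage] = field(default_factory=list)

    def to_oai_dc(self) -> ET.Element:
        """Return the descriptive metadata as a simple Dublin Core record."""
        root = ET.Element(f"{{{OAI_DC_NS}}}dc")

        def add(name: str, value: str) -> None:
            if value:  # omit empty elements rather than emitting blanks
                ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

        add("identifier", self.identifier)
        add("title", self.title)
        add("creator", self.creator)
        add("date", self.date)
        add("language", self.language)
        for subject in self.subjects:
            add("subject", subject)
        # Link metadata to data: one dc:relation per published access image.
        # Master files are deliberately not exposed in the shared record.
        for page in self.pages:
            add("relation", page.access_file)
        return root


record = ManuscriptRecord(
    identifier="MS-0001",
    title="Treatise on grammar (hypothetical example)",
    subjects=["Arabic grammar"],
    pages=[PageImage("1r", "MS-0001_001r.tif", "MS-0001_001r.jpg")],
)
print(ET.tostring(record.to_oai_dc(), encoding="unicode"))
```

Keeping the master file and its access derivative in the same internal record means the system can locate the full-resolution file on request, while only the lighter description and web images are shared.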
Digitization enhances access to the artifact, since its image can be viewed on the web by users all over the world. In addition, the uncompressed, higher-resolution master file can be sent to users for offline viewing.

3 Previous experiences
Over the last fifteen years many Arabic manuscript digitization projects have been carried out. Digital collections of manuscripts in Arabic script are held by libraries worldwide, including in online databases [4]. For example, Princeton University Library and the Free University of Berlin, in conjunction with the Imam Zayd ibn Ali Cultural Foundation (IZbACF) in Sanaa, Yemen [5], created a collection that forms part of the Princeton Digital Library of Islamic Manuscripts [6].