Abstract—In this study, an OCR system for the segmentation, feature extraction and recognition of handwritten Ottoman script characters has been developed. Recognizing characters handwritten by humans is a difficult task. The segmentation and feature extraction stages are based on geometrical feature analysis, followed by a chain code transformation of the main strokes of each character. Segmentation produces well-defined segments that can be fed into any classification approach. The classes of the main strokes are identified with a left-right Hidden Markov Model (HMM).

Keywords—Chain Code, HMM, Ottoman Script Recognition, OCR

Manuscript received June 30, 2006. A. Onat is with Selcuk University, Bozkır Vocational School of Higher Education, Computer Department, Turkey (phone: +90-332-426 1444, e-mail: aonat@selcuk.edu.tr). F. Yildiz is with Selcuk University, Engineering and Architecture Faculty, Geodesy & Photogrammetry Engineering Department, Turkey (e-mail: fyildiz@selcuk.edu.tr). M. Gündüz is with Selcuk University, Engineering and Architecture Faculty, Computer Engineering Department, Turkey (e-mail: mgunduz@selcuk.edu.tr).

I. INTRODUCTION

The Ottoman Empire lasted until the twentieth century and covered an area of about 5.6 million km². After its decline, nearly 30 countries emerged in this area, so the written state archives of these countries are in Ottoman script; the state archives of Turkey are in Ottoman script as well. For these reasons, it is important to be able to recognize Ottoman script.

Ottoman Turkish, written in Ottoman script, is a variant of the Turkish language that was used as the administrative and literary language of the Ottoman Empire. It contains extensive borrowings from Persian, which in turn had been permeated with Arabic borrowings. Spoken Turkish lived and developed alongside Ottoman Turkish and was greatly influenced by its extensive borrowings from Arabic and Persian.

Optical Character Recognition (OCR) refers to systems that translate images of typewritten or handwritten text into machine-editable text. With the development of artificial intelligence techniques, many OCR applications have been developed and have achieved satisfactory results. Most of the work on this topic concerns Latin character recognition. Arabic and Ottoman characters are harder to recognize because of the characteristics of the script, such as letter shapes that change with their position in a word (Section II). Some work has been done on Arabic character recognition, but studies on Ottoman script are still insufficient. In this study, Hidden Markov Models are used to recognize Ottoman script; the method and the satisfactory results obtained are explained in detail.

The character recognition problem consists of transferring a page that contains symbols to the computer, extracting the features of these symbols with appropriate preprocessing methods, and matching them against previously known or recognized symbols.

II. OTTOMAN SCRIPTS

The Ottoman alphabet contains 28 letters. Each letter has between two and four shapes, depending on its position within its word or subword. The four possible positions are: beginning of a (sub)word, middle of a (sub)word, end of a (sub)word, and isolation. Table I shows each shape of each letter. For example, a letter that has no initial or medial shape cannot be connected to the following letter, as shown in Table I [1].

III. DIGITIZATION AND PREPROCESSING

A 300 dpi scanner was used to digitize the images for this investigation. The color image was first converted to grayscale, and the grayscale image was then converted to a binary image. Finally, a skeletonization algorithm was applied for thinning.

IV. SKELETONIZATION

The skeleton of a binary object is a collection of lines and curves that encapsulates the size and shape of the object. There are many different ways of defining a skeleton.
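Section III chains three preprocessing steps (color to gray, gray to binary, then thinning), and Section IV names Zhang-Suen thinning for the last step. The following is a minimal pure-Python sketch of that chain under stated assumptions, not the authors' implementation. The luma weights (ITU-R BT.601), the fixed threshold of 128, the dark-ink-on-light-paper polarity, and the function names `to_gray`, `to_binary`, `neighbours`, `zhang_suen` and `preprocess` are all hypothetical choices made for illustration; the deletion rules are the standard Zhang-Suen conditions listed in Section IV, with p2 to p9 taken clockwise from the north neighbor.

```python
# Sketch of the Section III/IV chain: RGB -> gray -> binary -> Zhang-Suen thinning.
# ASSUMPTIONS (not stated in the paper): BT.601 luma weights, a fixed
# threshold of 128, dark ink on light paper, hypothetical function names.
from typing import List, Tuple

Image = List[List[int]]


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> Image:
    """Convert rows of (r, g, b) tuples to 8-bit gray levels (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def to_binary(gray: Image, threshold: int = 128) -> Image:
    """Mark dark (ink) pixels as foreground 1 and light pixels as background 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def neighbours(img: Image, y: int, x: int) -> List[int]:
    """Return [p2, ..., p9]: N, NE, E, SE, S, SW, W, NW (clockwise from north)."""
    return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
            img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]


def zhang_suen(binary: Image) -> Image:
    """Thin a non-empty 0/1 image; returns a new image of the same size."""
    h, w = len(binary), len(binary[0])
    # Pad with a background frame so every original pixel has 8 neighbours.
    img = ([[0] * (w + 2)] + [[0] + list(row) + [0] for row in binary]
           + [[0] * (w + 2)])
    changed = True
    while changed:
        changed = False
        for odd in (True, False):  # pass N odd, then pass N even
            to_delete = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(img, y, x)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # B(p): number of foreground neighbours
                    cyc = n + [n[0]]  # p2, p3, ..., p9, p2
                    # X(p): number of 0 -> 1 transitions around p
                    xp = sum(1 for i in range(8) if cyc[i] == 0 and cyc[i + 1] == 1)
                    if not (2 <= b <= 6 and xp == 1):
                        continue
                    if odd:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((y, x))
            # Apply deletions only after the whole sweep (same snapshot for all).
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return [row[1:-1] for row in img[1:-1]]


def preprocess(rgb: List[List[Tuple[int, int, int]]]) -> Image:
    """Full chain: color -> gray -> binary -> skeleton."""
    return zhang_suen(to_binary(to_gray(rgb)))
```

For example, a solid 3×3 black square on white paper thins to its single center pixel, while a one-pixel-wide horizontal stroke is left unchanged, because its end points have only one neighbor and its interior points have X(p) = 2. Deletions are collected and applied only after each full sweep, so every decision within a pass is based on the same snapshot of the image, which is what the flag-then-delete formulation requires.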
Ottoman Script Recognition Using Hidden Markov Model. Ayşe Onat, Ferruh Yildiz, and Mesut Gündüz. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:2, No:2, 2008. scholar.waset.org/1307-6892/4689

In this study, Zhang-Suen's skeletonization algorithm is used for thinning. Its steps are shown below. In pass N, flag a foreground pixel p (p = 1) as deletable if all of the following hold:

1. $2 \le B(p) \le 6$
2. $X(p) = 1$
3. If N is odd, $p_2 \cdot p_4 \cdot p_6 = 0$ and $p_4 \cdot p_6 \cdot p_8 = 0$; if N is even, $p_2 \cdot p_4 \cdot p_8 = 0$ and $p_2 \cdot p_6 \cdot p_8 = 0$.

Here, following the usual Zhang-Suen convention, $p_2, \dots, p_9$ are the eight neighbors of p taken clockwise starting from the north neighbor, $B(p)$ is the number of foreground neighbors of p, and $X(p)$ is the number of 0-to-1 transitions in the cyclic sequence $p_2, p_3, \dots, p_9, p_2$.

Item 1 ensures that pixels with only one foreground neighbor (line end points) or with seven or more are not deleted.

If a