NEERUGATTI VISHWANATH, ANIL KUMAR GOGI, P PREM KISHAN, SK. KHAMURUDDEEN, S.V. DEVIKA / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 2, Mar-Apr 2012, pp. 1168-1175

CLASSIFICATION OF SCRIPTS USING VERTICAL STROKE FEATURE

NEERUGATTI VISHWANATH*, ANIL KUMAR GOGI**, P PREM KISHAN***, SK. KHAMURUDDEEN****, S.V. DEVIKA*****
* (Assistant Professor, Department of ECE, ST. Peters Engineering College, Hyderabad)
** (Assistant Professor, Department of ECE, ST. Peters Engineering College, Hyderabad)
*** (Assistant Professor, Department of ECE, ST. Peters Engineering College, Hyderabad)
**** (Assistant Professor, Department of ECE, HITAM, Hyderabad, India)
***** (Associate Professor, Department of ECE, HITAM, Hyderabad, India)

ABSTRACT
In a multilingual country like India, a document may contain text in more than one language. In such an environment, a multilingual Optical Character Recognition (OCR) system is needed to read the documents, and the layout regions belonging to each language must be identified before the document is fed to the OCR system. This work prioritizes the requirements of the Andhra Pradesh region, where government documents are generally printed in Telugu and English. Certain documents produced in the private and government sectors, such as those of railways, banks and post offices in Andhra Pradesh, are trilingual (a document having text in three languages). For automation, assuming that a separate OCR is available for each language, a pre-processor is needed to identify the language of each text line. In this work, a script identification technique that identifies Telugu and English text lines in a bilingual document is presented.
In this work, a simple and efficient technique for identifying the language of Telugu and English text lines in a printed document is presented. The proposed system is based on the stroke and cursive characteristics of the individual text lines of the input document image. Features are extracted by analyzing the behavior of strokes within individual word boundaries of a printed bilingual document image.

Keywords: OCR (Optical Character Recognition), Multilingual, Language, Bilingual language

I. INTRODUCTION
In recent years, the demand for tools that can recognize, search and retrieve written and spoken sources of multilingual information has increased tremendously. With the rapid growth of online repositories, researchers and developers of cross-lingual search and translation systems can easily obtain many of the resources they need from the Internet. However, significant resources are still accessible only in printed form, especially for sparse, low-density languages. Manipulation and conversion of these printed documents is essential for many researchers and organizations.

One of the most important tasks for printed documents is the automatic recognition of text, which usually consists of three steps: (1) zone segmentation and text region identification using document layout analysis; (2) text line, word and character segmentation; and (3) optical character recognition (OCR). In the last step, OCR systems are often designed to work on documents in one specific script. To parse bilingual or multilingual documents such as patents or bilingual dictionaries, or to perform multilingual document retrieval, the script must be identified before words are fed to the appropriate OCR system.

Language identification is an important topic in pattern recognition and in image-processing-based automatic document analysis and recognition.
The objective of language identification is to translate human-identifiable documents into machine-identifiable codes. The world we live in is becoming increasingly interconnected; electronic libraries have become more pervasive and, at the same time, increasingly automated, including the task of presenting a text in any language as automatically translated text in any other language.
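To make the idea of a vertical stroke feature concrete, the following is a minimal sketch in Python. It is not the procedure of this paper, whose details are not given in this section. It assumes that a text line has already been segmented and binarized into a grid of 0/1 pixels, and that English text lines tend to contain more long vertical strokes than the more cursive Telugu text lines. The function names, the `min_fraction` value and the decision `threshold` are hypothetical choices made only for illustration.

```python
from typing import List

# Assumption: a text line is a binarized image, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def longest_vertical_run(image: Image, col: int) -> int:
    """Length of the longest unbroken run of ink pixels in one column."""
    best = run = 0
    for row in image:
        if row[col]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def vertical_stroke_density(image: Image, min_fraction: float = 0.6) -> float:
    """Fraction of ink-bearing columns that contain a long vertical run.

    A run counts as a vertical stroke when it spans at least
    `min_fraction` of the line height (an illustrative value).
    """
    height = len(image)
    if height == 0 or not image[0]:
        return 0.0
    min_len = min_fraction * height
    ink_cols = 0
    stroke_cols = 0
    for col in range(len(image[0])):
        run = longest_vertical_run(image, col)
        if run > 0:
            ink_cols += 1
            if run >= min_len:
                stroke_cols += 1
    return stroke_cols / ink_cols if ink_cols else 0.0


def classify_line(image: Image, threshold: float = 0.2) -> str:
    """Label a text line by its stroke density (hypothetical threshold)."""
    if vertical_stroke_density(image) >= threshold:
        return "English"
    return "Telugu"
```

In a complete system, the label returned by `classify_line` would decide which OCR engine receives the text line. Both numeric parameters would have to be tuned on labeled Telugu and English samples before the sketch could be used in practice.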