On the use of Lexeme Features for writer verification

Anurag Bhardwaj, Abhishek Singh, Harish Srinivasan and Sargur Srihari
Center of Excellence for Document Analysis and Recognition (CEDAR)
University at Buffalo, State University of New York
Amherst, New York 14228
{ab94,singh5,hs32,srihari}@cedar.buffalo.edu

Abstract

Document examiners use a variety of features to analyze a given handwritten document for writer verification. The challenges in automatically classifying a pair of documents as written by the same or by different writers are (i) the proper selection and extraction of features from the handwritten document and (ii) the use of a model capable of exploiting the true discriminatory power of these features for classification. This paper describes the use of content-specific, skeleton-based features for characters and pairs of characters (bigrams) and ascertains their discriminatory power. A triangulation skeletonisation procedure is first used to obtain the skeleton of the character(s), and features are then computed from the skeleton. Experiments are conducted on content-specific features extracted for the two most frequently occurring bigrams (th, he) and for two characters (d and f). A neural network based on a Bayesian formulation is used to ascertain the discriminatory power of these features. To combine these features with previously existing writer verification features, an alternative Naive Bayes model is also described and evaluated. From the results obtained, we conclude that the bigram th has the highest discriminatory power, followed by the characters d and f and the bigram he. The paper also highlights the significant increase in writer verification performance (15% more) obtained with Bayesian neural networks compared with the Naive Bayes model.

1.
Introduction

Writer verification is the problem of determining whether two handwriting samples were written by the same or by different writers, and it is of vital importance in Questioned Document Examination [1]. The proper selection and extraction of features from the handwritten document determines the performance of automatic writer verification. This paper describes the importance of various features for the bigrams (th, he) and the unigrams (d, f). We may define bigrams, unigrams and similar units as lexemes of handwriting. Features that can be formally defined and easily extracted from handwritten documents are termed computational features [2]. Computational features are central to the overall design of a writer verification system, since they enable its automation. In this paper, we select and utilize such computational features for lexemes. The computational features are predominantly unique to the specific lexeme, and hence they may be termed content-specific features. We analyze the relative discriminatory power of these features using a Bayesian formulation of a neural network. Additionally, an alternative method (Naive Bayes) that uses a gamma statistical model is described in order to combine these features with previously existing feature sets [3]. The limitation of the simple Naive Bayes model in terms of writer verification accuracy is also seen when its performance is compared against that of the Bayesian neural network.

The rest of the paper is organized as follows. Section 2 explains the selection and extraction of features, and Section 3 describes two writer verification models that utilize these features. Experiments and results are explained in Section 4, followed by the conclusion in Section 5.

2. Features

Many features for writer verification have been proposed [4][2].
These include document examiners' features as well as those that can be easily computed automatically. We focus on features that fall under both categories.

2.1 Feature selection

Feature selection is a non-trivial task for all lexemes. For our analysis we select two bigrams (th, he) and two unigrams (d, f). The selected bigrams are the most frequently occurring lexemes in natural English writing. The reasons