On the use of Lexeme Features for writer verification

Anurag Bhardwaj, Abhishek Singh, Harish Srinivasan and Sargur Srihari
Center of Excellence for Document Analysis and Recognition (CEDAR)
University at Buffalo, State University of New York
Amherst, New York 14228
{ab94,singh5,hs32,srihari}@cedar.buffalo.edu

Abstract

Document examiners use a variety of features to analyze a given handwritten document for writer verification. The challenge in automatically classifying a pair of documents as written by the same writer or by different writers is twofold: (i) properly selecting and extracting features from the handwritten document, and (ii) using a model capable of exploiting the true discriminatory power of these features for classification. This paper describes the use of content-specific, skeleton-based features for characters and pairs of characters (bigrams) and ascertains their discriminatory power. A triangulation skeletonisation procedure is first used to obtain the skeleton of the character(s), and features are computed from the skeleton. Experiments are conducted on content-specific features extracted for the two most frequently occurring bigrams (th, he) and for the characters d and f. A neural network based on a Bayesian formulation is used to ascertain the discriminatory power of these features. To combine these features with previously existing writer verification features, an alternative Naive Bayes model is also described and evaluated. From the results obtained, we conclude that the bigram th has the highest discriminatory power, followed by character d, character f and bigram he. The paper also highlights the significant increase in writer verification performance (∼15% more) obtained with Bayesian neural networks as against the Naive Bayes model.

1. Introduction

Writer verification is the problem of determining whether two handwriting samples were written by the same writer or by different writers, which is of vital importance in Questioned Document Examination [1]. The proper selection and extraction of features from the handwritten document determines the performance of automatic writer verification. This paper describes the importance of various features for the bigrams (th, he) and the unigrams (d, f). We may define bigrams, unigrams and other such units as lexemes of handwriting. Features that can be formally defined and easily extracted from handwritten documents are termed computational features [2]. Computational features are central to the overall design of a writer verification system, since they enable its automation. In this paper, we select and utilize such computational features for lexemes. The computational features are predominantly unique to the specific lexeme, and hence they may be termed content-specific features. We analyze the relative discriminatory power of these features using a Bayesian formulation of a neural network. Additionally, an alternative method (Naive Bayes) that uses a gamma statistical model is described in order to combine these features with previously existing feature sets [3]. The limitation of the simple Naive Bayes model in terms of writer verification accuracy is also seen when it is compared against the Bayesian neural network.

The rest of the paper is organized as follows. Section 2 explains the selection and extraction of features, followed in Section 3 by a description of a couple of writer verification models that utilize these features. Experiments and results are explained in Section 4, followed by the conclusion in Section 5.

2. Features

Plenty of features for writer verification have been proposed [4][2]. These include document examiners' features, as well as those that can be easily computed automatically.
We focus mainly on features that fall under both categories.

2.1 Feature selection

Feature selection is a non-trivial task for all lexemes. We select for our analysis two bigrams (th, he) and two unigrams (d, f). The bigrams selected are the most frequently occurring lexemes in natural English writing. The reasons