S. Chaudhury et al. (Eds.): PReMI 2009, LNCS 5909, pp. 406–413, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Resolving Ambiguities in Confused Online Tamil
Characters with Post Processing Algorithms
A.G. Ramakrishnan and Suresh Sundaram
Medical Intelligence and Language Engineering Laboratory
Indian Institute of Science, Bangalore, India
{ramkiag,suresh}@ee.iisc.ernet.in
Abstract. This paper addresses the problem of resolving ambiguities in
frequently confused online Tamil character pairs by employing script-specific
algorithms as a post-classification step. Robust structural cues and temporal
information of the preprocessed character are extensively utilized in the design
of these algorithms. The methods robustly and automatically extract the
discriminative sub-strokes of confused characters for further analysis.
Experimental validation on the IWFHR database indicates error rates of less than
3% for the confused characters. Thus, these post-processing steps have good
potential to improve the performance of online Tamil handwritten character
recognition.
Keywords: Confusion pairs, Sub-stroke extraction and analysis, Fourier
descriptors, Online handwritten character recognition.
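The abstract describes extracting discriminative sub-strokes, and the keywords list Fourier descriptors as one way of analysing them. As a generic illustration only, and not necessarily the descriptor definition used in this paper, the sketch below treats a sub-stroke as a sequence of (x, y) pen points and forms the complex signal z = x + iy. It then takes a plain DFT and normalizes the low-order coefficient magnitudes: dropping the DC term removes translation, and dividing by the first harmonic removes scale. The helper names `dft` and `fourier_descriptors` and the default `num_coeffs` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dft(signal: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (adequate for short sub-strokes)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_descriptors(points: Sequence[Point], num_coeffs: int = 4) -> List[float]:
    """Hypothetical helper: magnitude-based Fourier descriptors of a sub-stroke.

    - Each pen point (x, y) becomes the complex number x + iy.
    - The DC coefficient F[0] is skipped, which removes translation.
    - Magnitudes are divided by |F[1]|, which removes scale.
    - Keeping magnitudes only discards phase, so the result is also invariant
      to rotation (and to the starting point, for closed contours).
    """
    if len(points) <= num_coeffs:
        raise ValueError("sub-stroke too short for the requested number of coefficients")
    z = [complex(x, y) for x, y in points]
    coeffs = dft(z)
    ref = abs(coeffs[1])
    if ref == 0:
        raise ValueError("degenerate sub-stroke (zero first harmonic)")
    return [abs(coeffs[k]) / ref for k in range(1, num_coeffs + 1)]
```

With this normalization the first descriptor is always 1.0, so a practical feature vector would keep the coefficients from k = 2 onward. Note that the DFT implicitly treats an open sub-stroke as periodic, so the segment joining its last point back to its first also influences the descriptors.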
1 Introduction
Tamil is a classical language spoken by a significant population in South and
Southeast Asian countries. There are 156 distinct symbols in Tamil [1]. Among
earlier work on the recognition of online Tamil characters, Deepu et al. [2]
generate class-specific subspaces using principal component analysis, while
Niranjan et al. [1] employ dynamic time warping to match feature sequences of
unequal length. Hidden Markov models for recognition have also been reported in
[3], [4]. In recent work, we studied the performance of the 2DPCA algorithm [5],
which was originally proposed for face recognition.
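For readers unfamiliar with dynamic time warping (DTW), the matching technique used by Niranjan et al. [1], the sketch below computes the classic DTW distance between two feature sequences of unequal length. It is a textbook formulation that assumes a Euclidean local cost and the standard three-way step pattern. The features, step constraints and normalization used in [1] may differ, and the function names here are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two feature vectors of the same dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Textbook DTW distance between two feature sequences of possibly unequal length.

    d[i][j] holds the minimum cumulative cost of aligning the first i frames of
    seq_a with the first j frames of seq_b, reached from one of the predecessor
    cells (i-1, j), (i, j-1) or (i-1, j-1).
    """
    n, m = len(seq_a), len(seq_b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because a frame may be aligned with several consecutive frames of the other sequence, a trajectory and a slower rendering of it (with repeated points) have a DTW distance of zero, even though their lengths differ.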
Each of the above schemes gives similar generalization performance on a given
test set. In general, most misclassifications are attributable to the fact that
Tamil has many visually similar symbols. Any classifier that works on features at
a global level fails to capture the finer nuances that make these symbols
distinct. One way to circumvent this drawback is to incorporate a post-processing
step that employs local features to reduce the confusion between frequently
confused characters, and thereby improve the overall recognition performance.
Specifically, this paper proposes algorithms for disambiguating frequently
confused symbols. The approaches take into account the popular writing/lexemic
styles of modern Tamil script, and they can be applied irrespective of the
classifier used for recognition.
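The classifier-independent post-processing idea can be sketched as a small dispatch layer. This is an illustrative skeleton, not the authors' algorithms: the class `PostProcessor`, the placeholder labels and the toy rule in the usage note are all hypothetical. The sketch shows only the control flow. The base classifier's label passes through unchanged unless it belongs to a registered confusion pair. In that case, a pair-specific routine that inspects local features of the raw strokes re-decides between the two members of the pair.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
# A disambiguator inspects the raw strokes and returns one label of its pair.
Disambiguator = Callable[[Sequence[Stroke]], str]


class PostProcessor:
    """Hypothetical classifier-agnostic second stage for confused label pairs.

    Assumes each label belongs to at most one registered pair.
    """

    def __init__(self) -> None:
        self._rules: Dict[FrozenSet[str], Disambiguator] = {}
        self._pair_of: Dict[str, FrozenSet[str]] = {}

    def register(self, label_a: str, label_b: str, rule: Disambiguator) -> None:
        for label in (label_a, label_b):
            if label in self._pair_of:
                raise ValueError(f"{label!r} already belongs to a registered pair")
        pair = frozenset((label_a, label_b))
        self._rules[pair] = rule
        self._pair_of[label_a] = pair
        self._pair_of[label_b] = pair

    def refine(self, predicted: str, strokes: Sequence[Stroke]) -> str:
        pair = self._pair_of.get(predicted)
        if pair is None:
            return predicted  # not in any confusion pair: keep the classifier output
        decided = self._rules[pair](strokes)  # local-feature rule re-decides
        if decided not in pair:
            raise ValueError(f"rule returned {decided!r}, expected one of {sorted(pair)}")
        return decided
```

For example, `pp.register("char_a", "char_b", lambda s: "char_a" if len(s) == 1 else "char_b")` would resolve that hypothetical pair by stroke count. Because `refine` only needs the predicted label and the strokes, the same layer can sit behind any classifier.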