Digital Object Identifier (DOI) 10.1007/s10032-002-0096-2
IJDAR (2002) 5: 233–243

A Hidden Markov Models combination framework for handwriting recognition

T. Artières¹, N. Gauthier¹, P. Gallinari¹, B. Dorizzi²

¹ LIP6, Université Paris 6, 8 rue du Capitaine Scott, 75015, France (e-mail: {Thierry.Artieres,Nadji.Gauthier,Patrick.Gallinari}@lip6.fr)
² EPH, Institut National des Télécommunications, 9 rue Charles Fourier, 91011, Evry, France (e-mail: Bernadette.dorizzi@int-evry.fr)

Received: 16 August 2002 / Accepted: 21 November 2002
Published online: 6 June 2003 – © Springer-Verlag 2003

Abstract. We propose a general framework to combine multiple sequence classifiers working on different sequence representations of a given input. This framework, based on Multi-Stream Hidden Markov Models (MS-HMMs), allows the combination of multiple HMMs operating on partially asynchronous information streams. This combination may operate at different levels of modeling: from the feature level to the post-processing level. This framework is applied to on-line handwriting word recognition by combining temporal and spatial representation of the signal. Different combination schemes are compared experimentally on isolated character recognition and word recognition tasks, using the UNIPEN international database.

Keywords: Handwriting recognition, Hidden Markov Models Combination, Multi-Stream HMM, On-line / off-line combination

1 Introduction

In this paper, we investigate the cooperation of Handwriting Word Recognition (HWR) systems operating on different modalities characterizing the same input data. The aim of the study is to introduce a general framework for integrating various representations of the data (i.e. information streams) in a recognition engine. To make the presentation more understandable, we will use our handwriting recognition application as an illustration throughout the paper.
However, the framework may be adapted to other sequence recognition tasks.

A lot of work has already been done on classifier combination in the pattern recognition community (see e.g. [Kittler 98, Rahman 98]). Combination usually operates at a high level, e.g. by combining the scores of different classifiers, and usually deals with fixed-dimensional data. In contrast to these approaches, the classifiers we consider operate on sequential representations of the data (i.e. sequences or signals) and can be combined at different intermediate levels of the processing steps. Furthermore, the classifiers may operate on different modalities. From this point of view, we will distinguish between two main cases for combination, depending on the nature of the information flows:

• Combining streams of the same kind, e.g. temporal sequences of feature vectors. This is a classical classifier combination scheme applied to sequence recognition.
• Combining streams of different natures. For example, the handwriting recognition system we will describe in Sect. 5 and Sect. 6 combines two systems operating respectively on the temporal representation of an input word (i.e. the on-line signal) and on a spatial representation of this input word (a sequence of sliding windows from the left to the right of the image of the word). In the remainder of the paper, for each information stream, we will refer to its index (time, abscissa, etc.) as the sequence ordering. Hence, in our application example, we combine information streams that do not share the same sequence ordering, which is time for the on-line signal and the x-coordinate (i.e. the abscissa) for the off-line signal.

Today, many signal recognition tasks such as speech or handwriting recognition are addressed using Hidden Markov Models (HMMs) [Rabiner 90]. These statistical models have become a reference tool for dealing with sequences or signals. In such systems, an HMM is built for each unit (e.g.
a character in handwriting, a phoneme in speech). These unit HMMs may be concatenated to allow for the recognition of sequences of units (e.g. handwritten or spoken words). The popularity of HMM technology relies on efficient learning and recognition algorithms, and on the ability of HMMs to simultaneously segment an incoming sequence into units and to recognize these units. In the HWR field for example, most handwriting word recognition systems (either on-line or off-line) dealing with medium or large vocabularies are based on