Digital Object Identifier (DOI) 10.1007/s10032-002-0096-2
IJDAR (2002) 5: 233–243

A Hidden Markov Models combination framework for handwriting recognition

T. Artières 1, N. Gauthier 1, P. Gallinari 1, B. Dorizzi 2

1 LIP6, Université Paris 6, 8 rue du Capitaine Scott, 75015, France (e-mail: {Thierry.Artieres,Nadji.Gauthier,Patrick.Gallinari}@lip6.fr)
2 EPH, Institut National des Télécommunications, 9 rue Charles Fourier, 91011, Evry, France (e-mail: Bernadette.dorizzi@int-evry.fr)

Received: 16 August 2002 / Accepted: 21 November 2002
Published online: 6 June 2003 – © Springer-Verlag 2003

Abstract. We propose a general framework to combine multiple sequence classifiers working on different sequence representations of a given input. This framework, based on Multi-Stream Hidden Markov Models (MS-HMMs), allows the combination of multiple HMMs operating on partially asynchronous information streams. This combination may operate at different levels of modeling: from the feature level to the post-processing level. This framework is applied to on-line handwriting word recognition by combining temporal and spatial representation of the signal. Different combination schemes are compared experimentally on isolated character recognition and word recognition tasks, using the UNIPEN international database.

Keywords: Handwriting recognition, Hidden Markov Models Combination, Multi-Stream HMM, On-line / off-line combination

1 Introduction

In this paper, we investigate the cooperation of Handwriting Word Recognition (HWR) systems operating on different modalities characterizing the same input data. The aim of the study is to introduce a general framework for integrating various representations of the data (i.e. information streams) in a recognition engine. To make the presentation more understandable, we will use our handwriting recognition application as an illustration throughout the paper.
However, the framework may be adapted to other sequence recognition tasks.

A lot of work has already been done on classifier combination in the pattern recognition community (see e.g. [Kittler 98, Rahman 98]). Combination usually operates at a high level, e.g. by combining the scores of different classifiers, and usually deals with fixed-dimensional data. In contrast to these approaches, the classifiers we consider operate on sequential representations of the data, i.e. sequences or signals, and can be combined at different intermediate levels of the processing steps. Furthermore, the classifiers may operate on different modalities. From this point of view, we will distinguish between two main cases for combination, depending on the nature of the information flows:

• Combining streams of the same kind, e.g. temporal sequences of feature vectors. This is a classical classifier combination scheme applied to sequence recognition.
• Combining streams of different nature. For example, the handwriting recognition system we will describe in Sects. 5 and 6 combines two systems operating respectively on the temporal representation of an input word (i.e. the on-line signal) and on a spatial representation of this input word (a sequence of sliding windows from the left to the right of the image of the word). In the remainder of the paper, for each information stream, we will refer to its index (time, abscissa, etc.) as the sequence ordering. Hence, in our application example, we combine information streams that do not share the same sequence ordering, which is time for the on-line signal and the x-coordinate (i.e. abscissa) for the off-line signal.

Today, many signal recognition tasks such as speech or handwriting recognition are addressed using Hidden Markov Models (HMMs) [Rabiner 90]. These statistical models have become a reference tool for dealing with sequences or signals. In such systems, an HMM is built for each unit (e.g.
a character in handwriting, a phoneme in speech). These unit HMMs may be concatenated to allow for the recognition of sequences of units (e.g. handwritten or spoken words). The popularity of HMM technology relies on efficient learning and recognition algorithms, and on the ability of HMMs to simultaneously segment an incoming sequence into units and to recognize these units. In the HWR field for example, most handwriting word recognition systems, whether on-line or off-line, dealing with medium or large vocabularies are based on