Machine Vision and Applications (1993) 6:110-123
© Springer-Verlag 1993

Orthonormal wavelet representations for recognizing complex annotations

Andrew Laine, Sergio Schuler, and V. Girish

Computer and Information Sciences Department, Computer Science and Engineering Building, Room 301, University of Florida, Gainesville, FL 32611-2024, USA

Abstract. This paper describes a novel method of pattern recognition targeted for recognizing complex annotations found in paper documents. Our investigation is motivated by the high reliability required for accomplishing autonomous interpretation of maps and engineering drawings. The recognition problem is made difficult in part because characters and text may be expressed in arbitrary fonts and orientations. Our approach includes a novel incremental strategy based on the multiscale representation of wavelet decompositions. Our approach is motivated by biological mechanisms of the human visual system. Choosing wavelets that are simultaneously localized in both space and frequency, and decomposing a signal into a multiscale hierarchical basis with orientation selectivity, can provide a powerful methodology for pattern analysis. We evaluated several wavelets with different spatial-frequency characteristics and measured their performance in the context of character recognition. Wavelet bases are more attractive than traditional hierarchical bases because they are orthonormal, linear, continuous, and continuously invertible. The multiscale representation of wavelet transforms provides a mathematically coherent basis for multigrid techniques. In contrast to previous ad hoc approaches, our method promises a practical solution embedded in a unified mathematical theory.
A feasibility study is described in which more than 10 000 patterns were recognized with an error rate of 2.6% by a neural network trained using multiscale representations from a class of 52 distinct alphanumeric patterns and graphical symbols. We observed a 10-fold reduction in the amount of information needed to represent each character for recognition. These results suggest that high reliability is possible at a reduced cost of representation.

Key words: Character recognition - Multiscale representations - Wavelet analysis - Engineering documents - Neural network

Correspondence to: A. Laine

1 Introduction

Fundamental to achieving an autonomous production capability is the development of a reliable method for recognizing the characters and symbols contained within a drawing. This paper describes a novel method of pattern recognition targeted for recognizing complex annotations found in paper documents. Our investigation is motivated by the problem of automating the interpretation of maps and engineering drawings.

While recent methods of character recognition (Burr 1985; Li et al. 1989; Ohya et al. 1988) have been successful in reading printed text from books, extracting annotations within the context of engineering drawings and maps requires a more general and robust method. In particular, the problems of orientation (recognizing text placed non-horizontally) and feature extraction (separation of text from graphics) remain unsolved.

We present a novel method of character recognition that is capable of providing the high reliability needed to make autonomous systems feasible. Our method includes an incremental strategy for recognizing characters based on the multiscale representation of wavelet decompositions (Coifman and Wickerhauser 1990; Daubechies 1988; Kumar et al. 1990; Mallat 1989).
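As an illustration of the multiscale wavelet decomposition referred to above, the following minimal sketch uses the Haar wavelet, the simplest orthonormal wavelet. Haar is chosen here only for brevity and is an assumption; it is not necessarily among the wavelets evaluated in this paper. The function names (`haar_step`, `haar_step_inverse`, `decompose`, `reconstruct`) are hypothetical, and the image sides are assumed to be divisible by 2 raised to the number of levels. The sketch splits an image into a coarse approximation plus three detail subbands per level, then inverts the transform.

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)  # normalization that makes each Haar step orthonormal


def haar_step(image):
    """One level of a 2-D orthonormal Haar transform (even height and width assumed)."""
    # Along each row: pairwise scaled sums (low-pass) and differences (high-pass).
    lo = (image[:, 0::2] + image[:, 1::2]) * S
    hi = (image[:, 0::2] - image[:, 1::2]) * S
    # Along each column: repeat on both halves, giving four subbands.
    ll = (lo[0::2, :] + lo[1::2, :]) * S
    lh = (lo[0::2, :] - lo[1::2, :]) * S
    hl = (hi[0::2, :] + hi[1::2, :]) * S
    hh = (hi[0::2, :] - hi[1::2, :]) * S
    return ll, (lh, hl, hh)


def haar_step_inverse(ll, details):
    """Exact inverse of haar_step."""
    lh, hl, hh = details
    rows, cols = ll.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2, :], lo[1::2, :] = (ll + lh) * S, (ll - lh) * S
    hi[0::2, :], hi[1::2, :] = (hl + hh) * S, (hl - hh) * S
    image = np.empty((2 * rows, 2 * cols))
    image[:, 0::2], image[:, 1::2] = (lo + hi) * S, (lo - hi) * S
    return image


def decompose(image, levels):
    """Return the coarsest approximation and detail subbands, ordered finest first."""
    coarse = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        coarse, d = haar_step(coarse)
        details.append(d)
    return coarse, details


def reconstruct(coarse, details):
    """Rebuild the image by inverting the levels from coarsest to finest."""
    image = coarse
    for d in reversed(details):
        image = haar_step_inverse(image, d)
    return image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))           # stand-in for a character bitmap
    coarse, details = decompose(img, 3)  # 32x32 -> 4x4 approximation
    print(coarse.shape, np.allclose(reconstruct(coarse, details), img))
```

Because each step is orthonormal, the total squared magnitude of the coefficients equals that of the input image, and the transform can be inverted level by level without loss.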
Using wavelets as a set of basis functions, we may decompose an image into a multiresolution hierarchy of localized information at different spatial frequencies. Wavelet bases are more attractive than traditional hierarchical bases because they are orthonormal, linear, continuous, and continuously invertible. The multiscale representation of wavelet transforms provides a mathematically coherent basis for multigrid techniques. In contrast to previous ad hoc approaches, our method promises a practical solution embedded in a unified mathematical theory.

We describe an incremental strategy that utilizes the mathematical continuity (bijection) between hierarchical levels of wavelet decompositions. Similar to traditional coarse-to-fine matching strategies, we attempt first to recognize coarse features within low-frequency levels of the wavelet transform. If higher resolution is required to resolve an am-